Wednesday, May 16, 2007

Bash commands and scripts

I love Bash shell commands and scripts. Really, I do. Where would I be without all them one-liners to *bang* *zip* process those huge dictionary text files in less than 5 seconds?

Except that sometimes I need 30 minutes to track down an obscure option and work out exactly how to use it to get the result I want.

For instance, sort is cool. It sorts lines so fast no spreadsheet application is worth a second, nay, even a first thought. Heck, it even ferrets out unique lines calmly!

Then I had this file, made up entirely of fields of numbers, e.g.
1 1 0 0 0
1 2 2 0 2
...

And I needed it to be sorted by the first field, then the second. Running plain sort on it gave me
1 1 0 0 0
1 10 0 10 10
1 11 ...
...
1 1921 ...
1 2 ...

OK, so it's not seeing the numbers as numbers, it's seeing them as ASCII strings. A man page look-up later, I tried sort -n. Then sort -k, then with both options. To no avail. Time to Google.
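Here's a rough sketch of that dead end, in case you want to see it for yourself. The three sample lines are made up for illustration (they are not my real file), and LC_ALL=C is my own addition to keep the ordering byte-based and predictable. At least with GNU sort, plain sort and sort -n should come out the same here: -n only reads the leading number of the whole line, and ties fall back to a plain string comparison.

```shell
# Hypothetical sample lines standing in for the real file.
data='1 2 2 0 2
1 10 0 10 10
1 1 0 0 0'

# Plain sort: compares lines as strings, so "1 10" lands before "1 2".
plain=$(printf '%s\n' "$data" | LC_ALL=C sort)

# sort -n: only the leading number of each line is compared numerically;
# every line starts with 1, so the tie is broken by string comparison again.
numeric=$(printf '%s\n' "$data" | LC_ALL=C sort -n)

printf 'plain sort:\n%s\n\nsort -n:\n%s\n' "$plain" "$numeric"
```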

30 minutes of surfing later, I finally found this nugget of an example that did the trick:
sort -k 1n,1 -k 2n,2

See, you have to state explicitly (via -k) that the first key starts and ends at field 1 and should be treated as a number (that's the n), then repeat the same for field 2 (or any other fields you'd like to use as sort keys). Leave off the ",1" and the key runs from field 1 all the way to the end of the line.
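To try the two-key sort without a real file, something like this minimal sketch should do. The four sample lines are invented for illustration, and LC_ALL=C is my own addition to keep things predictable across locales.

```shell
# Hypothetical sample lines: whitespace-separated numeric fields.
data='1 10 0 10 10
1 2 2 0 2
2 1 0 0 0
1 1 0 0 0'

# Key 1: field 1 only, compared as a number.
# Key 2: field 2 only, compared as a number (used when field 1 ties).
sorted=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1n,1 -k 2n,2)

printf '%s\n' "$sorted"
```

The lines starting with 1 should now come out with second fields in the order 1, 2, 10, followed by the line starting with 2.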

I love Bash commands.
