Thursday, November 27, 2008

EOL character conversions

I need to inspect/process large text files (mostly dictionary data) on a regular basis, and for a quick preview I use the trusty less. But one thing nearly always drives me crazy: end-of-line (EOL) characters.

So what's up with all the EOLs? In short: *nix ends a line with LF, DOS/Windows with CR-LF, and classic Mac OS with a lone CR.

Luckily, since I'm working on *nix systems anyway, there are nifty little command-line utilities that'll do the EOL conversions in a blink.

For conversions between DOS files (CR-LF EOL) and *nix files, use dos2unix and unix2dos. (Or use sed if you're old school, as shown in this Linux FAQ HowTo.)
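For the old-school sed route, here is a minimal sketch, not necessarily what the FAQ shows. The function names dos_to_unix and unix_to_dos are made up for illustration, and the carriage return is built with printf so the scripts should not depend on GNU sed's "\r" escape.

```shell
#!/bin/sh
# Literal carriage return, so the sed scripts do not rely on a GNU-only "\r".
CR=$(printf '\r')

# DOS -> *nix: strip one trailing CR from each line (hypothetical helper name).
dos_to_unix() {
    sed "s/${CR}\$//"
}

# *nix -> DOS: append a CR at the end of each line (hypothetical helper name).
unix_to_dos() {
    sed "s/\$/${CR}/"
}
```

Both read stdin and write stdout, e.g. `dos_to_unix < dict-dos.txt > dict-unix.txt` (file names are placeholders).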

On the other hand, if you need to convert classic Mac files (lone CR EOL) to/from *nix files, try out this trick with tr, or awk, or perl.
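The tr variant is probably the shortest. A sketch, assuming the Mac file uses a lone CR per line and contains no other CR or LF bytes; mac_to_unix and unix_to_mac are illustrative names, not standard commands.

```shell
#!/bin/sh
# Classic Mac -> *nix: every CR becomes an LF (hypothetical helper name).
mac_to_unix() {
    tr '\r' '\n'
}

# *nix -> classic Mac: every LF becomes a CR (hypothetical helper name).
unix_to_mac() {
    tr '\n' '\r'
}
```

Don't run mac_to_unix on a DOS file: each CR-LF pair would turn into two LFs, i.e. a blank line after every line.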

And finally, if you also need to convert between character encodings/sets, you might want to use the versatile recode utility instead. I use it all the time now since I need to process multilingual dictionaries very often.
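If recode isn't installed, a rough stand-in is iconv chained with tr. Note this is a swapped-in technique, not recode itself. The sketch assumes the input is Latin-1 (ISO-8859-1) with DOS line endings, and the function name is invented for illustration.

```shell
#!/bin/sh
# Latin-1 with CR-LF  ->  UTF-8 with LF (hypothetical helper name).
# iconv changes the character encoding; tr then deletes every CR byte,
# so this assumes CRs only ever appear as part of a CR-LF line ending.
latin1_dos_to_utf8_unix() {
    iconv -f ISO-8859-1 -t UTF-8 | tr -d '\r'
}
```

Usage would look like `latin1_dos_to_utf8_unix < dict-latin1.txt > dict-utf8.txt`, with both file names being placeholders.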
