Hoylen's Weblog

Thu, 21 Jan 2010

Changing file line endings and encodings in emacs

Text files on Unix systems use a single line feed character (LF, 0x0A) to indicate the end of a line. Text files on MS-DOS and Microsoft Windows use a carriage return plus line feed pair (CR-LF, 0x0D 0x0A). The classic Macintosh used a single carriage return character (CR, 0x0D). Thankfully, the reversed LF-CR pair is almost never seen, though Acorn's BBC Micro and RISC OS used it for spooled text output.
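To see which convention a file actually uses, you can dump its bytes with od -c, which prints a CR as \r and an LF as \n. A minimal shell sketch; the three sample files are hypothetical:

```shell
# Create one hypothetical sample file per convention
printf 'one\ntwo\n'     > unix.txt   # LF
printf 'one\r\ntwo\r\n' > dos.txt    # CR-LF
printf 'one\rtwo\r'     > mac.txt    # CR only

# od -c prints CR (0x0D) as \r and LF (0x0A) as \n
for f in unix.txt dos.txt mac.txt; do
  echo "$f:"
  od -c "$f"
done
```

On many systems the file command also reports "with CRLF line terminators" or "with CR line terminators" for files like these.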

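One way to change the line-ending convention is to use the emacs set-buffer-file-coding-system command (bound to C-x RET f). When it prompts you for the coding system, enter "unix", "dos" or "mac", then save the file with C-x C-s. To change the character encoding at the same time, enter a full coding system name such as "utf-8-unix" or "iso-latin-1-dos".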

This is easier than trying to remember cryptic commands like:

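tr -d '\r' < in.txt > out.txt        # DOS to Unix: delete every CR
sed 's/$/^M/' in.txt > out.txt       # Unix to DOS: append a CR to every line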

You also have to worry about getting them to work, because sed implementations and shell environments vary (e.g. in bash the literal ^M is typed using Ctrl-v Ctrl-m).
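One way to avoid typing a literal ^M at all is to let printf generate the carriage return and keep it in a shell variable. A sketch assuming a POSIX shell with tr and sed; the variable name CR and the file names are illustrative:

```shell
# Hypothetical input files for the demonstration
printf 'one\r\ntwo\r\n' > dos.txt
printf 'one\ntwo\n'     > unix.txt

# printf emits a real CR (0x0D); command substitution strips only
# trailing newlines, so the CR survives in the variable
CR=$(printf '\r')

# DOS to Unix: delete every CR byte
tr -d '\r' < dos.txt > dos-as-unix.txt

# Unix to DOS: insert a CR at the end of each line, before its LF
sed "s/\$/$CR/" unix.txt > unix-as-dos.txt
```

Running the sed step on a file that already has CR-LF endings adds a second CR to every line, so it is worth checking the file first.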

If your system has the unix2dos and dos2unix commands installed (Cygwin and most Linux distributions provide them as packages), use them. Otherwise, emacs lives up to its reputation as the kitchen-sink tool.
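That advice can be wrapped in a small shell function that prefers dos2unix and falls back to emacs in batch mode. This is a minimal sketch: to_unix is a hypothetical name, it assumes a modern dos2unix that converts files in place by default, and the emacs branch simply scripts the set-buffer-file-coding-system step followed by a save.

```shell
# to_unix (hypothetical helper): convert one file from DOS (CR-LF)
# to Unix (LF) line endings in place.
to_unix() {
  if command -v dos2unix >/dev/null 2>&1; then
    # Modern dos2unix rewrites the named file in place by default
    dos2unix "$1"
  elif command -v emacs >/dev/null 2>&1; then
    # Visit the file in a non-interactive emacs, switch the buffer to
    # the "unix" end-of-line convention, and save it without leaving
    # a backup file (file~) behind
    emacs --batch "$1" --eval \
      "(progn (setq make-backup-files nil)
              (set-buffer-file-coding-system 'unix)
              (save-buffer))"
  else
    echo "to_unix: neither dos2unix nor emacs found" >&2
    return 1
  fi
}
```

For example, to_unix notes.txt converts the placeholder file notes.txt in place. A to_dos variant would call unix2dos and pass 'dos instead of 'unix.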