Convert CRLF’s to line feeds on Linux
What’s the best way to convert CRLF‘s to line feeds in files on Linux? I’ve seen sed commands, but is there anything simpler?
Dupe: superuser.com/questions/38744/…. The link provided in the accepted answer covers the dos2unix, perl and vi options among others.
This already has better answers though (so if one of these is to be closed, it should probably be that one)
13 Answers
These commands are found in the tofrodos package (on most recent distributions), which also provides the two wrappers unix2dos and dos2unix that mimic the old unix tools of the same name.
This helped me, the other one didn’t. But I really don’t remember what my use case was 5+ years ago. 😛 Also see the original version of the other answer.
dos2unix — DOS/MAC to UNIX text file format converter
dos2unix [options] [-c convmode] [-o file ...] [-n infile outfile ...]
Options: [-hkqV] [--help] [--keepdate] [--quiet] [--version]
Consider elaborating on how to get this utility for your Linux system. At least on Ubuntu it’s not installed by default (but by installing tofrodos package you get something very similar: packages.ubuntu.com/jaunty/tofrodos).
Nowadays it looks like unix2dos should be preferred over tofrodos, as tofrodos seems to have been abandoned since 2013 while unix2dos is still maintained. unix2dos also has a very detailed man page, which is a plus; reading it leaves the impression of a well-thought-out tool.
I prefer perl:
perl -lne 's/\r//g; print' winfile.txt > unixfile.txt
But that’s well-suited to my uses, and it’s very easy for me to remember. Not all systems have a dos2unix command, but most that I work on have a perl interpreter.
Another is recode, a powerful replacement for dos2unix and iconv; it’s available in the «recode» package in Debian repositories:
recode ibmpc..lat1 winfile.txt   # dos2unix
recode lat1..ibmpc unixfile.txt  # unix2dos
For awk fans:
awk '{ sub("\r$", ""); print }' winfile.txt > unixfile.txt
sed 's/\r$//' winfile.txt > unixfile.txt
And now, only slightly-less-convoluted than deleting the CR’s by hand in a hex editor, straight from one of our stackoverflow.com friends, useable with the beef interpreter (located on your friendly neighborhood Debian repository),
dos2unix in brainfuck!
big thanks to jk for wasting an hour of his life to write this!
(useless use of cat and) perl is as complicated as sed. thus you are not really answering the question but rather collecting reputation 🙂
«best way» is subjective. this works best for me (i’m tons more comfortable with perl than sed). i didn’t promise it would work best for you.
@akira: a question can have multiple valid answers. I use this method as well, occasionally, mostly in combination with other changes, so it is definitely a valid answer; but «use dos2unix» is definitely the more practical answer in most situations. So I think the ratings are fine.
@~quack: that is the point: it is not simpler. thats the same for your perl answer. u2d or fromdos/todos are the right answers because they are simpler than any stuff expressed in any other programming language.
I think you can use tr, as well (though I have no funny format files on which to try):
tr -d '\r' < infile > outfile
Preferred: tr should be available on any Linux system, whereas most of the other answers (fromdos, dos2unix etc.) are not necessarily present. So unless you have sudo, they’re less useful. (Also preferred to Voigt and JustJeff’s answer because this one avoids the superfluous cat—not that I have anything against the one sitting on my lap.)
@MikeMaxwell — some people are opposed to the uuoc, but it is rarely actually «useless». And it’s never wrong. Sometimes it’s more performant. I almost always use it for readability — which trumps any couple microseconds that might be saved by not including it 🙂
I don’t disagree with you. It also has an advantage if I’m liable to forget the file I want to read from (assuming I’m typing in from the command line) by the time I write the other part (in this case, by the time I write «tr -d ‘\r'»). And at my age, forgetting is all too common. But that was the only thing I had to choose between this answer and Voight/JustJeff’s, which were otherwise identical.
cat cr_stuffed.file | tr -d '\r' > no_more_crs.file
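A quick way to confirm that the CRs are really gone is to count them before and after. This is only a minimal sketch: the helper name count_cr and the sample file are made up for illustration, and the temporary files are left in place for inspection.

```shell
# count_cr FILE: print the number of carriage-return bytes in FILE.
# (count_cr is an illustrative helper name, not a standard tool.)
count_cr() {
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

sample=$(mktemp)                            # throwaway sample file
printf 'foo\r\nbar\r\nbaz\n' > "$sample"    # two CRLF lines, one LF line

before=$(count_cr "$sample")
tr -d '\r' < "$sample" > "$sample.unix"     # the quotes around \r matter
after=$(count_cr "$sample.unix")

echo "CRs before: $before, after: $after"
```

Without the quotes, the shell turns \r into a plain r before tr ever sees it, so tr would delete the letter r instead of the carriage returns.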
nice. i saw another mention of tr earlier today. it’s not a program that gets mentioned very often is it?
I prefer Vim and :set fileformat=unix . While not the fastest, it does give me a preview. It is especially useful in the case of a file with mixed endings.
I found a very easy way. Open the file with nano:
nano file.txt
Press Ctrl+O to save, but before pressing Enter, press Alt+D to toggle between DOS and Unix/Linux line endings, or Alt+M to toggle between Mac and Unix/Linux line endings. Then press Enter to save and Ctrl+X to quit.
Could you edit your answer to clarify which toggle settings will replicate the behaviour requested by the OP?
The OP wants to toggle off DOS line endings, so Alt+d . Sometimes alt gets intercepted by the terminal program, so you can use esc+d instead.
Lots of nano shortcuts also work with Shift pressed, which often prevents terminal interception, so ‘Alt-Shift-D’ works too.
If you want a GUI method, try the Kate text editor (other advanced text editors may be able to handle this too). Open the find / Replace dialog ( Ctrl + R ), and replace \r\n with \n . (NB: you’ll need to choose «Regular expression» from the drop down and deselect «Selection only» from the options.)
EDIT: Or, if you simply want to convert to Unix format, then use the menu option Tools > End of Line > Unix .
There are text editors, such as jEdit, that can do these transformations automatically — you just tell it if you want Unix, Windows or Mac line separators.
Actually, KATE can do that too through the Tools > End of Line menu. Maybe I should have thought more laterally than answering the question exactly as it was worded — but if you know you specifically want to convert \r\n to \n then using search/replace is easier than remembering which OS uses which line ending. 😉
CR LF to LF using awk:
awk -v RS='\r?\n' 1
command | awk -v RS='\r?\n' 1
awk -v RS='\r?\n' 1 filename
echo -e 'foo\nbar\r\nbaz' | awk -v RS='\r?\n' 1 | hexdump -C
-v RS='\r?\n' sets the variable RS (input record separator) to \r?\n , meaning input is read line by line, separated by LF ( \n ) which may ( ? ) be preceded by CR ( \r ).
1 is the script awk executes. A script consists of condition { action } pairs. In this case, 1 is the condition, which evaluates to true. The action is omitted, so the default action is executed, which is to print the current line (this could also be written as { print $0 } or simply { print }).
LF to CR LF : You can set the variable ORS (output record separator) to modify the line ends of the output. Example:
echo -e 'foo\nbar\r\nbaz' | awk -v RS='\r?\n' -v ORS='\r\n' 1 | hexdump -C
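A regular expression in RS is an extension (gawk, mawk and BusyBox awk accept it); POSIX leaves the behaviour unspecified when RS is longer than one character. As a more portable sketch of the same two conversions, the CR can be stripped with sub() instead. The function names to_lf and to_crlf are made up for this example.

```shell
# Portable awk sketch: no regex RS, the trailing CR is removed with sub().
# to_lf and to_crlf are illustrative names, not standard commands.
to_lf()   { awk '{ sub(/\r$/, ""); print }' "$@"; }
to_crlf() { awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$@"; }

# Mixed input: "foo" and "baz" end in LF, "bar" ends in CRLF.
printf 'foo\nbar\r\nbaz\n' | to_crlf | od -c            # every line ends \r \n
printf 'foo\nbar\r\nbaz\n' | to_crlf | to_lf | od -c    # every line ends \n
```

Both functions read standard input when called without arguments and accept file names otherwise, like awk itself.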
How to convert newlines between Unix(LF) and DOS/Windows(CRLF)
The term CRLF refers to a Carriage Return (ASCII 13, \r) followed by a Line Feed (ASCII 10, \n).
They are used to mark the end of a line, but today's popular operating systems handle them differently.
For example, Windows requires both a CR and an LF to mark the end of a line, whereas Linux/UNIX requires only an LF.
In the HTTP protocol, header lines are terminated by the CR-LF sequence.
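To see which convention a given file uses, one option is to count the lines that end in CR and compare that with the total number of lines. A minimal sketch, assuming a POSIX shell and grep; the function name line_endings and its output words are made up for illustration:

```shell
# line_endings FILE: print "crlf", "lf", "mixed" or "none" (empty file).
# Illustrative helper, not a standard command. The variables are global
# because POSIX sh has no "local".
line_endings() {
    le_cr=$(printf '\r')
    le_total=$(grep -c '' < "$1" || true)           # all lines
    le_crlf=$(grep -c "$le_cr\$" < "$1" || true)    # lines ending in CR
    if [ "$le_total" -eq 0 ]; then
        echo none
    elif [ "$le_crlf" -eq 0 ]; then
        echo lf
    elif [ "$le_crlf" -eq "$le_total" ]; then
        echo crlf
    else
        echo mixed
    fi
}
```

Calling line_endings on a file before converting it helps avoid touching files that are already in the desired format.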
sed command
sed stands for stream editor. It can perform many operations on a file, such as searching, find and replace, insertion and deletion.
Its most common use is substitution (find and replace). The commands below replace the first “cat” on each line with “dog” in pet.txt :
$ sed 's/cat/dog/' pet.txt
$ sed -i 's/cat/dog/' pet.txt   # replace the original file; "-i" means edit files in place
Here “s” stands for substitute. Some useful flags for this operation:
$ sed 's/cat/dog/2'   # replace only the second match on each line
$ sed 's/cat/dog/g'   # replace all matches of the regexp, not just the first
$ sed 's/cat/dog/i'   # case-insensitive match (GNU sed)
Replace newline
If you know how to enter the carriage return character in bash (Ctrl-V then Ctrl-M):
$ sed 's/^M$//g'   # CRLF to LF
$ sed 's/$/^M/g'   # LF to CRLF
Note that “^M” here represents a single carriage return character, not “^” followed by the character “M”.
Without typing a literal carriage return:
$ sed 's/.$//g'    # CRLF to LF; assumes that all lines end with CRLF
$ sed 's/$/\r/g'   # LF to CRLF (GNU sed)
How to bulk convert all the file in a file system branch between Unix and Windows line break format?
Everybody knows 🙂 that in Windows plain text files lines are terminated with CR+LF, and in Unix&Linux — with LF only. How can I quickly convert all my source code files from one format to another and back?
4 Answers
That depends: if the files are under version control, this could be a rather unpopular history-polluting decision. Git has the option to automagically convert line endings on check-out.
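As a rough sketch of the Git option mentioned above: a .gitattributes entry can ask Git to store text files with LF and check them out with LF. This assumes the command is run from the top of a repository and that LF everywhere is what you want; the attribute line is one common choice, not the only one.

```shell
# Sketch: append a normalisation rule to .gitattributes in the current
# directory (assumed to be the top level of a Git repository).
printf '%s\n' '* text=auto eol=lf' >> .gitattributes

# Alternative, per-user setting: convert CRLF to LF on commit only.
# git config --global core.autocrlf input
```

Files already committed with CRLF are not rewritten by this alone, so existing history stays untouched until the files are next normalised and committed.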
If you do not care and want to convert quickly, there are programs like fromdos / todos and dos2unix / unix2dos that do this for you. You can use find : find . -type f -name '*.php' -exec dos2unix '{}' + .
This will run dos2unix on all matching files, both those that already have Unix line breaks (LF) and those whose Windows line breaks (CRLF) need converting to LF. This should be fine, I think. But it’s worth a comment. 🙂
Also, I prefer to start with find . -type f -name '*.php' -exec echo dos2unix '{}' \; in order to see what files will be affected before going through with the actual conversion. Of course, you might not want to follow my suggestion if the number of files affected is huge.
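The dry run can be narrowed further to the files that actually contain a CR, so that files already in Unix format are left out. A minimal sketch using only find and grep; the function name files_with_cr and the demo files are made up for illustration:

```shell
# files_with_cr DIR PATTERN: list regular files under DIR whose name
# matches PATTERN and that contain at least one carriage return.
# (Illustrative helper name, not a standard command.)
files_with_cr() {
    fw_cr=$(printf '\r')
    find "$1" -type f -name "$2" -exec grep -l "$fw_cr" {} +
}

# Demo in a throwaway directory.
demo=$(mktemp -d)
printf 'a\r\nb\r\n' > "$demo/dos.php"
printf 'a\nb\n'     > "$demo/unix.php"
files_with_cr "$demo" '*.php'     # lists only the dos.php file
```

The resulting list can then be reviewed by hand or fed to dos2unix instead of converting everything that matches the name pattern.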
There are several dedicated programs, including dos2unix and unix2dos, and fromdos and todos from the tofrodos package.
Simply pick the tool for the appropriate direction and pass the names of the files to convert on the command line.
If you don’t have either, but have Linux or Cygwin:
sed -i -e 's/\r\+$//' filename    # dos|unix -> unix
sed -i -e 's/\r*$/\r/' filename   # dos|unix -> dos
perl -i -pe 's/\r+$//' filename    # dos|unix -> unix
perl -i -pe 's/\r*$/\r/' filename  # dos|unix -> dos
With only POSIX tools (including BusyBox), to go from unix to dos, you’ll need to pass the CR character literally in the sed command.
cr=$(echo | tr '\n' '\r')
sed -e "s/$cr*\$/$cr/" <filename >filename.dos
mv filename.dos filename
In the other direction, you can simply delete all CRs:
tr -d '\r' <filename >filename.dos
mv filename.dos filename
You can use wildcards to convert many files in the same directory at once, e.g. by passing *.txt instead of a single file name to the sed -i or perl -i commands.
To convert all files in the current directory and its subdirectories, if your shell is zsh, you can use **/ , e.g. **/*.txt .
You can use **/ in bash ≥4, but you need to run shopt -s globstar first (you can put this line in your ~/.bashrc ). You can use **/ in ksh93, but you need to run set -o globstar first (you can put this line in your ~/.kshrc ).
If you can only use the tools that require a redirection, use a for loop.
for x in *.txt; do
  tr -d '\r' <"$x" >"$x.dos"
  mv -- "$x.dos" "$x"
done
If you don’t have **/ or need more complex matching to select which files to convert, use the find command. Here’s a Linux/Cygwin example which converts all files under the current directory and its subdirectories recursively, except for files matching *.o and anything under subdirectories called bin .
find . -name 'bin' -type d -prune -o \
     \! -name '*.o' -type f \
     -exec sed -i -e 's/\r\+$//' {} +
Here’s a POSIX example. We tell find to start a shell that can perform the necessary redirection.
find . -name 'bin' -type d -prune -o \
     \! -name '*.o' -type f \
     -exec sh -c '
       tr -d "\r" <"$0" >"$0.dos"
       mv -- "$0.dos" "$0"
     ' {} \;
You can make the find method slightly faster, at the expense of more complex code, by using a loop in the shell command.
find . -name 'bin' -type d -prune -o \
     \! -name '*.o' -type f \
     -exec sh -c '
       for x; do
         tr -d "\r" <"$x" >"$x.dos"
         mv -- "$x.dos" "$x"
       done
     ' _ {} +
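The write-to-a-temporary-file pattern used in these examples can be wrapped in a small function. This is only a sketch under a few assumptions: the name crlf_to_lf is made up, mktemp is available (it is common but not required by POSIX), and the result is copied back with cat rather than mv so that the original file keeps its permissions and owner.

```shell
# crlf_to_lf FILE...: delete every CR from each file, in place.
# Illustrative helper, not a standard command.
crlf_to_lf() {
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        if tr -d '\r' < "$f" > "$tmp"; then
            cat "$tmp" > "$f"    # overwrite contents, keep mode and owner
        fi
        rm -f "$tmp"
    done
}
```

It could be called directly, for example crlf_to_lf *.txt. Copying back with cat is not atomic, so an interrupted run can leave a truncated file; mv avoids that at the cost of replacing the file's metadata.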