Line separator in Linux

Contents

Difference between CR LF, LF and CR line break types
Why does Linux use LF as the newline character?
Is there a standard for the line separator?

Difference between CR LF, LF and CR line break types

I’d like to know the difference (with examples if possible) between CR LF (Windows), LF (Unix) and CR (Macintosh) line break types.

Very similar, but not an exact duplicate. \n is typically represented by a linefeed, but it’s not necessarily a linefeed.

CR and LF are ASCII and Unicode control characters while \r and \n are abstractions used in certain programming languages. Closing this question glosses over fundamental differences between the questions and perpetuates misinformation.

@AdrianMcCarthy It’s a problem with the way close votes act as answers in a way; an answer claiming the two were the same could be downvoted and then greyed out as very, very wrong, but it only takes 4 agreeing votes (comparable to upvotes) to have a very wrong close happen, with no way to counter the vote until after it’s happened.

This formulation of the question is admittedly better, but it is still for all practical purposes the same question.

10 Answers

CR and LF are control characters, respectively coded 0x0D (13 decimal) and 0x0A (10 decimal).

They are used to mark a line break in a text file. As you indicated, Windows uses two characters, the CR LF sequence; Unix (and macOS, starting with Mac OS X 10.0) uses only LF; and the classic Mac OS (before 10.0) used CR.
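To make the byte values concrete, here is a minimal Python sketch that builds the same two-line text with each of the three conventions. The sample text and the variable names are invented for the illustration.

```python
# The two control characters as single bytes.
CR = b"\r"    # 0x0D, decimal 13
LF = b"\n"    # 0x0A, decimal 10

windows_text = b"first" + CR + LF + b"second"   # CR LF
unix_text = b"first" + LF + b"second"           # LF only
classic_mac_text = b"first" + CR + b"second"    # CR only

for name, data in [("Windows", windows_text),
                   ("Unix", unix_text),
                   ("classic Mac OS", classic_mac_text)]:
    # bytes.hex(" ") shows every byte as two hex digits separated by spaces
    # (the separator argument needs Python 3.8 or later).
    print(f"{name:15} {len(data):2d} bytes: {data.hex(' ')}")
```

The Windows variant is one byte longer per line break than the other two, which is the only difference between the files.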

An apocryphal historical perspective:

As indicated by Peter, CR = Carriage Return and LF = Line Feed, two expressions that have their roots in old typewriters / TTYs. LF moved the paper up (but kept the horizontal position identical) and CR brought back the "carriage" so that the next character typed would be at the leftmost position on the paper (but on the same line). CR+LF did both, i.e., prepared to type a new line. As time went by, the physical semantics of the codes no longer applied, and as memory and floppy disk space were at a premium, some OS designers decided to use only one of the characters. They just didn’t communicate very well with one another 😉

Most modern text editors and text-oriented applications offer options or settings that automatically detect a file’s end-of-line convention and display it accordingly.
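A rough idea of how such detection could work is sketched below in Python. The function name `detect_eol` and its return labels are hypothetical; this is not how any particular editor implements it, just a simple count of the three markers.

```python
def detect_eol(data: bytes) -> str:
    """Guess the end-of-line convention of raw file contents."""
    crlf = data.count(b"\r\n")
    # Subtract the CR LF pairs so that only lone CRs and lone LFs remain.
    lone_cr = data.count(b"\r") - crlf
    lone_lf = data.count(b"\n") - crlf
    counts = {"CRLF": crlf, "LF": lone_lf, "CR": lone_cr}
    used = [name for name, n in counts.items() if n > 0]
    if not used:
        return "none"      # no line breaks at all
    if len(used) > 1:
        return "mixed"     # more than one convention in the same data
    return used[0]


print(detect_eol(b"one\r\ntwo\r\n"))
print(detect_eol(b"one\ntwo\n"))
print(detect_eol(b"one\r\ntwo\n"))
```

The file has to be read in binary mode for this to work, because text mode in Python translates line endings before the program sees them.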

So actually Windows is the only OS that uses these characters properly: Carriage Return, followed by a Line Feed.

Would it be accurate, then, to say that a text file created on Windows is the most compatible of the three i.e. the most likely to display on all three OS subsets?

@Hashim It might display properly, but trying to run a textual shell script with carriage returns will usually result in an error.
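The reason for that error can be shown with a small Python sketch. The script contents and the helper name `strip_carriage_returns` are made up for the example; the point is only that a system splitting on LF leaves the CR attached to every line.

```python
script = b"#!/bin/sh\r\necho hello\r\n"

# Splitting on LF only, as Unix tools do, leaves a CR at the end of each line.
first_line = script.split(b"\n")[0]
print(first_line)   # the interpreter path now ends in a stray carriage return


def strip_carriage_returns(data: bytes) -> bytes:
    """Convert CR LF line endings to LF, leaving all other bytes untouched."""
    return data.replace(b"\r\n", b"\n")


fixed = strip_carriage_returns(script)
print(fixed.split(b"\n")[0])
```

With the CR still in place, the system looks for an interpreter whose name literally ends in a carriage return, which does not exist.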

Rolf, that statement assumes that keeping old terminology/technology in new technology is correct. CRLF = 2 bytes. CR = 1, LF = 1. Given how often line breaks are used, that translates to a huge amount of data. Once again, Windows has chosen to be different from the entirety of the *NIX world.

This is a good summary I found:

The Carriage Return (CR) character ( 0x0D , \r ) moves the cursor to the beginning of the line without advancing to the next line. This character is used as a new line character in Commodore and early Macintosh operating systems (Mac OS 9 and earlier).

The Line Feed (LF) character ( 0x0A , \n ) moves the cursor down to the next line without returning to the beginning of the line. This character is used as a new line character in Unix-based systems (Linux, Mac OS X, etc.)

The End of Line (EOL) sequence ( 0x0D 0x0A , \r\n ) is actually two ASCII characters, a combination of the CR and LF characters. It moves the cursor both down to the next line and to the beginning of that line. This sequence is used as the new line marker in most other non-Unix operating systems, including Microsoft Windows, Symbian and others.

The "vertical tab" character moves the cursor down and keeps the position in the line, not the LF character. The LF is EOL.

@Vicrobot Developers will often split a string or perform other operations with the exact sequence \r\n , so \n\r would not match. Also, one would think that text editors treat the two characters as one sequence and don’t separately go "oh, now I have to go down one line" and "oh, now I have to move to the front". If they did, then yes, you could freely swap the order around.
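The matching problem described in that comment is easy to reproduce. A minimal Python sketch, with a made-up sample string:

```python
text = "alpha\r\nbeta\r\ngamma"

# Splitting on the exact CR LF sequence yields three clean lines.
print(text.split("\r\n"))

# The swapped order never occurs in this text, so nothing is split.
print(text.split("\n\r"))

# str.splitlines() recognises \n, \r and \r\n, so it is the more tolerant choice.
print(text.splitlines())
```

Code that hard-codes one separator works only for files that use exactly that separator, which is why the byte order matters in practice even though the on-screen effect would be the same.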

It’s really just about which bytes are stored in a file. CR is a bytecode for carriage return (from the days of typewriters) and LF similarly, for line feed. It just refers to the bytes that are placed as end-of-line markers.

There is way more information, as always, on Wikipedia.

I think it’s also useful to mention that CR is the escape character \r and LF is the escape character \n . In addition, Wikipedia:Newline.

In simple words, CR and LF are just end of line and new line according to this link. Is this correct?

@shaijut CR stands for Carriage Return. That was what returned the carriage on typewriters. So, mostly correct.

The superior LFCR option is sadly missing. Its benefit is that by doing the Line Feed first, the Selectric golfball can’t smear the just printed line with still fresh ink upon executing the Carriage Return

Actually, it’s not a typewriter but a "teletype": old computer client terminals with mechanical print heads and paper, where CR/LF were required for computers to behave properly. If you just did CR, you would have a bunch of characters on top of each other on the paper. If you just did LF, your text lines would slowly migrate to the right on the paper. CR/LF were required for proper teletype-based computing. An old Star Trek game would dump "FIRING" diagonally down the page.
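The behaviour described above can be imitated with a toy model in Python. The function `teletype` and its `width` parameter are hypothetical and greatly simplified: a real machine would print overstruck characters on top of each other, whereas this sketch simply keeps the last one.

```python
def teletype(stream: str, width: int = 20) -> list[str]:
    """Render `stream` the way a simple printing terminal would.

    CR moves the print head to column 0 without advancing the paper.
    LF advances the paper one row without moving the head.
    """
    rows = [[" "] * width]
    row, col = 0, 0
    for ch in stream:
        if ch == "\r":
            col = 0
        elif ch == "\n":
            row += 1
            if row == len(rows):
                rows.append([" "] * width)
        elif col < width:
            rows[row][col] = ch   # overprinting replaces the old character here
            col += 1
    return ["".join(r).rstrip() for r in rows]


for label, text in [("CR LF", "AB\r\nCD"),
                    ("LF only", "AB\nCD"),
                    ("CR only", "AB\rCD")]:
    print(label, teletype(text))
```

With LF only, the second line starts where the first one ended, which is the staircase effect; with CR only, the second line lands on top of the first.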

Carriage Return (Mac pre-OS X): CR, \r, ASCII code 13

Line Feed (Linux, Mac OS X): LF, \n, ASCII code 10

Carriage Return and Line Feed (Windows): CRLF, \r\n, ASCII code 13 followed by ASCII code 10

If you see the ASCII codes in a strange format, they are merely the numbers 13 and 10 in a different radix/base, usually base 8 (octal) or base 16 (hexadecimal).
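For reference, the same two numbers in the three common bases, written as a short Python snippet (the loop and formatting are only for illustration):

```python
for name, code in [("CR", 13), ("LF", 10)]:
    # The same number written in decimal, hexadecimal and octal.
    print(f"{name}: decimal {code}, hex 0x{code:02X}, octal 0o{code:03o}")

# The escape sequences in Python source map to the same code points.
assert ord("\r") == 0x0D == 0o15 == 13
assert ord("\n") == 0x0A == 0o12 == 10
```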

The \r and \n escapes only work in some programming languages, although they seem to be universal among programming languages that use a backslash to indicate special characters.

Jeff Atwood has a blog post about this: The Great Newline Schism

The sequence CR+LF was in common use on many early computer systems that had adopted teletype machines, typically an ASR33, as a console device, because this sequence was required to position those printers at the start of a new line. On these systems, text was often routinely composed to be compatible with these printers, since the concept of device drivers hiding such hardware details from the application was not yet well developed; applications had to talk directly to the teletype machine and follow its conventions. The separation of the two functions concealed the fact that the print head could not return from the far right to the beginning of the next line in one-character time. That is why the sequence was always sent with the CR first. In fact, it was often necessary to send extra characters (extraneous CRs or NULs, which are ignored) to give the print head time to move to the left margin. Even after teletypes were replaced by computer terminals with higher baud rates, many operating systems still supported automatic sending of these fill characters, for compatibility with cheaper terminals that required multiple character times to scroll the display.


Why does Linux use LF as the newline character?

As far as I know, every operating system has a different way to mark the end of line (EOL). Commercial operating systems use carriage return for EOL (carriage return and line feed on Windows, carriage return only on Mac). Linux, on the other hand, just uses line feed for EOL. Why does Linux use solely line feed for EOL rather than carriage return?

Explained on Wikipedia. Basically, Multics in the late 60s (which inspired Unix, which inspired Linux) added some level of abstraction to avoid having the text encoding encumbered by the limitations of teletype devices, so it didn’t have to encode a newline as two characters (which makes even less sense 50 years later, of course).

The second paragraph is a valid question, but the first paragraph is so full of oversimplifications and outright errors that it is drowning it out, with answerers having to correct a whole bunch of iffy and faulty premises before they even get to the question.

What? Linux is a free approximation of a commercial OS standard called UNIX. UNIX-compliant systems cost a lot of money back then and they still do today.

4 Answers

Windows uses CR LF because it inherited it from MS-DOS.

MS-DOS uses CR LF because it was inspired by CP/M, which was already using CR LF.

CP/M and many operating systems from the eighties and earlier used CR LF because it was the way to end a line printed on a teletype (return to the beginning of the line and jump to the next line, just like regular typewriters). This simplified printing a file because little or no pre-processing was required. There were also mechanical constraints that prevented a single character from being usable: some time might be required to allow the carriage to return and the platen to rotate.

GNU/Linux uses LF because it is a Unix clone. 1

Unix used a single character, LF, from the beginning to save space and standardize on a canonical end-of-line; using two characters was inefficient and ambiguous. This choice was inherited from Multics, which used it as early as 1964. Memory, storage, CPU power and bandwidth were very scarce, so saving one byte per line was worth doing. When a file was printed, the driver converted the line feed (new-line) to the control characters required by the target device.
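The idea of storing one character and letting the driver expand it on output can be sketched in a few lines of Python. The function `to_device` and its `device_eol` parameter are hypothetical; on real Unix systems this translation happens in the terminal driver (POSIX terminals expose it as the ONLCR output flag), not in application code.

```python
def to_device(text: str, device_eol: str = "\r\n") -> str:
    """Translate stored LF newlines into what a given output device expects."""
    return text.replace("\n", device_eol)


stored = "line one\nline two\n"       # one byte per line break in storage
for_teletype = to_device(stored)       # CR LF for a device that needs both

print(len(stored), len(for_teletype))
print(repr(for_teletype))
```

The stored form stays the same whatever the device is, and only the last step before output needs to know what the hardware requires.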

LF was preferred to CR because the latter still had a specific use: by repositioning the print head at the beginning of the same line, it made it possible to overstrike already typed characters.

Apple initially decided to also use a single character but for some reason picked the other one: CR. When it switched to a BSD interface, it moved to LF.

These choices have nothing to do with whether an OS is commercial or not.

1 This is the answer to your question.


Is there a standard for the line separator?

Which line separator standard do you use? Is there a generally accepted one?

I just discovered that in my project some files use Windows-style line breaks (CRLF), '\r\n', while others use Linux/Mac-style line breaks (LF), '\n'. I’d like to bring everything to a single standard, and naturally my first preference is LF, but perhaps someone has a different opinion? Is this even worth paying attention to?

If you have a Git repository, you can run:

git config --global core.autocrlf input

With this setting, code checked out from repositories will have exactly the same line endings on every system as in the repository. However, line endings are converted to LF before being written to the repository. This setting is convenient for cross-platform development on Unix-like systems, but it can also be used on Windows. And if you urgently need CRLF on Windows, just change the parameter from input to true: files checked out onto the user’s machine will then have CRLF line endings, while the repository itself still stores LF line endings.
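As a conceptual model only, the difference between the two values can be written out in Python. The functions `on_commit` and `on_checkout` are hypothetical names and this is not Git's implementation; in particular, how the sketch treats files that already contain CRLF in the repository is an assumption made to keep it short.

```python
def on_commit(working_copy: bytes) -> bytes:
    """Both `input` and `true` store LF line endings in the repository."""
    return working_copy.replace(b"\r\n", b"\n")


def on_checkout(stored: bytes, autocrlf: str) -> bytes:
    """Only `true` rewrites line endings when files are written to disk."""
    if autocrlf == "true":
        # Assumption: normalise first so existing CR LF pairs are not doubled.
        return stored.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    return stored   # `input`: files are written exactly as stored


in_repo = on_commit(b"a\r\nb\r\n")
print(in_repo)
print(on_checkout(in_repo, "input"))
print(on_checkout(in_repo, "true"))
```

Either way the repository ends up with LF only; the setting decides just what lands in the working directory.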

Most software today can handle both variants. If it doesn’t bother you, I wouldn’t worry about it. If it does, there is a wide range of tools for solving the problem, from simple batch conversion to post-checkout and pre-checkin processing that takes files out of the repository with one kind of line ending (say, Windows-style if you work on Windows) and puts them into the repository with Linux-style endings.
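A simple batch conversion of the kind mentioned above might look like the following Python sketch. The function name `convert_tree_to_lf` and the default `*.txt` pattern are assumptions for the example; it should only be pointed at text files, since replacing bytes in binary files would corrupt them.

```python
from pathlib import Path


def convert_tree_to_lf(root: str, pattern: str = "*.txt") -> list[Path]:
    """Rewrite matching files under `root` with LF endings.

    Returns the list of files that were actually changed.
    """
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        original = path.read_bytes()          # binary mode: no newline translation
        converted = original.replace(b"\r\n", b"\n")
        if converted != original:
            path.write_bytes(converted)
            changed.append(path)
    return changed
```

Calling `convert_tree_to_lf("src", "*.py")`, for instance, would convert every Python file under a directory named `src` and report which ones it touched. Running it on a copy or a clean working tree first makes the result easy to review.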
