- Sort each line in a text file
- How to sort lines of text files in Linux?
- Syntax
- 10 Useful Examples of the Sort Command in Linux
- Sort command in Linux
- Examples of the sort command
- 1. Sort in alphabetical order
- 2. Sort on numerical value [option -n]
- 3. Sort in reverse order [option -r]
- 4. Random sort [option -R]
- 5. Sort by months [option -M]
- 6. Save the sorted results to another file
- 7. Sort Specific Column [option -k]
- 8. Sort and remove duplicates [option -u]
- 9. Ignore case while sorting [option -f]
- 10. Sort by human numeric values [option -h]
Sort each line in a text file
I need to sort the words within each line while preserving the order of the lines. For example, for the input

    stackoverflow coding programming
    tag question badges

the output should be:

    coding programming stackoverflow
    badges question tag
My solution until now is to create a temp file, in which all the lines are sorted. The bash script looks like this:
    FILE_TMP=$FILE".tmp"
    while read line
    do
        echo $line | xargs -n1 | sort | xargs >>$FILE_TMP
    done < $FILE
    mv $FILE_TMP $FILE

It works fine, but I don't like having to create a duplicate file, especially because the files are big. So my question is: is there any way to sort each line of the file in place? Thank you.
A quite UNIXish way of doing it is to not create a temporary file but to send the output to stdout instead. Then your little script behaves just like "sort" and other utilities and everybody is happy. (And you don't create a temp file if you need to do other processing and send the output through a pipe.)
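A minimal sketch of that filter idea, assuming words are separated by blanks. The function name sort_words is made up for illustration, and LC_ALL=C is only there to make the ordering predictable:

```shell
# sort_words: read lines on stdin, print each line with its words sorted.
# Hypothetical helper; assumes blank-separated words.
sort_words() {
    while IFS= read -r line || [ -n "$line" ]; do
        # One word per line -> sort -> join back with single spaces.
        printf '%s\n' "$line" | tr -s ' \t' '\n' | LC_ALL=C sort | paste -sd' ' -
    done
}

# Usage: behaves like any other filter, nothing is written to disk.
printf 'stackoverflow coding programming\ntag question badges\n' | sort_words
```

Because it only reads stdin and writes stdout, it can sit in the middle of a pipeline, and the caller decides whether the result goes to a file at all.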
Couldn't you create a temporary string (character array) that holds the contents of one line (up to the line-end character), sort it, and then replace the current line with the newly sorted line? Whether this works depends on whether you can delete a specific line from within the file. For example, on line 1: read it into a string, sort it, delete line 1 from the file, add the new line 1 to the file, then move to the next line and repeat. If this is possible, you can avoid creating a temp file; if not, you may have to resort to one.
Try this (you may have to change the sed expression if the file is not space-separated):

    cat datafile.dat | while read line; do echo $line | sed 's/ /\n/g' | sort | gawk '{ printf "%s ", $0 } END { print "" }' ; done
If Python were an option, this would be quite easy using the in-place support from the fileinput module:

    >>> import os
    >>> import fileinput
    >>> for line in fileinput.input('file.txt', inplace=1):
    ...     line = line.rstrip(os.linesep)
    ...     print(' '.join(sorted(line.split())))
    ...
You could script a text editor (vim or emacs, for example) to do it "in place", but that wouldn't really help you avoid using a temp file since text editors will internally use temp files.
If your real problem is that it is slow to run, that is probably because it is spawning 3 different processes for each line in the source file. You could get around that by using a scripting language like perl that could go through the file sorting lines without spawning any additional processes. You'd still have an additional file for the output.
The accepted answer is somewhat slow. Try this:
Note: Your awk must be GNU, so as to have asort().
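A minimal sketch of an asort()-based approach, assuming GNU awk is installed as gawk. The function name sort_words_gawk is hypothetical, and this is not necessarily the code the answer had in mind:

```shell
# sort_words_gawk: sort the words of every input line in a single gawk process.
# Assumes GNU awk is available as `gawk`; asort() is a gawk extension.
sort_words_gawk() {
    gawk '{
        n = split($0, words)          # words[1..n], split on whitespace
        asort(words)                  # sort the values, reindex from 1
        line = ""
        for (i = 1; i <= n; i++)
            line = line (i > 1 ? " " : "") words[i]
        print line
    }' "$@"
}

# Usage (prints to stdout, the input file is left untouched):
#   sort_words_gawk file.txt
```

Only one process is started for the whole file, which is where the speedup over a per-line pipeline would come from.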
I think that the following awk goodness should do the job:
    prompt$ cat foo.awk
    {
        n = split($0, words)
        # bubble sort the words of the current line
        do {
            change_occured = 0
            for (idx = 1; idx < n; idx++) {
                if (words[idx] > words[idx + 1]) {
                    t = words[idx]
                    words[idx] = words[idx + 1]
                    words[idx + 1] = t
                    change_occured = 1
                }
            }
        } while (change_occured != 0)
        # walk the indices in order; "for (idx in words)" has no guaranteed order
        for (idx = 1; idx <= n; idx++) {
            printf("%s ", words[idx])
        }
        print ""
    }
    prompt$ awk -f foo.awk <<EOF
    heredoc> stackoverflow coding programming
    heredoc> tag question badges
    heredoc> EOF
    coding programming stackoverflow
    badges question tag
EDIT note that this is not an in place edit. It acts as a filter from stdin to stdout. You can use awk for this as well but reading and writing files there feels "clunky". If you really want to avoid the temporary file, use something like Perl.
Practically any "reasonable" solution for this problem will write the new contents to a new temporary file and then rename. Even things like perl "in place" processing ( perl -pi. ) or text editors actually do that. If you want to do it really in place, writing to the same physical disk position, it could be done (the new contents occupy exactly the same space as the old) but it's rather painful.
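The "write a temporary file, then rename" pattern can be sketched as below. This is an illustration under assumptions, not a definitive implementation: the function name sort_words_inplace is made up, and the insertion sort in plain awk is just one way to avoid needing gawk.

```shell
# sort_words_inplace FILE: rewrite FILE with the words of each line sorted.
# Hypothetical helper; like most "in place" tools it writes a temp file and renames it.
sort_words_inplace() {
    file=$1
    # Create the temp file next to the target so mv is a rename, not a copy.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if awk '{
            n = split($0, w)
            # insertion sort: portable, no gawk extensions needed
            for (i = 2; i <= n; i++) {
                v = w[i]
                for (j = i - 1; j >= 1 && w[j] > v; j--) w[j + 1] = w[j]
                w[j + 1] = v
            }
            line = ""
            for (i = 1; i <= n; i++) line = line (i > 1 ? " " : "") w[i]
            print line
        }' "$file" > "$tmp"
    then
        mv "$tmp" "$file"      # replace the original only on success
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Note that the replaced file takes on the permissions of the temporary file created by mktemp, so a real script may want to restore the original mode afterwards.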
You can compile the code from this answer into an overwrite executable, and then run the following (WARNING: this is dangerous, back up your file first!):
while read line ; do echo $line | xargs -n1 | sort | xargs ; done < f | ./overwrite f
This is rather fragile. For example, you must be absolutely sure that the sorting done by the script does not alter whitespace (what about DOS newlines, or consecutive blanks?): the script must emit the same number of bytes per line as it reads, or fewer.
How to sort lines of text files in Linux?
To sort lines of text files, we use the sort command in the Linux system.
The sort command prints the lines of its input, or the concatenation of all files listed in its argument list, in sorted order. Sorting is based on one or more sort keys extracted from each line of input. By default, the entire line is taken as the sort key.
Syntax
The general syntax of the sort command is as follows.
    sort [OPTION]... [FILE]...
    sort [OPTION]... --files0-from=F
Brief description of options available in the sort command.
Sr.No. | Option & Description |
---|---|
1 | -b, --ignore-leading-blanks Ignore leading blanks. |
2 | -d, --dictionary-order Consider only blanks and alphanumeric characters. |
3 | -f, --ignore-case Fold lower case to upper case characters. |
4 | -g, --general-numeric-sort Compare according to general numerical value. |
5 | -i, --ignore-nonprinting Consider only printable characters. |
6 | -M, --month-sort Compare (unknown) < 'JAN' < ... < 'DEC'. |
7 | -h, --human-numeric-sort Compare human-readable numbers (e.g., 2K 1G). |
8 | -n, --numeric-sort Compare according to string numerical value. |
9 | --random-source=FILE Get random bytes from the FILE. |
10 | -r, --reverse Reverse the result of comparisons. |
11 | --sort=WORD Sort according to the WORD. |
12 | --help Display this help and exit. |
13 | --version Output version information and exit. |
Here, we will create a file using the cat command and sort this file using the sort command in the Linux system.
    $ cat >text.txt
    Sid
    Vikash
    Gaurav
    ^C
    $ sort text.txt
    Gaurav
    Sid
    Vikash
Here, we will sort a file in the reverse order using the -r or --reverse option with the sort command in the Linux operating system.
    $ cat >text.txt
    Sid
    Vikash
    Gaurav
    ^C
    $ sort -r text.txt
    Vikash
    Sid
    Gaurav
The examples above print the sorted result to standard output. Here, we save the output to a new file by redirecting it:

    $ sort text.txt > newtext.txt

After executing the above command, a new file named newtext.txt is created.
For more information about the sort command and descriptions of its options, use the --help option:

    $ sort --help

To check which version of the sort command is installed, use the --version option:

    $ sort --version
10 Useful Examples of the Sort Command in Linux
The sort command in Linux sorts the contents of text files. This tutorial shows you some basic examples of the sort command.
Sort command in Linux
The sort command arranges text lines in useful ways. This simple tool can help you quickly sort information from the command line.
You should note a few things:
- When you use sort without any options, the default rules are enforced. It helps to understand the default rules to avoid unexpected outcomes.
- When using sort, your original data is safe. The results of your input are displayed on the command line only. However, you can specify output to a separate file if you wish. More on that later.
- Sort was originally designed for use with ASCII characters. I did not test for this, but it is possible that different encodings may produce unexpected results.
The default rules in the sort command
These are the default rules when using sort. The first few examples will clarify how these priorities are managed. Then we will look at specialized options.
Examples of the sort command
Let me show you some examples of sort command that you can use in various situations.
1. Sort in alphabetical order
The default sort command makes it easy to view information in alphabetical order. No options are necessary and even with mixed-case entries, A-Z sorting works as expected.
I am going to use a sample text file named filename.txt and if you view the content of the file, this is what you’ll see:
    MX Linux
    Manjaro
    Mint
    elementary
    Ubuntu

Now if you use the sort command on it:

    sort filename.txt

Here’s the alphabetically sorted output:

    elementary
    Manjaro
    Mint
    MX Linux
    Ubuntu
2. Sort on numerical value [option -n]
Let’s take the same list we used for the previous example and sort in numerical order. In case you were wondering, the list reflects the most popular Linux distributions (July, 2019) according to distrowatch.com.
I will modify the contents of the file so that the items are numbered, but out of order as shown below.
    1. MX Linux
    4. elementary
    2. Manjaro
    5. Ubuntu
    3. Mint

After sorting, the result is:

    1. MX Linux
    2. Manjaro
    3. Mint
    4. elementary
    5. Ubuntu
Looks good, right? Can you rely on this method to arrange your data accurately, though? Probably not. Let’s look at another example to find out why.
    1
    5
    10
    3
    5
    2
    60
    23
    432
    21

Now, if I use the sort command without any options, here’s what I get:

    1
    10
    2
    21
    23
    3
    432
    5
    5
    60

NOTE: Without options, numbers are compared as text, character by character from the left, so 10 sorts before 2.

When you add the -n option, the numerical value of each line is compared instead, and the list is sorted correctly:

    1
    2
    3
    5
    5
    10
    21
    23
    60
    432
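To try the comparison yourself, here is a small sketch. The file name numbers.txt is an assumption (any file with one number per line will do), and LC_ALL=C is set on the plain sort only to make the text ordering predictable:

```shell
# Hypothetical sample file; the name numbers.txt is an assumption.
workdir=$(mktemp -d)
printf '%s\n' 1 5 10 3 5 2 60 23 432 21 > "$workdir/numbers.txt"

# Without options the lines are compared as text, character by character.
as_text=$(LC_ALL=C sort "$workdir/numbers.txt")

# With -n the leading number of each line is compared by value.
as_numbers=$(sort -n "$workdir/numbers.txt")

# Print each result on one line for easy comparison.
printf 'text order:    %s\n' "$(echo $as_text)"
printf 'numeric order: %s\n' "$(echo $as_numbers)"
rm -rf "$workdir"
```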
3. Sort in reverse order [option -r]
For this one, I am going to use our distro list again. The -r option reverses the result of the comparisons, so the output is in descending sort order; it does not simply flip the order in which the lines appear in the file.

And here you have the output text in reverse order:

    5. Ubuntu
    4. elementary
    3. Mint
    2. Manjaro
    1. MX Linux
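A short sketch of a reverse sort on the numbered list, assuming the file is the filename.txt used earlier. Here -r is combined with -n so the leading numbers are compared by value:

```shell
workdir=$(mktemp -d)
printf '%s\n' '1. MX Linux' '4. elementary' '2. Manjaro' '5. Ubuntu' '3. Mint' \
    > "$workdir/filename.txt"

# -r reverses the comparison result, so it combines with other options such as -n.
reversed=$(sort -n -r "$workdir/filename.txt")
printf '%s\n' "$reversed"
rm -rf "$workdir"
```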
4. Random sort [option -R]
If you accidentally mashed your shift key while attempting the reverse function, you might have gotten some strange results. -R rearranges output in randomized order.
Here’s the randomly sorted output:
    4. elementary
    1. MX Linux
    2. Manjaro
    5. Ubuntu
    3. Mint
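One detail worth seeing in action, sketched here under the assumption that GNU sort is in use: -R orders lines by a random hash of the sort key, so identical lines always end up next to each other. The file name letters.txt is made up for the example:

```shell
workdir=$(mktemp -d)
printf '%s\n' a b a c b a > "$workdir/letters.txt"

# -R sorts by a random hash of each key (GNU sort), so equal lines stay adjacent.
shuffled=$(sort -R "$workdir/letters.txt")
printf '%s\n' "$shuffled"
rm -rf "$workdir"
```

The order of the groups changes from run to run, but the three a lines are never split up. If you want a plain random permutation of the lines, the shuf command from GNU coreutils may be the better fit.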
5. Sort by months [option -M]
Sort also has built-in functionality to arrange by month. It recognizes several formats based on locale-specific information. I tried some unique tests to show that it will arrange by date-day, but not year. Month abbreviations display before full names.
Here is the sample text file in this example:
    March
    Feb
    February
    April
    August
    July
    June
    November
    October
    December
    May
    September
    1
    4
    3
    6
    01/05/19
    01/10/19
    02/06/18
Sorting it by months with the -M option gives this output:

    01/05/19
    01/10/19
    02/06/18
    1
    3
    4
    6
    Feb
    February
    March
    April
    May
    June
    July
    August
    September
    October
    November
    December
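A smaller sketch of the same idea. The file name months.txt is an assumption, and LC_ALL=C is set so that the English month names are the ones recognized:

```shell
workdir=$(mktemp -d)
printf '%s\n' March Feb December Jan 7 > "$workdir/months.txt"

# -M compares month names; anything that is not a month sorts before January.
# LC_ALL=C is an assumption here so that English month names are recognized.
by_month=$(LC_ALL=C sort -M "$workdir/months.txt")
printf '%s\n' "$by_month"
rm -rf "$workdir"
```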
6. Save the sorted results to another file
As I mentioned earlier, sort does not change the original file by default. If you need to save the sorted content, it can be done.
For this example, I’ve created a new file where I want the sorted information to be printed and saved with the name filename_sorted.txt.
Caution: If you try to direct your sorted data to the same file, it will erase the contents of your file.
sort filename.txt -n > filename_sorted.txt
If you use cat command on the output file, this will be its contents:
    1. MX Linux
    2. Manjaro
    3. Mint
    4. elementary
    5. Ubuntu
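If you do want the sorted result to replace the original file, the -o option is the usual way to do it safely. A minimal sketch, using a throwaway copy of the numbered list:

```shell
workdir=$(mktemp -d)
printf '%s\n' '3. Mint' '1. MX Linux' '2. Manjaro' > "$workdir/filename.txt"

# Unsafe: the shell truncates the file before sort gets to read it.
#   sort -n filename.txt > filename.txt
# Safe: with -o, sort reads all of its input before opening the output file.
sort -n -o "$workdir/filename.txt" "$workdir/filename.txt"

result=$(cat "$workdir/filename.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```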
7. Sort Specific Column [option -k]
If you have a table in your file, you can use the -k option to specify which column to sort. I added some arbitrary numbers as a third column and will display the output sorted by each column. I’ve included several examples to show the variety of output possible. Options are added following the column number.
    1. MX Linux 100
    2. Manjaro 400
    3. Mint 300
    4. elementary 500
    5. Ubuntu 200

Sorting on the second column in alphabetical order gives:

    4. elementary 500
    2. Manjaro 400
    3. Mint 300
    1. MX Linux 100
    5. Ubuntu 200

Sorting by the numerals in the third column gives:

    1. MX Linux 100
    5. Ubuntu 200
    3. Mint 300
    2. Manjaro 400
    4. elementary 500

The same numeric sort on the third column, with the order reversed:

    4. elementary 500
    2. Manjaro 400
    3. Mint 300
    5. Ubuntu 200
    1. MX Linux 100
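One thing to watch with -k: fields are separated by blanks by default, so a name like "MX Linux" counts as two fields and shifts the column numbers on that line. A sketch that sidesteps this by using a tab-separated version of the table together with -t; the file name table.tsv and the tab-separated layout are assumptions for the example:

```shell
workdir=$(mktemp -d)
# Tab-separated rank, name, score; names may then contain spaces.
printf '%s\t%s\t%s\n' \
    1. 'MX Linux' 100  2. Manjaro 400  3. Mint 300  4. elementary 500  5. Ubuntu 200 \
    > "$workdir/table.tsv"
tab=$(printf '\t')

# -t sets the field separator; -k 3,3n sorts numerically on field 3 only.
by_score=$(sort -t "$tab" -k 3,3n "$workdir/table.tsv")
# Appending r to the key reverses the order for that key.
by_score_desc=$(sort -t "$tab" -k 3,3nr "$workdir/table.tsv")

printf '%s\n' "$by_score"
rm -rf "$workdir"
```

Writing the key as 3,3 limits it to the third field; a bare -k 3 would extend the key from field 3 to the end of the line.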
8. Sort and remove duplicates [option -u]
If you have a file with potential duplicates, the -u option will make your life much easier. Remember that sort will not make changes to your original data file. I chose to write the deduplicated result to a new file. Below you’ll see the input and then the contents of the new file after the command is run.

    1. MX Linux
    2. Manjaro
    3. Mint
    4. elementary
    5. Ubuntu
    1. MX Linux
    2. Manjaro
    3. Mint
    4. elementary
    5. Ubuntu
    1. MX Linux
    2. Manjaro
    3. Mint
    4. elementary
    5. Ubuntu
sort filename.txt -u > filename_duplicates.txt
Here’s the output file, sorted and without duplicates:

    1. MX Linux
    2. Manjaro
    3. Mint
    4. elementary
    5. Ubuntu
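A short sketch contrasting the two related tasks: keeping one copy of every line with -u, and listing only the lines that were repeated by piping sort into uniq -d. The sample data is a shortened version of the list above:

```shell
workdir=$(mktemp -d)
printf '%s\n' '2. Manjaro' '1. MX Linux' '2. Manjaro' '1. MX Linux' '3. Mint' \
    > "$workdir/filename.txt"

# -u keeps exactly one copy of every distinct line.
unique=$(LC_ALL=C sort -u "$workdir/filename.txt")

# sort | uniq -d lists only the lines that occur more than once.
repeated=$(LC_ALL=C sort "$workdir/filename.txt" | uniq -d)

printf 'unique:\n%s\n' "$unique"
printf 'repeated:\n%s\n' "$repeated"
rm -rf "$workdir"
```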
9. Ignore case while sorting [option -f]
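Whether sort ignores case by default depends on your locale: on many modern distros the default locale settings make the comparison case-insensitive, while the C locale places uppercase letters before lowercase ones. If yours does not ignore case, adding the -f option will produce the expected results.

Here’s the output where cases are ignored by the sort command:

    alpha
    alPHa
    Alpha
    ALpha
    beta
    Beta
    BEta
    BETA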
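To see the effect of -f independently of your system's locale, here is a sketch that pins the locale to C, where uppercase letters otherwise sort before all lowercase ones. The word list is made up for the example:

```shell
words='beta
Alpha
alpha
Beta'

# Plain byte order in the C locale: all uppercase letters come first.
c_order=$(printf '%s\n' "$words" | LC_ALL=C sort)

# -f folds lowercase to uppercase for the comparison, so case no longer separates words.
folded=$(printf '%s\n' "$words" | LC_ALL=C sort -f)

printf 'C locale:  %s\n' "$(echo $c_order)"
printf 'with -f:   %s\n' "$(echo $folded)"
```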
10. Sort by human numeric values [option -h]
This option allows the comparison of human-readable values with unit suffixes, such as 1k (i.e. 1000).
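A minimal sketch with made-up sizes, assuming a sort that supports -h (GNU sort does). Values without a suffix sort before those with one, and the suffixes are ordered K, M, G:

```shell
sizes=$(printf '%s\n' 1G 512 2K 10M 1K)

# -h understands unit suffixes, so 10M sorts after 2K even though 10 > 2 > 1.
human=$(printf '%s\n' "$sizes" | sort -h)
printf '%s\n' "$human"
```

A common use is sorting the output of tools that print human-readable sizes, for example du -h piped into sort -h.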
I hope this tutorial helped you learn the basic usage of the sort command in Linux. The sort command is often used in conjunction with the uniq command in Linux for uniquely sorting text files.
If you have some cool sort trick, why not share it with us in the comment section?