- Network Security. ………
- February 2, 2016
- Unix/Linux: Sort command to sort IP Address
- bash Cookbook by Carl Albing, JP Vossen, Cameron Newham
- Sorting IP Addresses
- Problem
- Solution
- Discussion
- The short answer
- A better short answer
- The long answer
- Similar operations
- Sort uniq IP address in from Apache log
- 6 Answers
Network Security. ………
February 2, 2016
Unix/Linux: Sort command to sort IP Address
Sometimes, while dealing with a list of IP addresses, we need to sort them in order. The sort command can be used for this, but you must know how to use it properly, because the dotted-quad notation of IP addresses does not sort naturally.
Without options, sort orders the list of IP addresses alphabetically.
Using sort -n sorts the list numerically, but it is still limited by the dotted-quad notation. The correct way to sort IP addresses is therefore to order the list numerically while treating each address as a set of four numeric fields separated by dots.
$ cat ip.txt
9.1.4.4
9.1.4.4
9.1.78.4
149.4.78.4
149.4.78.41
14.4.78.41
10.4.7.41
$ sort -n -t . -k 1,1 -k 2,2 -k 3,3 -k 4,4 ip.txt
9.1.4.4
9.1.4.4
9.1.78.4
10.4.7.41
14.4.78.41
149.4.78.4
149.4.78.41
- -t : set the field separator to . (dot)
- -n : sort the list numerically
- -k : sort on a key given by start and stop field positions (for example, -k 1,1 means the first field only)
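To see what the options change, here is a minimal sketch that runs one sample through plain sort and through the field-based form. The sample addresses and variable names are illustrative, and LC_ALL=C is an added assumption used only to keep the comparison reproducible across locales.

```shell
#!/bin/sh
# Illustrative sample list (not from a real network).
ips='10.4.7.41
9.1.78.4
149.4.78.4
9.1.4.4
14.4.78.41'

# Plain sort compares characters, so 10.x and 14.x land before 9.x.
lexical=$(printf '%s\n' "$ips" | LC_ALL=C sort)

# Field-based numeric sort: split on dots, compare each octet as a number.
by_octet=$(printf '%s\n' "$ips" | LC_ALL=C sort -n -t . -k 1,1 -k 2,2 -k 3,3 -k 4,4)

printf 'plain sort:\n%s\n\nfield-based sort:\n%s\n' "$lexical" "$by_octet"
```

In the plain sort, 10.4.7.41 comes first because the character 1 sorts before 9. The field-based sort puts the 9.x addresses first.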
bash Cookbook by Carl Albing, JP Vossen, Cameron Newham
Sorting IP Addresses
Problem
You want to sort a list of numeric IP addresses, but you’d like to sort by the last portion of the number or by the entire address logically.
Solution
To sort by the last octet only (old syntax):
$ sort -t. -n +3.0 ipaddr.list
10.0.0.2
192.168.0.2
192.168.0.4
10.0.0.5
192.168.0.12
10.0.0.20
$
To sort the entire address as you would expect (POSIX syntax):
$ sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n ipaddr.list
10.0.0.2
10.0.0.5
10.0.0.20
192.168.0.2
192.168.0.4
192.168.0.12
$
Discussion
We know this is numeric data, so we use the -n option. The -t option indicates the character to use as a separator between fields (in our case, a period) so that we can also specify which fields to sort first. In the first example, we start sorting with the third field (zero-based) from the left, and the very first character (again, zero-based) of that field, so +3.0 .
In the second example, we used the new POSIX specification instead of the traditional (but obsolete) +pos1 -pos2 method. Unlike the older method, it is not zero-based, so fields start at 1.
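For comparison, the last-octet sort from the first example can also be written with the POSIX key syntax. This is a sketch rather than the book's own listing. It assumes that ipaddr.list holds the six addresses shown in the output above, and it sets LC_ALL=C so that ties on the key are broken the same way everywhere.

```shell
#!/bin/sh
# Assumed contents of ipaddr.list: the six addresses from the example output.
list='192.168.0.4
10.0.0.20
192.168.0.2
10.0.0.5
192.168.0.12
10.0.0.2'

# POSIX key syntax: field 4 only (one-based), compared numerically.
by_last_octet=$(printf '%s\n' "$list" | LC_ALL=C sort -t . -k 4,4n)
printf '%s\n' "$by_last_octet"
```

Here -k 4,4n plays the role of +3.0 with -n in the old syntax. The field number goes from 3 to 4 because POSIX keys are one-based.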
$ sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n ipaddr.list
Wow, that’s ugly. Here it is in the old format: sort -t. +0n -1 +1n -2 +2n -3 +3n -4 , which is just as bad.
Using -t. to define the field delimiter is the same, but the sort-key fields are given quite differently. In this case, -k 1,1n means “start sorting at the beginning of field one (1) and (,) stop sorting at the end of field one (1) and do a numerical sort (n).” Once you get …
The short answer
Here’s the invocation that works. It’s explained in the long answer that follows.
sort -n -t . -k 1,1 -k 2,2 -k 3,3 -k 4,4
A better short answer
A sharp-eyed reader notes that the sort binary included with GNU coreutils version 7.0 and higher accepts a new option: --version-sort (-V, --sort=version). Since IPv4 addresses look like software version numbers, the new option will reliably sort them.
Since not all field-based sorting is that simple, I’ll continue with my original explanation – but for sorting IPv4 addresses the -V option is simplest.
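A minimal sketch of the -V approach, assuming a sort that supports the option (GNU coreutils 7.0 or later, per the note above). The sample addresses are the ones used in the long answer, and the variable names are illustrative.

```shell
#!/bin/sh
# Requires a sort that understands -V (version sort).
addresses='129.95.30.40
5.24.69.2
19.20.203.5
1.2.3.4
19.20.21.22
5.220.100.50'

# Version sort compares each run of digits as a number, octet by octet.
version_sorted=$(printf '%s\n' "$addresses" | sort -V)
printf '%s\n' "$version_sorted"
```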
The long answer
More than once I’ve been confronted with a list of IP addresses that I’ve wanted to sort into numeric order. Trouble is, the dotted-quad notation isn’t sort-friendly. Consider the following raw list of addresses.
$ cat addresses.txt
129.95.30.40
5.24.69.2
19.20.203.5
1.2.3.4
19.20.21.22
5.220.100.50
Without options, sort will rely on alphabetic order, which certainly won’t do what you want:
$ sort addresses.txt
1.2.3.4
129.95.30.40
19.20.203.5
19.20.21.22
5.220.100.50
5.24.69.2
There are so many mistakes in this ordering I’m not even going to try to list them all.
The situation is only marginally improved when using the --numeric-sort ( -n ) option.
$ sort -n addresses.txt
1.2.3.4
5.220.100.50
5.24.69.2
19.20.203.5
19.20.21.22
129.95.30.40
The first set of numbers in each dotted-quad sorts correctly (5 precedes 19, and 129 is at the tail end), but the internal numbering still gets improper treatment. 5.220.100.50 is listed prior to 5.24.69.2 because sort -n reads the leading 5.220 and 5.24 as decimal numbers, and 5.220 is less than 5.24. The two 19.20.x.x addresses tie on that leading number, so sort falls back to comparing the lines alphabetically, and 203 is alphabetically prior to 21.
The solution is to tell sort to order the list numerically, considering each address as a set of four numeric fields, each separated by a dot.
$ sort -n -t . -k 1,1 -k 2,2 -k 3,3 -k 4,4 addresses.txt
1.2.3.4
5.24.69.2
5.220.100.50
19.20.21.22
19.20.203.5
129.95.30.40
In English, you’re saying, Yo, sort ! I’ve got here a list of numbers ( -n ), but each item in the list consists of some subnumbers, fields set apart from the others by a dot ( -t . ). Sort first by the first field, and only the first field ( -k 1,1 ), then by the second and only the second ( -k 2,2 ), and so on ( -k 3,3 -k 4,4 ).
Or, as I mentioned above, just use sort -V .
Similar operations
Other widely used data with mixed numeric and alphabetic fields can be sorted with similar techniques.
The getent utility will sort group information by GID, but the sorting is done per-source. If you have an extensive /etc/group file and a large network-provided group database (from, e.g., LDAP), the groups are not interleaved. Here sort can do its magic:
getent group | sort -n -t: -k 3,3
When returning the passwd database, getent sorts by username not UID, making sort even more useful:
getent passwd | sort -n -t: -k 3,3
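The same key syntax can be tried without touching a real account database. The sketch below uses made-up passwd-style records, and the cut step is an addition that only trims the output to name and UID.

```shell
#!/bin/sh
# Made-up passwd-style records: name:password:UID:GID:comment:home:shell
records='www:x:33:33:web:/var/www:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/sh
root:x:0:0:root:/root:/bin/sh
bob:x:101:101:Bob:/home/bob:/bin/sh'

# Numeric sort on the third colon-separated field (the UID),
# then keep only the name and UID columns for display.
by_uid=$(printf '%s\n' "$records" | LC_ALL=C sort -n -t : -k 3,3 | cut -d : -f 1,3)
printf '%s\n' "$by_uid"
```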
Sort uniq IP addresses from Apache log
I’m trying to extract IP addresses from my apache log, count them, and sort them. And for whatever reason, the sorting part is horrible. Here is the command:
cat access.* | awk '{ print $1 }' | sort | uniq -c | sort -n
16789 65.X.X.X
19448 65.X.X.X
1995 138.X.X.X
2407 213.X.X.X
2728 213.X.X.X
5478 188.X.X.X
6496 176.X.X.X
11332 130.X.X.X
I don’t understand why these values aren’t really sorted. I’ve also tried removing blanks at the start of the line ( sed 's/^[\t ]*//g' ) and using sort -n -t" " -k1 , which doesn’t change anything. Any hints?
6 Answers
This may be late, but using a numeric sort in the first sort will give you the desired result:
cat access.log | awk '{ print $1 }' | sort -n | uniq -c | sort -nr | head -20
29877 93.xxx.xxx.xxx
17538 80.xxx.xxx.xxx
5895 198.xxx.xxx.xxx
3042 37.xxx.xxx.xxx
2956 208.xxx.xxx.xxx
2613 94.xxx.xxx.xxx
2572 89.xxx.xxx.xxx
2268 94.xxx.xxx.xxx
1896 89.xxx.xxx.xxx
1584 46.xxx.xxx.xxx
1402 208.xxx.xxx.xxx
1273 93.xxx.xxx.xxx
1054 208.xxx.xxx.xxx
860 162.xxx.xxx.xxx
830 208.xxx.xxx.xxx
606 162.xxx.xxx.xxx
545 94.xxx.xxx.xxx
480 37.xxx.xxx.xxx
446 162.xxx.xxx.xxx
398 162.xxx.xxx.xxx
I had totally forgotten this question, but I managed to find a solution. This didn’t work (see my question), but adding a non-numeric character between the number and the IP solved my issue.
Why use cat | awk ? You only need to use awk :
awk '{ print $1 }' /var/log/*access*log | sort -n | uniq -c | sort -nr | head -20
I don’t know why a simple sort -n didn’t work, but adding a non-numeric character between the counter and the IP solved my issue.
cat access.* | awk '{ print $1 }' | sort | uniq -c | sed -r 's/^[ \t]*([0-9]+) (.*)$/\1 --- \2/' | sort -rn
Sometimes it is worth excluding some statuses and bots:
cat access.log | grep -v -w 200 | grep -v -w 403 | grep -v -e '.jpg' | grep -v -i bot | awk '
Please see my answer. On my side I had the same issue and found out it was related to the locale and the sort command.
cat access.* | awk '{ print $1 }' | sort | awk '' | sort -n
Control characters in the files?
File system full (temp files)?
If sort isn’t resulting as expected it’s probably due to a locale issue.
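One way to rule out the locale is to force the C locale for the sort calls. The sketch below is an assumption-laden illustration and not a command from the thread: the log lines are invented, and the final awk step is added only to strip the leading padding that uniq -c prints.

```shell
#!/bin/sh
# Hypothetical access-log lines; only the first field (client IP) matters here.
log='203.0.113.9 - - "GET / HTTP/1.1" 200
198.51.100.7 - - "GET /a HTTP/1.1" 200
203.0.113.9 - - "GET /b HTTP/1.1" 404
203.0.113.9 - - "GET /c HTTP/1.1" 200
198.51.100.7 - - "GET /d HTTP/1.1" 200
192.0.2.1 - - "GET / HTTP/1.1" 200'

# LC_ALL=C makes both sorts byte-wise and locale-independent, so a
# locale's thousands separator cannot change how the counts are parsed.
top_ips=$(printf '%s\n' "$log" |
    awk '{ print $1 }' |
    LC_ALL=C sort |
    uniq -c |
    LC_ALL=C sort -rn |
    awk '{ print $1, $2 }')
printf '%s\n' "$top_ips"
```

If the ordering changes once LC_ALL=C is set, the locale was the likely cause.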
If anyone wants it, here is a PHP function that counts how many times each IP appears in a file.
// Count how many times each IPv4 address appears in a log file.
function get_access_ip_count($input_file_name, $output_file_name)
{
    $access_ip_array = array();
    $overall_count = 0;
    $handle = fopen($input_file_name, "r");
    if ($handle) {
        while (($line = fgets($handle)) !== false) {
            // Take the first dotted-quad found on the line, if any.
            if (preg_match('/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/', $line, $matches)) {
                $ip = $matches[0];
                if (!isset($access_ip_array[$ip])) {
                    $access_ip_array[$ip] = 1;
                } else {
                    $access_ip_array[$ip]++;
                }
                $overall_count++;
            }
        }
        fclose($handle);
        // Sort by count, highest first, keeping the IP keys.
        uasort($access_ip_array, "Descending");
        print_r($access_ip_array);
        echo "\n";
        // Write the same sorted table to the output file.
        $output_file = fopen($output_file_name, "w");
        fwrite($output_file, print_r($access_ip_array, TRUE));
        fclose($output_file);
        echo "overall_count: $overall_count";
    } else {
        echo "Couldn't open file";
    }
}

// Comparison callback for uasort: larger counts come first.
function Descending($a, $b)
{
    if ($a == $b) {
        return 0;
    }
    return ($a > $b) ? -1 : 1;
}