How to sort the results of find (including nested directories) alphabetically in bash
I have a list of directories based on the results of running the "find" command in bash. As an example, the results of find are the files:
test/a/file
test/b/file
test/file
test/z/file
I want to sort the output so that it appears as:
test/file
test/a/file
test/b/file
test/z/file
4 Answers
If you have the GNU version of find, try this:
find test -type f -printf '%h\0%d\0%p\n' | sort -t '\0' -n | awk -F '\0' '{print $3}'
To use these file names in a loop, do
find test -type f -printf '%h\0%d\0%p\n' | sort -t '\0' -n | awk -F '\0' '{print $3}' | while IFS= read -r file; do
    # use "$file"
done
The find command prints three things for each file: (1) its directory, (2) its depth in the directory tree, and (3) its full name. By including the depth in the output we can use sort -n to sort test/file above test/a/file. Finally we use awk to strip out the first two columns since they were only used for sorting.
Using \0 as a separator between the three fields allows us to handle file names with spaces and tabs in them (but not newlines, unfortunately).
$ find test -type f
test/b/file
test/a/file
test/file
test/z/file
$ find test -type f -printf '%h\0%d\0%p\n' | sort -t '\0' -n | awk -F'\0' '{print $3}'
test/file
test/a/file
test/b/file
test/z/file
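The newline limitation mentioned above can be avoided by ending each record with NUL instead of a newline. What follows is only a sketch and not part of the original answer: it assumes GNU find, GNU sort (for -z) and GNU cut (for -z, coreutils 8.25 or later), and the function name sorted_find0 is made up for illustration. It decorates with the depth alone, so it orders by depth first and then by path. That matches the example above, but it is not identical to ordering by directory name.

```shell
# Hypothetical helper, not from the answer above.
# Prints regular files under "$1", NUL-terminated, shallowest first,
# then bytewise by path within each depth.
sorted_find0() {
    tab=$(printf '\t')
    # Decorate: "<depth><TAB><path><NUL>". The depth is a plain number,
    # so the first TAB always ends the sort key, whatever the path contains.
    find "$1" -type f -printf '%d\t%p\0' |
        LC_ALL=C sort -z -t "$tab" -k1,1n -k2 |
        cut -z -f2-    # undecorate: drop the depth column
}

# Usage: convert NULs to newlines only for display.
if [ -d test ]; then
    sorted_find0 test | tr '\0' '\n'
fi
```

To consume the result safely, keep the NULs and pipe into something that understands them, such as xargs -0.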
If you are unable to modify the find command, then try this convoluted replacement:
find test -type f | while read file; do
    printf '%s\0%s\0%s\n' "${file%/*}" "$(tr -dc / <<< "$file")" "$file"
done | sort -t '\0' | awk -F'\0' '{print $3}'
It does the same thing, with ${file%/*} being used to get a file’s directory name and the tr command being used to count the number of slashes, which is equivalent to a file’s "depth".
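For readers who want to try the idea without GNU find, here is a minimal sketch of the same decorate, sort, undecorate steps written for plain POSIX sh. It is an illustration rather than the answerer's code: the function name sort_paths_by_dir is hypothetical, it uses tabs rather than NUL between fields, and it therefore assumes that no path contains a tab or a newline.

```shell
# Hypothetical helper, not from the answer above.
# Reads one path per line on stdin and prints the paths ordered by
# directory, so files in a directory come before files in its subdirectories.
sort_paths_by_dir() {
    tab=$(printf '\t')
    while IFS= read -r file; do
        # Depth = number of "/" characters in the path.
        depth=$(printf '%s' "$file" | tr -dc / | wc -c)
        # Decorate: directory, depth, full path.
        printf '%s\t%d\t%s\n' "${file%/*}" "$depth" "$file"
    done |
        LC_ALL=C sort -t "$tab" -k1,1 -k2,2n |
        cut -f3-    # undecorate: keep only the path
}

printf '%s\n' test/b/file test/a/file test/file test/z/file | sort_paths_by_dir
```

Because the function reads from stdin, it can sit behind any command that prints paths, for example find test -type f | sort_paths_by_dir.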
(I sure hope there’s an easier answer out there. What you’re asking doesn’t seem that hard, but I am blanking on a simple solution.)
How to find files in subdirs and sort them by filename in a single command?
which means output is sorted based on filename only, but folder information should be maintained as part of the output. Edit: Make example more complicated as the subdirectory structure may include more than one level.
@camh: if possible I would like to use only unix commands. In any case my question is pretty much a duplicate of yours. Can you transfer the best solution to this thread (keep a link to the original anyway) so I can mark it as the solution?
If @Shawn makes the changes I suggested in my comment (use -printf instead of awk), I think that is the best solution. I’ve reworked my original implementation to use this method.
3 Answers
You need to sort by the last field (considering / as a field separator). Unfortunately, I can’t think of a tool that can do this when the number of fields varies (if only sort -k could take negative values).
To get around this, you’ll have to do a decorate-sort-undecorate. That is, take the filename and put it at the beginning followed by a field separator, then do a sort, then remove the first column and field separator.
find . ! -path "./build*" -name "*.txt" |\
    awk -vFS=/ -vOFS=/ '{ print $NF,$0 }' |\
    sort -n -t / |\
    cut -f2- -d/
That awk command says the field separator FS is set to / ; this affects the way it reads fields. The output field separator OFS is also set to / ; this affects the way it prints records. The next statement says print the last column (NF is the number of fields in the record, so it also happens to be the index of the last field) as well as the whole record ($0 is the whole record); it will print them with the OFS between them. Then the list is sorted, treating / as the field separator. Since we have the filename first in the record, it will sort by that. Then the cut prints only fields 2 through the end, again treating / as the field separator.
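The decorate, sort, undecorate steps described above can be tried on a fixed list of paths. This is a small sketch, not the answerer's exact pipeline: the function name sort_by_basename and the sample paths are invented, the sort is restricted to the first field with -k1,1, and paths are assumed to contain no newlines.

```shell
# Hypothetical helper: order paths read from stdin by file name only.
sort_by_basename() {
    # Decorate: put the last "/"-separated field in front of the path.
    awk -F/ '{ print $NF "/" $0 }' |
        LC_ALL=C sort -t / -k1,1 |   # sort on the decoration only
        cut -d / -f2-                # undecorate: drop the first field
}

printf '%s\n' ./b/alpha.txt ./a/sub/charlie.txt ./c/bravo.txt | sort_by_basename
```

The directory part of each path is preserved in the output; only the order changes.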
Sorting the output of "find -print0" by piping to the "sort" command
I need to be able to alphabetically sort the output of find before piping it to a command. Entering | sort | between didn’t work, so what could I do?
find folder1 folder2 -name "*.txt" -print0 | xargs -0 myCommand
6 Answers
Use find as usual and delimit your lines with NUL. GNU sort can handle these with the -z switch:
find . -print0 | sort -z | xargs -r0 yourcommand
It does not seem to work with find . -name '*.dat' -type f -printf '%f\n' | sort -z | xargs -r0 > output.txt . Is my line wrong due to the printf?
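A likely explanation for this comment, offered as a sketch rather than a confirmed diagnosis: -printf '%f\n' ends each name with a newline, whereas sort -z and xargs -0 split only on NUL, so the whole list arrives as a single record. Ending the format with \0 instead keeps all three tools in agreement. The helper name count_nul_records is hypothetical, and GNU sort is assumed for -z.

```shell
# Hypothetical helper: count how many records "sort -z" sees on stdin.
count_nul_records() {
    sort -z | tr -dc '\0' | wc -c
}

# Newline-terminated names: sort -z treats the whole input as one record.
printf 'b.dat\na.dat\n' | count_nul_records

# NUL-terminated names (what -printf '%f\0' would emit): one record per name.
printf 'b.dat\0a.dat\0' | count_nul_records
```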
Some versions of sort have a -z option, which allows for null-terminated records.
find folder1 folder2 -name "*.txt" -print0 | sort -z | xargs -r0 myCommand
Additionally, you could also write a high-level script to do it:
find folder1 folder2 -name "*.txt" -print0 | python -c 'import sys; sys.stdout.write("\0".join(sorted(sys.stdin.read().split("\0"))))' | xargs -r0 myCommand
Add the -r option to xargs so that myCommand is not run at all when find produces no output.
Good one (two?). Interestingly, though, the two methods handle . differently. With sort it winds up at the end of the list. With Python it sorts to the top. (Maybe Python sorts with LC_COLLATE=C.)
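The difference described in this comment is consistent with locale-aware collation: sort follows the current locale, while Python's sorted() compares strings by code point. A minimal sketch, assuming that explanation holds, of forcing bytewise order in sort so that . lands first (the wrapper name sort_bytewise is hypothetical):

```shell
# Hypothetical wrapper: LC_ALL=C makes sort compare raw bytes,
# so "." sorts ahead of "./a" and "./b".
sort_bytewise() {
    LC_ALL=C sort "$@"
}

printf '%s\n' ./b . ./a | sort_bytewise
```

The same wrapper accepts sort options, so sort_bytewise -z can be used on the NUL-delimited output of find -print0.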
The problem with all these |sort solutions is that you cannot use -exec any longer. OK, although it is possible to rewrite your statement given to -exec so that it works with xargs , the question is, what about "mini-scripts"? ( sh -c . ) I wouldn’t call it trivial to transform a 'sh -c' mini-script with multiple commands so that it can work with xargs (if possible at all, that is)
@syntaxerror: What problem do you have using sh -c with xargs? printf %s\\n a b c d e | xargs -n3 sh -c 'printf %s, "$@"; printf \\n' x
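To make the reply above concrete, here is a small sketch of a multi-command sh -c script driven by sorted, NUL-delimited input. It is an illustration only: the function name each_sorted and the two printf commands are invented, and GNU sort is assumed for -z.

```shell
# Hypothetical wrapper: run a small multi-command script once per
# NUL-terminated name read from stdin, in sorted order.
each_sorted() {
    LC_ALL=C sort -z |
        xargs -r0 -n 1 sh -c 'printf "name: %s\n" "$1"; printf "length: %s\n" "${#1}"' sh
}

printf '%s\0' 'b file' 'a file' | each_sorted
```

The trailing sh fills $0 of the inner shell, so each name arrives as $1. In real use the input would come from find ... -print0.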
I think you need the -n flag for sort:
-n, --numeric-sort compare according to string numerical value
The print0 may have something to do with this, I just tested this. Take the print0 out, you can null terminate the string in sort using the -z flag
Well, that print0 appears to be space-separating the filenames which is what I need to pass to my command, unfortunately
If you have GNU Parallel http://www.gnu.org/software/parallel/ installed you can do this:
find folder1 folder2 -name "*.txt" -print | sort | parallel myCommand
You can install GNU Parallel simply by:
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
I don’t understand that last statement. I create a file with a line break in the file name and execute your command: cd /tmp && touch $'a\nz' && ls && find -maxdepth 1 -print | sort | parallel echo . Total false output. I know GNU Parallel now, but that answer misses the original question, doesn’t it?
I know that it is bad practice to use crazy characters in file names — I am already including the blank space. I just see that parallel has a -0 parameter. Nice. No downvote. find -maxdepth 1 -print0 | sort -z | parallel -0 echo .
@uav In my 25 years as a sysadmin I have never seen a user create a file with \n in its name. I have seen plenty of files with ', space, and ". So unless you have evil users or a filesystem with errors, I reckon you will not meet a file with \n that was not made by a fellow sysadmin.
Some implementations of find (BSD find, for example) support ordered traversal directly via the -s option:
-s Cause find to traverse the file hierarchies in lexicographical order, i.e., alphabetical order within each directory. Note: `find -s' and `find | sort' may give different results.
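One way to see why the two can differ is to compare whole-string sorting with component-by-component sorting. The sketch below is not taken from any find implementation: the function name sort_like_traversal is hypothetical, C-locale byte order is assumed, and no path may contain the byte \001. A plain sort puts a-b before a/b because "-" precedes "/", whereas a traversal that sorts entries within each directory visits a/b right after a.

```shell
# Hypothetical helper: sort paths component by component, approximating
# the order of a traversal that sorts the entries within each directory.
sort_like_traversal() {
    # Map "/" to the lowest non-NUL byte so a directory's contents sort
    # directly after the directory itself, then map it back.
    tr / '\001' | LC_ALL=C sort | tr '\001' /
}

printf '%s\n' a-b a/b a | LC_ALL=C sort          # whole-string order
printf '%s\n' a-b a/b a | sort_like_traversal    # per-component order
```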
Some solutions here don’t work correctly because the sort command sorts on the full path string instead of the file name.
This is a rather complicated but working example of naturally sorting the results of the "find" command:
find every_minute -type f -name "*.sh" -printf '%f\t%p\n' | sort -V -k1 | cut -d$'\t' -f2 | tr '\n' '\0' | xargs -r0 -I {} echo 'Found: "{}"'
Found: "every_minute/api/1_build_synonyms.sh"
Found: "every_minute/search_module/2_rotate_index.sh"
Found: "every_minute/api/3_check_synonyms.sh"
Found: "every_minute/api/4_run_schedule.sh"
Found: "every_minute/search_module/10_test.sh"
Example of the incorrect result from the command find every_minute -type f -name "*.sh" | sort -z | xargs -r0 echo:
every_minute/api/1_build_synonyms.sh
every_minute/api/3_check_synonyms.sh
every_minute/api/4_run_schedule.sh
every_minute/search_module/10_test.sh
every_minute/search_module/2_rotate_index.sh
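The natural-sort approach from this answer can be tried without a directory tree by feeding paths on stdin. This is a sketch under stated assumptions, not the answerer's exact command: the function name natural_sort_by_name is hypothetical, awk adds the decoration instead of find -printf, sort must support version ordering (-V, as GNU sort does), and paths are assumed to contain no tabs or newlines.

```shell
# Hypothetical helper: order paths read from stdin by file name,
# comparing embedded numbers numerically (2 before 10).
natural_sort_by_name() {
    tab=$(printf '\t')
    # Decorate: "<file name><TAB><full path>".
    awk -F/ -v OFS="$tab" '{ print $NF, $0 }' |
        LC_ALL=C sort -t "$tab" -k1,1V |   # version sort on the name only
        cut -f2-                           # undecorate: keep the path
}

printf '%s\n' \
    every_minute/search_module/10_test.sh \
    every_minute/api/3_check_synonyms.sh \
    every_minute/api/1_build_synonyms.sh \
    every_minute/search_module/2_rotate_index.sh \
    every_minute/api/4_run_schedule.sh |
    natural_sort_by_name
```

Restricting the key to -k1,1 keeps the directory part out of the comparison, which is the point the answer makes about sorting on the file name rather than the full path.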