How to count number of files in each directory?
Are you looking for a way to count the number of files in each of the sub-directories directly under ./ ?
How is this an off-topic question? I would like to see the close-voters' comments with a reason! If this is off-topic, then where does it belong? Super User? I don't think so.
Voted to reopen it. There may be other answers that could be useful in many situations (including shell scripting, which is the reason I reached this question).
21 Answers
This prints the file count per directory for the current directory level:
du -a | cut -d/ -f2 | sort | uniq -c | sort -nr
By far the best (and most elegant) solution if one wants to list the number of files in top-level directories recursively.
This has two problems: it counts one file per directory more than there actually are, and it prints a useless line containing the size of the current directory as "1 size". Both can be fixed with du -a | sed '/.*\.\/.*\/.*/!d' | cut -d/ -f2 | sort | uniq -c. Add | sort -nr to sort by the count instead of the directory name.
I’d like to point out that this works in OSX, too. (Just copy-pasting Linux advice into an OSX shell usually doesn’t work.)
It fetches unneeded sizes with du -a. A better way is to use the find command, but the main idea is exactly the same 🙂
Assuming you have GNU find, let it find the directories and let bash do the rest:
find . -type d -print0 | while read -d '' -r dir; do
    files=("$dir"/*)
    printf "%5d files in directory %s\n" "${#files[@]}" "$dir"
done
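One caveat of the glob approach worth knowing: in an empty directory the pattern "$dir"/* matches nothing and stays literal, so the count is reported as 1. Running shopt -s nullglob before the loop makes empty directories report 0 instead.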
It's just a slightly different version of the above (hint: it's sorted by name and the output is CSV): for x in $(find . -maxdepth 1 -type d | sort); do y=$(find "$x" | wc -l); echo "$x,$y"; done
Great one! Putting it into a single line (so it's comfortable for direct usage in the shell): find . -type d -print0 | while read -d '' -r dir; do files=("$dir"/*); printf "%5d files in directory %s\n" "${#files[@]}" "$dir"; done
I needed to get the number of all files (a recursive count) in each subdirectory. This modification gives you that: find . -maxdepth 1 -type d -print0 | while read -d '' -r dir; do num=$(find $dir -ls | wc -l); printf "%5d files in directory %s\n" "$num" "$dir"; done
@Kory The following will do it: find . -maxdepth 1 -type d -print0 | while read -d '' -r dir; do num=$(find "$dir" -ls | wc -l); printf "%5d files in directory %s\n" "$num" "$dir"; done | sort -rn -k1
@OmidS Great one-liner, but $dir should be inside quotes in your first comment to correctly handle dir names with whitespace: find . -maxdepth 1 -type d -print0 | while read -d '' -r dir; do num=$(find "$dir" -ls | wc -l); printf "%5d files in directory %s\n" "$num" "$dir"; done
find . -type f | cut -d/ -f2 | sort | uniq -c
- find . -type f to find all items of type file, in the current folder and subfolders
- cut -d/ -f2 to cut out their specific folder
- sort to sort the list of folder names
- uniq -c to return the number of times each folder name has been counted (see the worked example below)
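For instance, on a hypothetical tree containing ./a/1.txt, ./a/2.txt and ./b/3.txt, find . -type f prints those three paths, cut -d/ -f2 reduces them to a, a, b, and sort | uniq -c then yields:

      2 a
      1 b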
Perfect. And it can be extended to count over subdirectories by replacing the field specifier with a list of field specifiers. E.g.: find . -type f | cut -d/ -f2,3 | sort | uniq -c
You could arrange to find all the files, remove the file names, leaving you a line containing just the directory name for each file, and then count the number of times each directory appears:
find . -type f | sed 's%/[^/]*$%%' | sort | uniq -c
The only gotcha in this is if you have any file names or directory names containing a newline character, which is fairly unlikely. If you really have to worry about newlines in file names or directory names, I suggest you find them, and fix them so they don’t contain newlines (and quietly persuade the guilty party of the error of their ways).
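If you do have to handle newlines, here is a minimal newline-proof sketch using NUL separators (assuming GNU find, plus a GNU coreutils recent enough to provide sort -z and uniq -z):

find . -type f -printf '%h\0' | sort -z | uniq -zc | tr '\0' '\n'

-printf '%h\0' emits each file's directory name terminated by a NUL byte, so embedded newlines cannot split an entry, and tr converts the NULs back to newlines for display.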
If you’re interested in the count of the files in each sub-directory of the current directory, counting any files in any sub-directories along with the files in the immediate sub-directory, then I’d adapt the sed command to print only the top-level directory:
find . -type f | sed -e 's%^\(\./[^/]*/\).*$%\1%' -e 's%^\.\/[^/]*$%./%' | sort | uniq -c
The first pattern captures the start of the name: the dot, the slash, the name up to the next slash, and that slash, and replaces the line with just that first part. The second pattern captures the files directly in the current directory; they don't have a trailing slash, and those are replaced by ./ . The sort and count then operate on just the directory names.
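For instance, with two illustrative paths, ./src/util/io.c becomes ./src/ and ./README becomes ./ :

$ printf '%s\n' ./src/util/io.c ./README | sed -e 's%^\(\./[^/]*/\).*$%\1%' -e 's%^\.\/[^/]*$%./%'
./src/
./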
What is the best way to count "find" results?
Why not find . -type f | wc -l as a simple portable solution? Your original solution spawns a new printf process for every individual file found, and that's very expensive (as you've just found).
Note that this will overcount if you have filenames with newlines embedded, but if you have that then I suspect your problems run a little deeper.
I don't think that warrants a downvote, given that the filename/newline limitation is pretty rare and noted above. Slower? Perhaps. Given you're querying a filesystem, I suspect the speed difference is small. Across my 10,000 files I measured a 3 ms difference.
Try this instead (requires find's -printf support):
find -type f -printf '.' | wc -c
It will be more reliable and faster than counting the lines.
Note that I use find's -printf, not an external printf command.
$ time find -type f -printf '.' | wc -c
8

real    0m0.004s
user    0m0.000s
sys     0m0.007s

$ time find -type f | wc -l
8

real    0m0.006s
user    0m0.003s
sys     0m0.000s
So my solution is faster =) (the important part is the real line)
With such a small benchmark, the timings are probably dominated by other factors than the thing you want to measure. An experiment with a big tree would be more useful. But this gets my vote for actually doing what the OP asked for.
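A quick way to build such a tree for testing (an illustrative sketch; it creates 100 directories of 100 files each under a hypothetical bigtree/ directory):

mkdir bigtree && cd bigtree
for d in {1..100}; do
    mkdir "d$d"
    for f in {1..100}; do touch "d$d/f$f"; done
done
time find -type f -printf '.' | wc -c    # should report 10000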
POSIX compliant and newline-proof:
find /path -exec printf %c {} + | wc -c
And, from my tests in /, it is not even two times slower than the other solutions, which are either not newline-proof or not portable.
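To see the newline-proofing in action, here is a quick illustrative test with a file name containing a newline (run in a scratch directory):

touch $'bad\nname'
find . -type f | wc -l                        # 2: the newline splits one name across two lines
find . -type f -exec printf %c {} + | wc -c   # 1: one character per file, whatever the name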
Note the + instead of \;. That is crucial for performance, as \; spawns one printf command per file name, whereas + gives as many file names as it can to a single printf command. (And in the possible case where there are too many arguments, find intelligently spawns new printf processes on demand to cope with it, so it behaves as if the command had been invoked only once.)
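You can watch the batching happen with echo (illustrative):

find . -maxdepth 1 -exec echo {} \;   # one line per entry: one echo process per name
find . -maxdepth 1 -exec echo {} +    # all entries on one line: a single echo per batch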
Find the number of files in a directory
Is there any method in Linux to calculate the number of files in a directory (that is, immediate children) in O(1) (independently of the number of files) without having to list the directory first? If not O(1), is there a reasonably efficient way? I’m searching for an alternative to ls | wc -l .
ls | wc -l will cause ls to do an opendir(), readdir() and probably a stat() on all the files. This will generally be at least O(n).
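On Linux you can observe this with strace, if it is available (the exact syscall names vary by libc and platform; getdents64 is the usual directory-enumeration call):

strace -c -e trace=getdents64,openat ls > /dev/null

The -c summary shows getdents64 being called to read the entries, and the number of calls grows with the size of the directory.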
Yes, correct, my fault. I was thinking of O(1) and O(n) as the same, although I should know better.
8 Answers
readdir is not as expensive as you may think. The knack is to avoid stat'ing each file, and (optionally) to avoid sorting the output of ls:

/bin/ls -1U | wc -l

This avoids aliases in your shell, doesn't sort the output, and lists one file per line (the latter is not strictly necessary when piping the output into wc).
The original question can be rephrased as "does the data structure of a directory store a count of the number of entries?", to which the answer is no. There isn't a more efficient way of counting files than readdir(2)/getdents(2).
One can get the number of subdirectories of a given directory without traversing the whole list by stat'ing (stat(1) or stat(2)) the given directory and observing the number of links to it. A given directory with N child directories will have a link count of N+2: one link for the ".." entry of each subdirectory, plus two for the "." and ".." entries of the given directory itself.
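A minimal sketch of that trick (assuming GNU stat; on BSD/macOS the equivalent is stat -f %l, and note that some filesystems, e.g. btrfs, do not maintain this link-count convention):

# number of immediate subdirectories of DIR
echo $(( $(stat -c %h DIR) - 2 ))

%h prints the hard link count, so subtracting the two links for "." and ".." leaves the number of child directories.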
However one cannot get the number of all files (whether regular files or subdirectories) without traversing the whole list — that is correct.
The /bin/ls -1U command will not get all entries, however. It will get only those directory entries that do not start with the dot (.) character. For example, it would not count the .profile file found in many login $HOME directories.
One can use either the /bin/ls -f command or the /bin/ls -Ua command to avoid the sort and get all entries.
Perhaps unfortunately for your purposes, either the /bin/ls -f command or the /bin/ls -Ua command will also count the . and .. entries that are in each directory. You will have to subtract 2 from the count to avoid counting these two entries, such as in the following:
expr `/bin/ls -f | wc -l` - 2 # Those are back ticks, not single quotes.
The --format=single-column (-1) option is not necessary on the /bin/ls -Ua command when piping the ls output, as into wc here. The ls command automatically writes its output in a single column if the output is not a terminal.
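A quick illustration of that behaviour, with hypothetical file names a, b and c:

$ ls
a  b  c
$ ls | cat
a
b
c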
Find count of files matching a pattern in a directory in linux
I am new to Linux. I have a directory with approximately 250,000 files, and I need to find the count of files matching a pattern. I tried the following command:
ls -1 20061101-20131101_kh5x7tte9n_2010_* | wc -l
-bash: /bin/ls: Argument list too long
0
7 Answers 7
It might be better to use find for this:
find . -name "pattern_*" -printf '.' | wc -m
find . -maxdepth 1 -name "20061101-20131101_kh5x7tte9n_2010_*" -printf '.' | wc -m
find will return a list of files matching the criteria. -maxdepth 1 restricts the search to the given path, with no subdirectories (thanks Petesh!). -printf '.' prints a dot for every match, so that names containing newlines won't make wc break. Then wc -m indicates the number of characters, which matches the number of files.
Performance comparison of the two possible options:
Let’s create 10 000 files with this pattern:
$ for i in {1..10000}; do touch 20061101-20131101_kh5x7tte9n_201_$i; done
And then compare the time it takes to get the result with ls -1 . or find . :
$ time find . -maxdepth 1 -name "20061101-20131101_kh5x7tte9n_201_*" | wc -l
10000

real    0m0.034s
user    0m0.017s
sys     0m0.021s

$ time ls -1 | grep 20061101-20131101_kh5x7tte9n_201 | wc -l
10000

real    0m0.254s
user    0m0.245s
sys     0m0.020s

find is about 5 times faster! But if we use ls -1f (thanks Petesh again!), then ls is even faster than find:

$ time ls -1f | grep 20061101-20131101_kh5x7tte9n_201 | wc -l
10000

real    0m0.023s
user    0m0.020s
sys     0m0.012s