Maximum number of files/directories on Linux?
My idea was to make a directory for each item, based on its unique ID, but then I’d still have 20000 directories in a main uploads directory, and it will grow indefinitely as old items won’t be removed.
6 Answers
ext[234] filesystems have a fixed maximum number of inodes; every file or directory requires one inode. You can see the current count and limits with df -i . For example, on a 15GB ext3 filesystem, created with the default settings:
Filesystem      Inodes  IUsed   IFree IUse% Mounted on
/dev/xvda      1933312 134815 1798497    7% /
There’s no limit on directories in particular beyond this; keep in mind that every file or directory requires at least one filesystem block (typically 4KB), though, even if it’s a directory with only a single item in it.
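To see that block overhead on a typical ext4 system, here is a small illustration using GNU coreutils (exact output varies with your filesystem and block size):

# A 1-byte file still occupies a whole 4 KiB block
printf 'x' > tiny
du -h tiny                                        # e.g. "4.0K  tiny"
stat -c '%s bytes, %b blocks of %B bytes' tiny    # e.g. "1 bytes, 8 blocks of 512 bytes"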
As you can see, though, 80,000 inodes is unlikely to be a problem. And with the dir_index option (which can be enabled with tune2fs), lookups in large directories aren’t a big deal. However, note that many administrative tools (such as ls or rm) can have a hard time dealing with directories that contain too many files. As such, it’s recommended to split your files up so that you don’t have more than a few hundred to a thousand items in any given directory. An easy way to do this is to hash whatever ID you’re using and use the first few hex digits as intermediate directories.
For example, say you have item ID 12345, and it hashes to ‘DEADBEEF02842…’. You might store your files under /storage/root/d/e/12345. You’ve now cut the number of files in each directory to roughly 1/256th.
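A minimal sketch of that scheme in shell, assuming md5sum as the hash and treating /storage/root and uploads/ as placeholder paths:

id=12345
# Hash the ID (md5sum is just one choice of stable hash) and keep two hex digits
prefix=$(printf '%s' "$id" | md5sum | cut -c1-2)
# Use them as two directory levels, e.g. /storage/root/8/2/12345
dir="/storage/root/$(printf '%s' "$prefix" | cut -c1)/$(printf '%s' "$prefix" | cut -c2)"
mkdir -p "$dir"
mv "uploads/$id" "$dir/$id"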
Optimal number of files per directory vs number of directories for EXT4
What do you mean by large number? 10^3? 10^6? 10^9? Do you want the files to remain on disk after the script has read them, or can you delete them?
Good question, IMO. It could be construed as subjective with no single definitive answer (a bit vague, considering hardware variance etc.), but I think that’s being a bit intolerant toward the problem: there will be valuable answers, and the experts will have an idea of the thresholds to work with.
As pointed out in the answers to this question — stackoverflow.com/questions/8238860/… — a directory containing a large number of files can be something of a challenge to many utilities such as ls and rm. This can be ignored if you delete the files as soon as you are done with them.
Do you only want answers for your case (10k files, which is actually not very large these days for modern UNIX filesystems), or do you want general answers about a ‘large number’? Do you want to know the asymptotic big-O behavior as N passes 10^6, 10^9? (I came here hoping to find that.)
1 Answer
10k files inside a single folder is not a problem on Ext4. It should have the dir_index option enabled by default, which indexes directories content using a btree-like structure to prevent performance issues.
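If you want to double-check that, a quick look with tune2fs works. This is a minimal sketch: /dev/sdXN is a placeholder for your partition, changing features requires root, and e2fsck should be run on an unmounted filesystem.

# Check the feature list; dir_index should appear on modern ext3/ext4
tune2fs -l /dev/sdXN | grep 'Filesystem features'

# If it is missing, enable it and rebuild the directory indexes
# (/dev/sdXN is a placeholder; unmount the filesystem first)
tune2fs -O dir_index /dev/sdXN
e2fsck -fD /dev/sdXN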
To sum up, unless you create millions of files or use ext2/ext3, you shouldn’t have to worry about system or FS performance issues.
That being said, shell tools and commands don’t like to be called with a lot of files as arguments (rm *, for example) and may return an error message saying something like ‘too many arguments’. Look at this answer for what happens then.
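For illustration, one common workaround when rm * fails that way is to let find delete the files itself, since find never puts all the names on one command line. A sketch assuming GNU or BSD find, with /path/to/dir as a placeholder:

# Remove the regular files directly inside the directory without hitting
# the kernel's argument-length limit (substitute the placeholder path)
find /path/to/dir -maxdepth 1 -type f -delete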
Directories with two or more files
I want to find a subdirectory of the current directory which (that is, the subdirectory) contains 2 or more regular files. I am not interested in directories containing fewer than 2 files, nor in directories which contain only subdirectories.
4 Answers
Here is a completely different approach based on GNU find and uniq. This is much faster and much more CPU-friendly than answers based on executing a shell command that counts files for each directory found.
find . -type f -printf '%h\n' | sort | uniq -d
The find command prints the directory of all files in the hierarchy and uniq only displays the directories that appear at least twice.
You shouldn’t parse the output of find . In this case, because GNU find will mangle the names of directories that have characters that are not printable in the current locale (like «ä» in the C locale). See also unix.stackexchange.com/questions/321697/…
@Kusalananda, not when the output doesn’t go to a tty. Here, the only problem is with the newline characters, which you can fix by using -printf '%h\0' | sort -z | uniq -zd | xargs -r0 .
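Spelled out, that NUL-delimited variant might look like the following (assuming GNU findutils and coreutils; ls -d -- is only a stand-in for whatever you actually want to run on the duplicate directories):

# Print each file's parent directory NUL-terminated, keep only the
# directories that occur more than once, and pass them on safely
find . -type f -printf '%h\0' | sort -z | uniq -zd | xargs -r0 ls -d --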
With the help of Gilles’s answer on SU, its reverse, and some modification, here is what you need.
find . -type d -exec sh -c 'set -- "$1"/*; X=0; for args; do [ -f "$args" ] && X=$((X+1)); done; [ "$X" -gt 1 ]' _ {} \; -print
.
├── test
│   ├── dir1
│   │   ├── a
│   │   ├── b
│   │   └── c
│   ├── dir2
│   │   ├── dira
│   │   │   └── a file\012with\012multiple\012line
│   │   ├── dirb
│   │   │   ├── file-1
│   │   │   └── file-2
│   │   └── dirc
│   ├── diraa
│   ├── dirbb
│   ├── dircc
│   └── x
│       └── x1
│           └── x2
└── test2
    ├── dir3
    └── dir4
./test
./test/dir1
./test/dir2/dirb
I had this at first too, but you will have problems with directories containing multiple subdirectories and files. It also does not weed out directories only containing subdirectories.
It doesn’t really solve it. It finds both the test and the dir2 directories in my test setup (see my answer).
Works for your example, but add test/x1 and test/x2 as files as well. $1 and $2 will be directories for test , and the directory will be missed.
@Kusalananda I found no way other than what you answered; I tried to change parts of my command so it wouldn’t be an exact duplicate of yours (I didn’t exclude hidden files as you did). My apologies.
find . -type d \
    -exec sh -c 'c=0; for n in "$1"/*; do [ -f "$n" ] && [ ! -h "$n" ] && c=$(( c + 1 )); done; [ "$c" -ge 2 ]' sh {} ';' \
    -print
This will find all names in or under the current directory and then filter out all names that are not names of directories.
The remaining directory names will be given to this short script:
c=0
for n in "$1"/*; do
    [ -f "$n" ] && [ ! -h "$n" ] && c=$(( c + 1 ))
done
[ "$c" -ge 2 ]
This script will count the number of regular files (skipping symbolic links) in the directory given as the first command line argument (from find ). The last command in the script is a test to see if the count was 2 or greater. The result of this test is the return value (exit status) of the script.
If the test succeeded, -print will cause find to print out the path to the directory.
To also consider hidden files (files whose names begin with a dot), change the loop in the sh -c script from for n in "$1"/*; do to for n in "$1"/* "$1"/.*; do. The extra pattern also matches . and .., but since those are directories, the [ -f "$n" ] test already skips them.
$ tree
.
`-- test
    |-- a
    |-- dir1
    |   |-- a
    |   |-- b
    |   `-- c
    `-- dir2
        |-- dira
        |-- dirb
        |   |-- file-1
        |   `-- file-2
        `-- dirc

6 directories, 6 files
$ find . -type d -exec sh -c 'c=0; for n in "$1"/*; do [ -f "$n" ] && [ ! -h "$n" ] && c=$(( c + 1 )); done; [ "$c" -ge 2 ]' sh {} ';' -print
./test/dir1
./test/dir2/dirb
How many files can I have on a single directory?
This question is related to this one. I work with animation, which generates a LOT of files (+/- 1,000,000), typically stored in a single directory. On Mac OS X, some bugs came up with more than +/- 30,000 files, so I used to break the animation into various directories. On Ubuntu, is there a limit to the number of files a single directory can hold?
1 Answer
Ubuntu does not limit the size of a directory; the limit is imposed by the file system. Each file and directory uses a so-called inode. You can use df -i to check the number of inodes in use and available for all mounted filesystems.
I’ve just created 1 million and one files without issues because my inode limit for my ext4 home partition of 50 GB (46 GiB) is large enough.
I used shell expansion for creating the files, combined with the touch utility:
mkdir test
cd test
touch {0..300000}
touch {300000..600000}
touch {600000..900000}
touch {900000..1000000}
This creates 1000001 files which can be verified with ls | wc -l . Why 300000..600000 and not 300001..600000 ? Because I was too lazy to put that 1 at the end.
df -i then showed roughly a million more inodes in use:

Filesystem      Inodes   IUsed   IFree IUse% Mounted on
/dev/sda6      3055616 1133635 1921981   38% /home
Now remove the test files (cd .. && rm -rf test took much longer, so use rm with the filenames, in the same batches as above):
and the number of inodes in use decreased immediately after removal of the files:
Filesystem      Inodes  IUsed   IFree IUse% Mounted on
/dev/sda6      3055616 133634 2921982    5% /home
Note that even if the filesystem allows such large numbers of files, it’s a horrible idea to store them all in a single directory. At least use some subdirectories with a structure like f/i/l/e/filename.ext. Programs often do not expect such large quantities of files.
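As a rough sketch of such a layout in bash (assuming the first four characters of the name are used as directory levels; filename.ext is just a placeholder):

name=filename.ext
# First four characters become four directory levels, e.g. f/i/l/e
dir="${name:0:1}/${name:1:1}/${name:2:1}/${name:3:1}"
mkdir -p "$dir"
mv "$name" "$dir/$name"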
Creating 50 directories with 50 files inside
I am trying to create 50 directories (dir-01..dir-50). And I want to create 50 files (01.txt..50.txt) inside each of the 50 directories.
For example: dir-01/01.txt..50.txt, dir-02/01.txt..50.txt, etc.
I am able to create the directories, but I am having trouble with creating the files inside each. I am also trying to compress all of these afterwards into a tar file. This is where I am at so far:
for i in {01..50}; do mkdir dir-$i; done; for j in {01..50}; do touch $j.txt.dir-*; done; tar -cf final.tar dir-*
I know that second loop is wrong, but I am unsure how to proceed. Any advice is appreciated. This seems to work, but I am unsure if it is correct in syntax or format:
for i in ; do mkdir "dir-$i"; for j in ; do touch "./dir-$i/$j.txt"; done; done; tar -cf final.tar dir-
Do you want to create 50 files total, or 50 files in each directory? Please do not respond in comments; edit your question to make it clearer and more complete.
7 Answers
With zsh, bash, or yash -o braceexpand:

$ mkdir dir-{01..50}
$ touch dir-{01..50}/file{01..50}.txt
$ ls dir-45
file01.txt  file09.txt  file17.txt  file25.txt  file33.txt  file41.txt  file49.txt
file02.txt  file10.txt  file18.txt  file26.txt  file34.txt  file42.txt  file50.txt
file03.txt  file11.txt  file19.txt  file27.txt  file35.txt  file43.txt
file04.txt  file12.txt  file20.txt  file28.txt  file36.txt  file44.txt
file05.txt  file13.txt  file21.txt  file29.txt  file37.txt  file45.txt
file06.txt  file14.txt  file22.txt  file30.txt  file38.txt  file46.txt
file07.txt  file15.txt  file23.txt  file31.txt  file39.txt  file47.txt
file08.txt  file16.txt  file24.txt  file32.txt  file40.txt  file48.txt
$ tar -cf archive.tar dir-{01..50}
With ksh93:

$ mkdir dir-{1..50%02d}
$ touch dir-{1..50%02d}/file{1..50%02d}.txt
$ tar -cf archive.tar dir-{1..50%02d}
The ksh93 brace expansion takes a printf() -style format string that can be used to create the zero-filled numbers.
i=0
while [ "$(( i += 1 ))" -le 50 ]; do
    zi=$( printf '%02d' "$i" )
    mkdir "dir-$zi"

    j=0
    while [ "$(( j += 1 ))" -le 50 ]; do
        zj=$( printf '%02d' "$j" )
        touch "dir-$zi/file$zj.txt"
    done
done

tar -cf archive.tar dir-*   # assuming only the folders we just created exist
An alternative for just creating your tar archive without creating so many files, in bash :
mkdir dir-01
touch dir-01/file{01..50}.txt
tar -cf archive.tar dir-01

for i in {02..50}; do
    mv "dir-$( printf '%02d' "$(( 10#$i - 1 ))" )" "dir-$i"
    tar -uf archive.tar "dir-$i"
done
This just creates one of the directories and adds it to the archive. Since all files in all 50 directories are identical in name and contents, it then renames the directory and appends it to the archive in successive iterations to add the other 49 directories.