Exclude a sub-directory using find
There is an incoming sub-folder in all of the folders inside the Data directory. I want to get all files from all the folders and sub-folders except the def/incoming and 456/incoming dirs. I tried the following command:
find /home/feeds/data -type d \( -name 'def/incoming' -o -name '456/incoming' -o -name arkona \) -prune -o -name '*.*' -print
This is not good advice, but it will get you out of a lot of situations quick and dirty: pipe that to grep -v something to exclude whatever it is you don’t want
6 Answers
find /home/feeds/data -type f -not -path "*def/incoming*" -not -path "*456/incoming*"
Explanation:
- find /home/feeds/data : start finding recursively from specified path
- -type f : find files only
- -not -path "*def/incoming*" : don't include anything with def/incoming as part of its path
- -not -path "*456/incoming*" : don't include anything with 456/incoming as part of its path
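As a runnable sketch of the above (the abc/other directory and the file names are invented stand-ins, not taken from the asker's real tree), you can try the command against a throwaway copy of the layout:

```shell
# Build a disposable directory tree that mimics the Data layout.
tmp=$(mktemp -d)
mkdir -p "$tmp/def/incoming" "$tmp/456/incoming" "$tmp/abc/other"
touch "$tmp/def/incoming/a.txt" "$tmp/456/incoming/b.txt" "$tmp/abc/other/c.txt"

# Only c.txt should survive the two -not -path filters.
result=$(find "$tmp" -type f -not -path "*def/incoming*" -not -path "*456/incoming*")
echo "$result"

rm -rf "$tmp"
```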
@Ravi are you using the bash shell? I just tested this in my terminal and it works for me. Try copying and pasting the solution instead, in case you made modifications to your script.
-path matches the whole string, so if you’re doing find . , then your -path strings need to be ./path/to/directory/*
FYI: -not -path definitely will work in this example, but find still iterates into those directory structures, spending CPU cycles walking over all the directories/files inside them. To prevent find from descending into those directories at all (maybe there are millions of files there), you need -prune (the -prune option is difficult to use, however).
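A minimal sketch of that -prune alternative (the sample paths are invented): the matched directory is skipped entirely rather than walked and then filtered out.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/def/incoming" "$tmp/abc"
touch "$tmp/def/incoming/a.txt" "$tmp/abc/c.txt"

# When -path matches the directory itself, -prune stops find from
# descending into it; files are printed only via the -o branch.
result=$(find "$tmp" -path "*def/incoming" -prune -o -type f -print)
echo "$result"

rm -rf "$tmp"
```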
Just for the sake of documentation: You might have to dig deeper as there are many search’n’skip constellations (like I had to). It might turn out that prune is your friend while -not -path won’t do what you expect.
So this collection of 15 find examples that exclude directories is a valuable reference:
To link to the initial question, excluding finally worked for me like this:
find . -regextype posix-extended -regex ".*def/incoming.*|.*456/incoming.*" -prune -o -print
Then, if you wish to find one file and still exclude those paths, just add | grep myFile.txt .
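With the flag spelled -regextype (on GNU find it must appear before the -regex test it affects), the exclude-then-grep approach can be sketched like this; the tree below is a made-up stand-in:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/def/incoming" "$tmp/keep"
touch "$tmp/def/incoming/a.txt" "$tmp/keep/myFile.txt"

# -regex matches against the whole path, hence the .* on both sides.
result=$(find "$tmp" -regextype posix-extended \
    -regex ".*def/incoming.*" -prune -o -type f -print | grep myFile.txt)
echo "$result"

rm -rf "$tmp"
```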
It may depend also on your find version. I see:
$ find -version
GNU find version 4.2.27
Features enabled: D_TYPE O_NOFOLLOW(enabled) LEAF_OPTIMISATION SELINUX
How to ignore certain filenames using "find"?
which searches the contents of all of the files at and below the current directory for the specified SearchString. As a developer, this has come in handy at times. Due to my current project and the structure of my codebase, however, I'd like to make this bash command even more advanced by not searching any files that are in or below a directory containing ".svn", or any files that end with ".html". The man page for find kind of confused me, though. I tried using -prune, and it gave me strange behavior. In an attempt to skip only the .html pages (to start), I tried:
find . -wholename './*.html' -prune -exec grep 'SearchString' {} /dev/null \;
and did not get the behavior I was hoping for. I think I might be missing the point of -prune. Could you guys help me out? Thanks
@emanuele Hi, welcome to SuperUser (and the Stack Exchange network). This is a question I asked, and that was answered, 2 1/2 years ago. Typically, if you would like to add an answer to the question, please do so by scrolling to the bottom and answering there, instead of in a comment. Since this question already has an accepted answer (the one with the green checkmark), it’s unlikely that your answer is going to get much attention, however. FYI.
Hi, it is not an answer to your question. It is only a tip, as you stated in preamble that use find to search inside a file.
FWIW, -name '*.*' does not find all files: only those with a . in their name (the use of *.* is typically a DOS-ism, whereas in Unix you normally use just * for that). To really match them all, just remove the argument altogether: find . -exec . . Or if you want to only apply grep to files (and skip directories), then do find . -type f -exec . .
5 Answers
You can use the negate (!) feature of find to not match files with specific names:
find . ! -name '*.html' ! -path '*.svn*' -exec grep 'SearchString' {} /dev/null \;
So if the name ends in .html or contains .svn anywhere in the path, it will not match, and so the exec will not be executed.
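A self-contained sketch of that answer (the file names and the SearchString contents are invented for illustration):

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/.svn" "$tmp/src"
printf 'SearchString\n' > "$tmp/src/code.c"
printf 'SearchString\n' > "$tmp/.svn/entries"
printf 'SearchString\n' > "$tmp/page.html"

# grep -l prints only the name of each matching file; the two negated
# tests drop page.html (by name) and .svn/entries (by path).
result=$(find "$tmp" -type f ! -name '*.html' ! -path '*.svn*' \
    -exec grep -l 'SearchString' {} \;)
echo "$result"

rm -rf "$tmp"
```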
@Paul The desired effect is to exclude "files that are in or below a directory that contains .svn", so path (or wholename , but path is more portable) is more accurate than name for the answer. The questioner doesn't appear to have any files with .svn in the name.
I’ve had the same issue for a long time, and there are several solutions which can be applicable in different situations:
- ack-grep is a sort of "developer's grep" which by default skips version control directories and temporary files. The man page explains how to search only specific file types and how to define your own.
- grep 's own --exclude and --exclude-dir options can be used very easily to skip file globs and single directories (no globbing for directories, unfortunately).
- find . \( -type d -name '.svn' -o -type f -name '*.html' \) -prune -o -print0 | xargs -0 grep . should work, but the above options are probably less of a hassle in the long run.
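For the second bullet, a sketch of those GNU grep options against an invented tree (--exclude-dir takes a directory name, --exclude takes a file glob; both are GNU extensions):

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/.svn" "$tmp/src"
printf 'SearchString\n' > "$tmp/src/code.c"
printf 'SearchString\n' > "$tmp/.svn/entries"
printf 'SearchString\n' > "$tmp/page.html"

# -r recurses; the two exclude options drop the .svn dir and .html files.
result=$(grep -r --exclude-dir=.svn --exclude='*.html' -l 'SearchString' "$tmp")
echo "$result"

rm -rf "$tmp"
```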
The following find command does prune directories whose names contain .svn ; although it does not descend into such a directory, the pruned path name itself is still printed ( -name '*.svn' matching the directory is the cause).
You can filter out the directory names via grep -d skip , which silently skips such input "directory names".
With GNU grep, you can use -H instead of /dev/null . As a slight side issue: \+ can be much faster than \; ; e.g. for 1 million one-line files, \; took 4m20s while \+ took only 1.2s.
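The speed difference comes from batching: with \+ find appends as many file names as fit onto a single grep command line instead of forking one grep per file. A small sketch with two throwaway files:

```shell
tmp=$(mktemp -d)
printf 'SearchString\n' > "$tmp/a.txt"
printf 'nothing here\n' > "$tmp/b.txt"

# One grep invocation receives both names; -l lists only matching files.
result=$(find "$tmp" -type f -exec grep -l 'SearchString' {} +)
echo "$result"

rm -rf "$tmp"
```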
The following method uses xargs instead of -exec , and assumes there are no newlines \n in any of your file names. As used here, xargs is much the same as find’s \+ .
xargs can pass file-names which contain consecutive spaces by changing the input delimiter to ‘\n’ with the -d option.
This excludes directories whose names contain .svn and greps only files which don’t end with .html .
find . \( -name '*.svn*' -prune -o ! -name '*.html' \) | xargs -d '\n' grep -Hd skip 'SearchString'
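To see the -d '\n' behavior in isolation, here is a sketch with a deliberately space-laden (invented) file name; note that -d is a GNU xargs extension:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/has space"
printf 'SearchString\n' > "$tmp/has space/file one.txt"

# Splitting input on newlines only keeps the embedded spaces intact.
result=$(find "$tmp" -type f | xargs -d '\n' grep -l 'SearchString')
echo "$result"

rm -rf "$tmp"
```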
Exclude list of files from find
If I have a list of filenames in a text file that I want to exclude when I run find , how can I do that? For example, I want to do something like:
find /dir -name "*.gz" -exclude_from skip_files
and get all the .gz files in /dir except for the files listed in skip_files. But find has no -exclude_from flag. How can I skip all the files in skip_files ?
7 Answers
I don't think find has an option like this, but you could build a command using printf and your exclude list:
find /dir -name "*.gz" $(printf "! -name %s " $(cat skip_files))
Which is the same as doing:
find /dir -name "*.gz" ! -name first_skip ! -name second_skip ... etc
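A runnable sketch of that printf expansion (the .gz names are invented): printf repeats its format string once per argument, so a two-line skip_files becomes ! -name skip1.gz ! -name skip2.gz on the find command line.

```shell
tmp=$(mktemp -d)
touch "$tmp/keep.gz" "$tmp/skip1.gz" "$tmp/skip2.gz"
printf 'skip1.gz\nskip2.gz\n' > "$tmp/skip_files"

# The unquoted substitutions rely on word splitting, so this breaks if
# any skipped name contains whitespace.
result=$(find "$tmp" -name '*.gz' $(printf '! -name %s ' $(cat "$tmp/skip_files")))
echo "$result"

rm -rf "$tmp"
```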
Alternatively you can pipe from find into grep :
find /dir -name "*.gz" | grep -vFf skip_files
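And the grep variant, sketched the same way: -F treats each line of skip_files as a fixed string, -f reads the patterns from the file, and -v inverts the match.

```shell
tmp=$(mktemp -d)
touch "$tmp/keep.gz" "$tmp/old.gz"
printf 'old.gz\n' > "$tmp/skip_files"

# Any find output line containing a skip_files entry is dropped.
result=$(find "$tmp" -name '*.gz' | grep -vFf "$tmp/skip_files")
echo "$result"

rm -rf "$tmp"
```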
Does this work for inclusions as well? I just tested and I got nothing: included_paths=("./.aws/*" "./.bash_env.m4") && find . -type f -name '*.m4' \( -path "$" $(printf " -or -path '%s'" "$") \) . When I directly put the paths into the find command, it does work.
This is what I usually do to remove some files from the result (in this case I looked for all text files but wasn't interested in a bunch of Valgrind memcheck reports we have here and there):
find . -type f -name '*.txt' ! -name '*mem*.txt'
Example for if you need to ignore multiple filenames / patterns: find . -type f ! -name '*.foo' ! -name '*.bar' ...
find /dir \( -name "*.gz" ! -name skip_file1 ! -name skip_file2 ... and so on \)
find /var/www/test/ -type f \( -iname "*.*" ! -iname "*.php" ! -iname "*.jpg" ! -iname "*.png" \)
The above command gives a list of all files excluding files with .php, .jpg and .png extensions. This command works for me in PuTTY.
PuTTY is a remote terminal (usually using SSH) — whether it works will very much depend on what you’re SSH’ing into.
Josh Jolly’s grep solution works, but has O(N**2) complexity, making it too slow for long lists. If the lists are sorted first (O(N*log(N)) complexity), you can use comm , which has O(N) complexity:
find /dir -name '*.gz' | sort > everything_sorted
sort skip_files > skip_files_sorted
comm -23 everything_sorted skip_files_sorted | xargs ... etc
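Sketched end to end with throwaway files (comm -23 prints the lines unique to the first sorted input, i.e. everything not in the skip list):

```shell
tmp=$(mktemp -d)
touch "$tmp/a.gz" "$tmp/b.gz" "$tmp/c.gz"

find "$tmp" -name '*.gz' | sort > "$tmp/everything_sorted"
printf '%s\n' "$tmp/b.gz" > "$tmp/skip_files_sorted"

# Keep every path that does not appear in the (sorted) skip list.
result=$(comm -23 "$tmp/everything_sorted" "$tmp/skip_files_sorted")
echo "$result"

rm -rf "$tmp"
```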
man your computer’s comm for details.
This solution will go through all files (not exactly excluding them from the find command), but will produce output that skips files from a list of exclusions. I found that useful while running a time-consuming command ( find /dir -exec md5sum {} \; ).
- You can create a shell script to handle the skipping logic and run commands on the files found (make it executable with chmod , replace echo with other commands):
$ cat skip_file.sh
#!/bin/bash
found=$(grep "^$1$" files_to_skip.txt)
if [ -z "$found" ]; then
  # run your command
  echo $1
fi
- Create a file with the list of files to skip named files_to_skip.txt (on the dir you are running from).
- Then use find using it:
find /dir -name "*.gz" -exec ./skip_file.sh {} \;
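A complete sketch of the three steps above (the .gz names are invented; this version uses grep -xF instead of the "^$1$" anchors so that regex metacharacters in paths can't misfire):

```shell
tmp=$(mktemp -d)
cd "$tmp"
touch a.gz b.gz
printf '%s\n' "$tmp/b.gz" > files_to_skip.txt

cat > skip_file.sh <<'EOF'
#!/bin/bash
# Skip the file if its full path appears verbatim in the skip list.
found=$(grep -xF "$1" files_to_skip.txt)
if [ -z "$found" ]; then
    echo "$1"   # placeholder for the real command, e.g. md5sum "$1"
fi
EOF
chmod +x skip_file.sh

result=$(find "$tmp" -name '*.gz' -exec ./skip_file.sh {} \;)
echo "$result"

cd / && rm -rf "$tmp"
```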
This will fail if any of the filenames has a space in it and is unquoted (which it must be if the exclusion list is shared with another utility that expects it that way).