Linux command: How to ‘find’ only text files?
which is very unwieldy and outputs unneeded text such as MIME type information. Is there a better solution? I have lots of images and other binary files in the same folder as the many text files that I need to search through.
16 Answers
I know this is an old thread, but I stumbled across it and thought I’d share my method which I have found to be a very fast way to use find to find only non-binary files:
find . -type f -exec grep -Iq . {} \; -print
The -I option to grep tells it to immediately ignore binary files, and the . pattern (which matches any character) along with -q makes it match text files immediately, so it goes very fast. You can change the -print to a -print0 for piping into an xargs -0 or something if you are concerned about spaces (thanks for the tip, @lucas.werkmeister!)
Also the first dot is only necessary for certain BSD versions of find such as on OS X, but it doesn’t hurt anything just having it there all the time if you want to put this in an alias or something.
EDIT: As @ruslan correctly pointed out, the -and can be omitted since it is implied.
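To see the command in action, here is a minimal sketch run against a throwaway directory. The file names (notes.txt, blob.bin) and their contents are made up for illustration and are not part of the original answer.

```shell
# Scratch directory with one text file and one file containing NUL bytes.
workdir=$(mktemp -d)
printf 'plain text\n' > "$workdir/notes.txt"
printf 'abc\000\001\002def\n' > "$workdir/blob.bin"

# grep -I treats binary files as non-matching and -q stops at the first
# match, so -print only runs for files that grep considers text.
text_files=$(find "$workdir" -type f -exec grep -Iq . {} \; -print)
echo "$text_files"

rm -rf "$workdir"
```

Note that an empty file contains no character for `.` to match, so this approach skips empty files as well.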
This is better than peoro’s answer because 1. it actually answers the question 2. It does not yield false positives 3. it is way more performant
You can also use find -type f -exec grep -Iq . {} \; -and -print which has the advantage that it keeps the files in find ; you can substitute -print with another -exec that is only run for text files. (If you let grep print the file names, you won’t be able to distinguish file names with newlines in them.)
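As a rough sketch of that idea, the second -exec below only runs for files that passed the text test. The file names and the search string are assumptions chosen for the demo.

```shell
workdir=$(mktemp -d)
printf 'needle text here\n' > "$workdir/a.txt"
printf 'nothing relevant\n' > "$workdir/b.txt"
printf 'needle text\000\001\n' > "$workdir/c.bin"

# The first -exec acts as a "is this a text file?" test.
# The second -exec is only evaluated when the first one succeeded.
matches=$(find "$workdir" -type f \
    -exec grep -Iq . {} \; \
    -exec grep -l 'needle text' {} \;)
echo "$matches"

rm -rf "$workdir"
```

Here c.bin contains the search string but is rejected by the first test, so only a.txt is reported.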
@NathanS.Watson-Haigh It shouldn’t, because it should be matching text files immediately. Do you have a specific use case you can share?
find . -type f -exec grep -Il . {} + is much faster. Drawback is that it cannot be extended by another -exec as @lucas.werkmeister suggested
grep -rIl "needle text" my_folder
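A small sketch of how this one-liner behaves, assuming a made-up folder with one matching text file, one non-matching text file, and one binary file that happens to contain the search string:

```shell
workdir=$(mktemp -d)
mkdir "$workdir/sub"
printf 'some needle text\n' > "$workdir/sub/found.txt"
printf 'unrelated\n' > "$workdir/other.txt"
printf 'needle text\000\n' > "$workdir/image.bin"

# -r: recurse into directories, -I: skip binary files,
# -l: print only the names of files that contain a match.
hits=$(grep -rIl "needle text" "$workdir")
echo "$hits"

rm -rf "$workdir"
```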
Why is it unhandy? If you need to use it often, and don’t want to type it every time just define a bash function for it:
function findTextInAsciiFiles {
    # usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
    find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text
}
put it in your .bashrc and then just run:
findTextInAsciiFiles your_folder "needle text"
EDIT to reflect OP’s edit:
If you want to cut out the MIME information, you could add a further stage to the pipeline that filters it out. This should do the trick, by taking only what comes before the first : with cut -d':' -f1 :
function findTextInAsciiFiles {
    # usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
    find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text | cut -d ':' -f1
}
I’m not sure if "grep text" is accurate enough to get exactly all text files. I mean, are there any text file types that have no ‘text’ in their MIME type description?
@kavoir.com: yes. From the file manual: "Users depend on knowing that all the readable files in a directory have the word ‘text’ printed."
Wouldn’t it be a bit more clever to search for text files before grepping, instead of grepping and then filtering out text files?
/proc/meminfo , /proc/cpuinfo etc. are text files, but file /proc/meminfo says /proc/meminfo: empty . I wonder if 'empty' should be tested in addition to 'text', but I am not sure whether other types could also report 'empty'.
find . -type f -print0 | xargs -0 file | grep -P text | cut -d: -f1 | xargs grep -Pil "search"
This is unfortunately not space-safe. Putting this into a bash script makes it a bit easier.
#!/bin/bash
# Print a usage message and stop when no search pattern is given.
if [ ! "$1" ] ; then
    echo "Usage: $0 PATTERN"
    exit 1
fi
find . -type f -print0 \
  | xargs -0 file \
  | grep -P text \
  | cut -d: -f1 \
  | xargs -I% grep -Pil "$1" "%"
There are a couple of issues in your script: 1. what if a binary file is named text.bin ? 2. What if a filename contains a : ?
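One way to sidestep both issues is to ask file for the MIME type of each file separately, so the file name never has to be parsed out of the output. The following is only a sketch under the assumption that your file command supports -b and --mime-type; the file names are made up to trigger exactly the two problems mentioned.

```shell
workdir=$(mktemp -d)
printf 'needle in a text file\n' > "$workdir/a:b.txt"   # name contains a colon
printf 'needle\000\001\002\003' > "$workdir/text.bin"   # binary, name contains "text"

# For each file, look only at the MIME type reported by file(1).
# The file name itself is never matched against "text" or split on ":".
found=$(find "$workdir" -type f -exec sh -c '
  for f do
    case $(file -b --mime-type "$f") in
      text/*) grep -l "needle" "$f" ;;
    esac
  done' sh {} +)
echo "$found"

rm -rf "$workdir"
```

File names are passed to the inner shell as arguments, so spaces in names are not a problem either.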
Another way of doing this:
# find . | xargs file | grep "ASCII text"
If you want empty files too:
# find . | xargs file | grep -E "ASCII text|empty"
$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable'
If you want the filenames without the file types, just add a final sed filter.
$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'
You can filter out unneeded file types by adding more -e 'type' options to the last grep command.
If your xargs version supports the -d option, the commands above become simpler:
$ grep -rl "needle text" my_folder | xargs -d '\n' -r file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'
silly me. Didn’t notice recursive grep. as I understood it’s actually quite fast even though a bit limited in many applications. +1 for you.
1 . make a small script to test if a file is plain text istext:
find . -type f -exec istext {} \; -exec grep -nHi mystring {} \;
Here’s a simplified version with extended explanation for beginners like me who are trying to learn how to put more than one command in one line.
If you were to write out the problem in steps, it would look like this:
// For every file in this directory
// Check the filetype
// If it's an ASCII file, then print out the filename
To achieve this, we can use three UNIX commands: find , file , and grep .
find will check every file in the directory.
file will give us the filetype. In our case, we’re looking for a return of ‘ASCII text’
grep will look for the keyword ‘ASCII’ in the output from file
So how can we string these together in a single line? There are multiple ways to do it, but I find that doing it in order of our pseudo-code makes the most sense (especially to a beginner like me).
find ./ -exec file {} ";" | grep 'ASCII'
Looks complicated, but not bad when we break it down:
find ./ = look through every file in this directory. The find command prints out the filename of any file that matches the ‘expression’, or whatever comes after the path, which in our case is the current directory or ./
The most important thing to understand is that everything after that first bit is going to be evaluated as either True or False. If True, the file name will get printed out. If not, then the command moves on.
-exec = this flag is an option within the find command that allows us to use the result of some other command as the search expression. It’s like calling a function within a function.
file {} = the command being called inside of find . The file command returns a string that tells you the filetype of a file. Regularly, it would look like this: file mytextfile.txt . In our case, we want it to use whatever file is being looked at by the find command, so we put in the curly braces {} to act as an empty variable, or parameter. In other words, we’re just asking for the system to output a string for every file in the directory.
";" = this is required by find and is the punctuation mark at the end of our -exec command. See the manual for ‘find’ for more explanation if you need it by running man find .
| grep 'ASCII' = | is a pipe. A pipe takes the output of whatever is on the left and uses it as input to whatever is on the right. Here it takes the output of the find command (one line per file, giving that file’s type) and tests each line to see if it contains the string 'ASCII' . If it does, the line is printed.
NOW, grep, which sits to the right of the find ./ command, prints only the lines for files whose type contains 'ASCII'. Voila.
Grep Command Tutorial – How to Search for a File in Linux and Unix with Recursive Find
Dillion Megida
grep stands for Global Regular Expression Print; the name comes from the ed command g/re/p (globally search for a regular expression and print). It is a command line tool used in UNIX and Linux systems to search for a specified pattern in a file or group of files.
grep comes with a lot of options which allow us to perform various search-related actions on files. In this article, we’ll look at how to use grep with the options available as well as basic regular expressions to search files.
How to use grep
Without passing any option, grep can be used to search for a pattern in a file or group of files. The syntax is:
Note that single or double quotes are required around the text if it is more than one word.
You can also use the wildcard (*) to select all files in a directory.
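The command itself is not shown above; the general shape is grep PATTERN FILE... . The following is a minimal sketch of the three cases just described, using made-up files (a.txt and b.txt) in a scratch directory.

```shell
workdir=$(mktemp -d)
printf 'one apple pie\ntwo pears\n' > "$workdir/a.txt"
printf 'green apple pie\n' > "$workdir/b.txt"

# Single word: quotes are optional.
single=$(grep pears "$workdir/a.txt")

# More than one word: quote the pattern so the shell passes it as one argument.
multi=$(grep "apple pie" "$workdir/a.txt")

# Wildcard: the shell expands * to every file in the directory, and grep
# prefixes each matching line with the name of the file it came from.
all=$(grep "apple pie" "$workdir"/*)

echo "$single"
echo "$multi"
echo "$all"

rm -rf "$workdir"
```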
The result is every line of the file(s) in which the pattern occurs. If there is no match, no output will be printed to the terminal.
For example, say we have the following file (called grep.txt):
Hello, how are you
I am grep
Nice to meet you
The following grep command will search for all occurrences of the word ‘you’:
Hello, how are you
Nice to meet you
you is expected to have a different color than the other text to easily identify what was searched for.
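The command is missing from the text here. A sketch that reproduces the output, assuming grep.txt holds the three lines of the example file:

```shell
workdir=$(mktemp -d)
printf 'Hello, how are you\nI am grep\nNice to meet you\n' > "$workdir/grep.txt"

# Print every line of grep.txt that contains "you".
result=$(grep you "$workdir/grep.txt")
echo "$result"

rm -rf "$workdir"
```

The later sketches place options before the pattern (for example grep -n you grep.txt), which works with every grep implementation; placing them after the file name relies on GNU-style option parsing.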
But grep comes with more options which help us achieve more during a search operation. Let’s look at nine of them while applying them to the example above.
Options used with grep
1. -n (--line-number) - list line numbers
This prints out the matches for the text along with the line numbers. If you look at the result we have above, you’ll notice there are no line numbers, just the matches.
1:Hello, how are you
3:Nice to meet you
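A sketch of a command that produces this kind of output, assuming the same three-line grep.txt:

```shell
workdir=$(mktemp -d)
printf 'Hello, how are you\nI am grep\nNice to meet you\n' > "$workdir/grep.txt"

# -n prefixes each matching line with its line number and a colon.
numbered=$(grep -n you "$workdir/grep.txt")
echo "$numbered"

rm -rf "$workdir"
```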
2. -c (--count) - prints the number of lines of matches
Note that if there was another ‘you’ on line one, option -c would still print 2. This is because it is concerned with the number of lines where the matches appear, not the number of matches.
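The point about lines versus matches can be checked with a short sketch. The second file, grep2.txt, is a hypothetical variant of grep.txt with an extra ‘you’ on line one.

```shell
workdir=$(mktemp -d)
printf 'Hello, how are you\nI am grep\nNice to meet you\n' > "$workdir/grep.txt"
# Same content, but line one now contains 'you' twice.
printf 'Hello you, how are you\nI am grep\nNice to meet you\n' > "$workdir/grep2.txt"

count=$(grep -c you "$workdir/grep.txt")
count_twice=$(grep -c you "$workdir/grep2.txt")

# Both counts are the same, because -c counts matching lines, not matches.
echo "$count $count_twice"

rm -rf "$workdir"
```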
3. -v (--invert-match) - prints the lines that do not match the specified pattern
Notice that we also used option -n ? Yes, you can apply multiple options in one command.
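The command for this option is not shown in the text. A sketch that combines -v with -n, assuming the same three-line grep.txt:

```shell
workdir=$(mktemp -d)
printf 'Hello, how are you\nI am grep\nNice to meet you\n' > "$workdir/grep.txt"

# -v keeps only the lines that do NOT contain "you"; -n adds line numbers.
inverted=$(grep -v -n you "$workdir/grep.txt")
echo "$inverted"

rm -rf "$workdir"
```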
4. -i (--ignore-case) - used for case insensitivity
# command 1
grep You grep.txt
# command 2
grep YoU grep.txt -i
# result 1
# no result
# result 2
Hello, how are you
Nice to meet you
5. -l (--files-with-matches) - print file names that match a pattern
# command 1
grep you grep.txt -l
# command 2
grep You * -i -l
# result 1
grep.txt
# result 2
# all files in the current directory that match
# the text 'You' case insensitively
6. -w (--word-regexp) - print matches of the whole word
By default, grep matches strings which contain the specified pattern. This means that grep yo grep.txt will print the same results as grep you grep.txt because ‘yo’ can be found in ‘you’. The same goes for ‘ou’.
With the option -w , grep ensures that the matches are exactly the same pattern as specified. Example:
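The example is missing from the text. A sketch of the difference, assuming the same three-line grep.txt:

```shell
workdir=$(mktemp -d)
printf 'Hello, how are you\nI am grep\nNice to meet you\n' > "$workdir/grep.txt"

# Without -w, "yo" matches inside the word "you".
substring=$(grep yo "$workdir/grep.txt")

# With -w, "yo" must appear as a whole word, so nothing matches.
# grep exits with status 1 when there is no match, hence the "|| true".
whole_yo=$(grep -w yo "$workdir/grep.txt" || true)

# "you" is a whole word on lines 1 and 3.
whole_you=$(grep -w you "$workdir/grep.txt")

echo "$substring"
echo "$whole_you"

rm -rf "$workdir"
```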
7. -o (--only-matching) - print only the matched pattern
By default, grep prints the line where the matched pattern is found. With option -o , only the matched pattern is printed line by line. Example:
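The example is missing from the text. A sketch, assuming the same three-line grep.txt:

```shell
workdir=$(mktemp -d)
printf 'Hello, how are you\nI am grep\nNice to meet you\n' > "$workdir/grep.txt"

# -o prints each match on its own line instead of the whole matching line.
only=$(grep -o you "$workdir/grep.txt")
echo "$only"

rm -rf "$workdir"
```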
8. -A (--after-context) and -B (--before-context) - print the lines after and before (respectively) the matched pattern
grep grep grep.txt -A 1 -B 1
Hello, how are you
I am grep
Nice to meet you
The matched pattern is on line 2. -A 1 means one line after the matched line and -B 1 means one line before the matched line.
There’s also a -C (--context) option which is equal to -A + -B . The value passed to -C would be used for both -A and -B .
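A short sketch comparing -C with the explicit -A and -B pair, assuming the same three-line grep.txt:

```shell
workdir=$(mktemp -d)
printf 'Hello, how are you\nI am grep\nNice to meet you\n' > "$workdir/grep.txt"

# One line of context on each side of the match on line 2.
context=$(grep -C 1 grep "$workdir/grep.txt")
explicit=$(grep -A 1 -B 1 grep "$workdir/grep.txt")

echo "$context"

rm -rf "$workdir"
```

Both variables end up holding the same three lines.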
9. -R (--dereference-recursive) - recursive search
By default, grep cannot search directories. If you try doing so, you’ll get an error ("Is a directory"). With option -R , searching files within directories and subdirectories becomes possible. Example:
# 'you' matches in all folders
# and files starting from the
# current directory
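The command is missing from the text. A sketch of a recursive search, assuming a made-up directory tree (docs/deep/note.txt is hypothetical); the output is sorted only to make the order predictable:

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/docs/deep"
printf 'Hello, how are you\nI am grep\nNice to meet you\n' > "$workdir/grep.txt"
printf 'see you soon\n' > "$workdir/docs/deep/note.txt"

# -R descends into subdirectories; each match is prefixed with its file path.
recursive=$(grep -R you "$workdir" | LC_ALL=C sort)
echo "$recursive"

rm -rf "$workdir"
```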
Regular expressions for patterns
grep also allows basic regular expressions for specifying patterns. Two of them are:
1. ^pattern - start of a line
This pattern means that grep will match the lines that begin with the string specified after ^ . Example:
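The original example is missing here, so the pattern below is an assumption chosen for illustration. It uses the same three-line grep.txt:

```shell
workdir=$(mktemp -d)
printf 'Hello, how are you\nI am grep\nNice to meet you\n' > "$workdir/grep.txt"

# Only the line that starts with "I" matches.
starts=$(grep '^I' "$workdir/grep.txt")

# "you" occurs in the file, but never at the start of a line.
# grep exits with status 1 when nothing matches, hence the "|| true".
starts_you=$(grep '^you' "$workdir/grep.txt" || true)

echo "$starts"

rm -rf "$workdir"
```

Quoting the pattern keeps the shell from interpreting any special characters in it.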
2. pattern$ - end of a line
In contrast with ^ , $ matches lines that end with the string specified before $ . Example:
1:Hello, how are you
3:Nice to meet you
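The command is missing from the text. Judging by the numbered output, it was probably close to the following sketch, which assumes the same three-line grep.txt:

```shell
workdir=$(mktemp -d)
printf 'Hello, how are you\nI am grep\nNice to meet you\n' > "$workdir/grep.txt"

# Lines 1 and 3 end with "you"; -n adds the line numbers.
ends=$(grep -n 'you$' "$workdir/grep.txt")
echo "$ends"

rm -rf "$workdir"
```

The single quotes stop the shell from treating $ as the start of a variable name.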
Wrap up
grep is a powerful tool for searching files in the terminal. Understanding how to use it gives you the ability to easily find files via the terminal.
There are more options attached to this tool. You can find them with man grep .