Linux find text in folders

Linux command: How to ‘find’ only text files?

which is very unhandy and outputs unneeded texts such as mime type information. Any better solutions? I have lots of images and other binary files in the same folder with a lot of text files that I need to search through.


I know this is an old thread, but I stumbled across it and thought I’d share my method which I have found to be a very fast way to use find to find only non-binary files:

find . -type f -exec grep -Iq . {} \; -print

The -I option to grep tells it to immediately ignore binary files, and the . pattern along with -q makes it match text files immediately, so it goes very fast. You can change the -print to a -print0 for piping into xargs -0 or something similar if you are concerned about spaces (thanks for the tip, @lucas.werkmeister!)

Also the first dot is only necessary for certain BSD versions of find such as on OS X, but it doesn’t hurt anything just having it there all the time if you want to put this in an alias or something.

EDIT: As @ruslan correctly pointed out, the -and can be omitted since it is implied.
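As a minimal sketch of the approach above, the command can be wrapped in a shell function. The name list_text_files is an assumption made for illustration and is not part of the original answer. Note that empty files never match the . pattern, so they are not listed.

```shell
# list_text_files DIR
# Hypothetical helper: print every non-empty, non-binary regular file under DIR.
list_text_files() {
    # grep -I treats binary files as non-matching; -q stops at the first match.
    find "${1:-.}" -type f -exec grep -Iq . {} \; -print
}

# NUL-delimited variant for unusual file names, here counting lines per text file:
# find . -type f -exec grep -Iq . {} \; -print0 | xargs -0 wc -l
```

Calling list_text_files my_folder prints one path per line; the commented variant is the safer choice when the list is piped into another command.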

This is better than peoro’s answer because 1. it actually answers the question 2. It does not yield false positives 3. it is way more performant

You can also use find -type f -exec grep -Iq . {} \; -and -print, which has the advantage that it keeps the files in find; you can substitute -print with another -exec that is only run for text files. (If you let grep print the file names, you won’t be able to distinguish file names with newlines in them.)
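Here is a rough sketch of that substitution, assuming the second command should search the text files for a string. The function name grep_in_text_files is hypothetical. The second -exec only runs for files where the first one succeeded.

```shell
# grep_in_text_files DIR NEEDLE
# Hypothetical helper: print matching lines, but only from files that pass the text test.
grep_in_text_files() {
    find "$1" -type f \
        -exec grep -Iq . {} \; \
        -exec grep -nH -- "$2" {} \;
}
```

For example, grep_in_text_files my_folder "needle text" prints file:line:text for every hit and never opens binary files with the second grep.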

@NathanS.Watson-Haigh It shouldn’t, because it should be matching text files immediately. Do you have a specific use case you can share?

find . -type f -exec grep -Il . {} + is much faster. The drawback is that it cannot be extended by another -exec, as @lucas.werkmeister suggested.

grep -rIl "needle text" my_folder

Why is it unhandy? If you need to use it often, and don’t want to type it every time just define a bash function for it:

function findTextInAsciiFiles {
    # usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
    find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text
}

put it in your .bashrc and then just run:

findTextInAsciiFiles your_folder "needle text" 

EDIT to reflect OP’s edit:

If you want to cut out the MIME information, you can add a further stage to the pipeline that filters it out. This should do the trick, by taking only what comes before the colon: cut -d':' -f1

function findTextInAsciiFiles {
    # usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
    find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text | cut -d ':' -f1
}

I’m not sure if "grep text" is accurate enough to get exactly all text files. I mean, are there any text file types that have no ‘text’ in the string of their mime type description?


@kavoir.com: yes. From the file manual: "Users depend on knowing that all the readable files in a directory have the word ‘text’ printed."

Wouldn’t it be a bit more clever to search for text files before grepping, instead of grepping and then filtering out text files?

/proc/meminfo, /proc/cpuinfo etc. are text files, but file /proc/meminfo says /proc/meminfo: empty. I wonder if ‘empty’ should be tested in addition to ‘text’, but I am not sure whether other types could also report ‘empty’.

find . -type f -print0 | xargs -0 file | grep -P text | cut -d: -f1 | xargs grep -Pil "search" 

This is unfortunately not space-safe. Putting it into a bash script makes it a bit easier.

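#!/bin/bash
# Print a usage hint and stop when no search pattern is given.
if [ ! "$1" ] ; then
    echo "Usage: $0 "
    exit
fi
# Classify every file, keep the ones reported as text, then search them.
find . -type f -print0 \
    | xargs -0 file \
    | grep -P text \
    | cut -d: -f1 \
    | xargs -i% grep -Pil "$1" "%"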

There are a couple of issues in your script: 1. what if a binary file is named text.bin ? 2. What if a filename contains a : ?
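One way to sidestep both issues, sketched here under assumptions: ask file for the bare MIME type of each file, so the file name never appears in the text being matched. This assumes a file(1) that supports -b and --mime-type; the function name find_text_by_mime is hypothetical.

```shell
# find_text_by_mime DIR
# Hypothetical helper: print files under DIR whose MIME type starts with text/.
find_text_by_mime() {
    find "${1:-.}" -type f -exec sh -c '
        for f do
            # -b omits the file name, so colons or the word text in a name cannot confuse the match.
            case $(file -b --mime-type "$f") in
                text/*) printf "%s\n" "$f" ;;
            esac
        done' sh {} +
}
```

With this, a binary file named text.bin is skipped and a text file named a:b.txt is listed under its full name.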

Another way of doing this:

# find . | xargs file | grep "ASCII text"

If you want empty files too:

# find . | xargs file | egrep "ASCII text|empty"

$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable'

If you want the filenames without the file types, just add a final sed filter.

$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||' 

You can filter out unneeded file types by adding more -e 'type' options to the last grep command.

If your xargs version supports the -d option, the commands above become simpler:

$ grep -rl "needle text" my_folder | xargs -d '\n' -r file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||' 

silly me. Didn’t notice recursive grep. as I understood it’s actually quite fast even though a bit limited in many applications. +1 for you.

1. Make a small script, istext, that tests whether a file is plain text.

2. Use it from find as a filter ahead of grep:

find . -type f -exec istext {} \; -exec grep -nHi mystring {} \;

Here’s a simplified version with extended explanation for beginners like me who are trying to learn how to put more than one command in one line.

If you were to write out the problem in steps, it would look like this:

// For every file in this directory
// Check the filetype
// If it's an ASCII file, then print out the filename

To achieve this, we can use three UNIX commands: find , file , and grep .

find will check every file in the directory.

file will give us the filetype. In our case, we’re looking for a return of ‘ASCII text’

grep will look for the keyword ‘ASCII’ in the output from file

So how can we string these together in a single line? There are multiple ways to do it, but I find that doing it in order of our pseudo-code makes the most sense (especially to a beginner like me).

find ./ -exec file {} ";" | grep 'ASCII'

Looks complicated, but not bad when we break it down:

find ./ = look through every file in this directory. The find command prints out the filename of any file that matches the ‘expression’, or whatever comes after the path, which in our case is the current directory or ./


The most important thing to understand is that everything after that first bit is evaluated as either True or False for each file. If no action such as -exec is given, the file name gets printed when the expression is True. If not, find moves on to the next file.

-exec = this flag is an option within the find command that allows us to use the result of some other command as the search expression. It’s like calling a function within a function.

file {} = the command being called inside of find. The file command returns a string that tells you the filetype of a file. Regularly, it would look like this: file mytextfile.txt. In our case, we want it to use whatever file is being looked at by the find command, so we put in the curly braces {} to act as an empty variable, or parameter. In other words, we’re just asking for the system to output a string for every file in the directory.

";" = this is required by find and is the punctuation mark at the end of our -exec command. It is quoted so that the shell passes it to find instead of treating it as a command separator. See the manual for find for more explanation if you need it by running man find.

| grep 'ASCII' = | is a pipe. A pipe takes the output of whatever is on the left and uses it as input to whatever is on the right. Here it takes the output of the find command (one line per file, containing the file name and its filetype) and keeps only the lines that contain the string 'ASCII'.

NOW, the whole line prints the name and filetype of every file whose description mentions ASCII. Note that grep runs once, outside of find, and filters the combined output; it does not feed a true/false result back into find. Voila.


How can I use grep to find a word inside a folder?

In Windows, I would have done a search for finding a word inside a folder. Similarly, I want to know if a specific word occurs inside a directory containing many sub-directories and files. My searches for grep syntax show that I must specify the filename, i.e. grep string filename. Now, I do not know the filename, so what do I do? A friend suggested doing grep -nr string, but I don’t know what this means and I got no results with it (there is no response until I issue a Ctrl + C).


grep -nr 'yourString*' .

The dot at the end searches the current directory. Meaning for each parameter:

-n             Show relative line number in the file
'yourString*'  String for search, followed by a wildcard character
-r             Recursively search subdirectories listed
.              Directory for search (current directory)

grep -nr 'MobileAppSer*' . (Would find MobileAppServlet.java or MobileAppServlet.class or MobileAppServlet.txt; 'MobileAppSer*.*' is another way to do the same thing.)

To check more parameters use man grep command.

What’s the business with * ? It will either result in shell wildcard expansion (if there are filenames matching the wildcard pattern), or grep will take it as 0-or-more repetition operator for the character preceding * .

Now let’s consider both possibilities for grep -nr MobileAppSer* . 1. Assume we have 3 files in the current directory matching the MobileAppSer* wildcard pattern, named MobileAppServlet.java, MobileAppServlet.class and MobileAppServlet.txt. Then grep will be invoked like this: grep -nr MobileAppServlet.class MobileAppServlet.java MobileAppServlet.txt . That means: search for the text "MobileAppServlet.class" in the files MobileAppServlet.java and MobileAppServlet.txt, and elsewhere in the current directory, which surely isn’t what the user wants here.


2. In case there are no files in the current directory matching the MobileAppSer* wildcard pattern, grep will receive the argument MobileAppSer* as-is and thus will take it as a search for the text "MobileAppSe" followed by 0 or more occurrences of "r", so it will attempt to find the texts "MobileAppSe", "MobileAppSer", "MobileAppSerr", "MobileAppSerrr", etc. in the contents of the current directory’s files, which is not what the user wants either.
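Both pitfalls can be avoided by quoting the pattern and dropping the trailing *, since grep already matches substrings. The following is a small sketch, not taken from the thread; the helper name search_literal is an assumption. It uses -F so that characters such as . and * are matched literally.

```shell
# search_literal DIR TEXT
# Hypothetical helper: recursively search DIR for TEXT as a fixed string.
search_literal() {
    # The quotes stop the shell from expanding wildcards in the arguments;
    # -F stops grep from treating * or . as regex operators.
    grep -rnF -- "$2" "$1"
}
```

For example, search_literal . 'MobileAppSer' reports every line containing that text, including lines that mention MobileAppServlet.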

I ran grep -nr 'yourString*' . and got some files with "binary file matches". You can add --text or -a to prevent this: grep -anr 'yourString*' .

grep -nr string my_directory 

Additional notes: this satisfies the syntax grep [options] string filename because in Unix-like systems, a directory is a kind of file (there is a term "regular file" to specifically refer to entities that are called just "files" in Windows).

grep -nr string reads the content to search from the standard input; that is why it just waits there for input from you, and stops doing so when you press ^C (it would stop on ^D as well, which is the key combination for end-of-file). Newer versions of GNU grep instead search the current directory when -r is given without a file argument.

Hey, so if I want to search for a string irrespective of the case, must I do this: grep -i -nr "my word" . ?

@kiki: -r for grep means search in subdirectories recursively and -n means prefix each line of output with the corresponding line number of the file which contains that line. man grep describes all of this, and much more.

GREP: Global Regular Expression Print/Parser/Processor/Program.
You can use this to search the current directory.
You can specify -R for "recursive", which means the program searches in all subfolders, and their subfolders, and their subfolder’s subfolders, etc.

-n will print the line number, where it matched in the file.
-i will search case-insensitive (capital/non-capital letters).

grep -inR "your regex pattern" . 

Thanks. And grep -inRE "[0-9a-fA-F]{32}" . helps find hashes (which are hex strings) in the files within the current directory. stackoverflow.com/a/25724915/470749
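A short sketch of that hash search as a reusable function; the name find_hex32 is made up for illustration. It assumes extended regular expressions (-E), because in grep's default basic syntax an unescaped {32} is not a repetition count. Longer hex strings also match, since they contain a 32-character run.

```shell
# find_hex32 DIR
# Hypothetical helper: list lines containing at least 32 consecutive hex digits.
find_hex32() {
    # -E enables the {32} interval; the bracket expression already covers both cases.
    grep -RnE '[0-9a-fA-F]{32}' "${1:-.}"
}
```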

find directory_name -type f -print0 | xargs -0 grep -li word 

but that might be a bit much for a beginner.

find is a general purpose directory walker/lister, -type f means "look for plain files rather than directories and named pipes and what have you", -print0 means "print them on the standard output using null characters as delimiters". The output from find is sent to xargs -0 and that grabs its standard input in chunks (to avoid command line length limitations) using null characters as a record separator (rather than the standard newline) and then applies grep -li word to each set of files. On the grep, -l means "list the files that match" and -i means "case insensitive"; you can usually combine single character options so you’ll see -li more often than -l -i.

If you don’t use -print0 and -0 then you’ll run into problems with file names that contain spaces so using them is a good habit.
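To make that habit easy to reuse, the pipeline can be wrapped as below. This is only a sketch; files_containing is a hypothetical name, and it assumes a find and xargs that support -print0 and -0.

```shell
# files_containing DIR WORD
# Hypothetical helper: list files under DIR that contain WORD, ignoring case.
files_containing() {
    # NUL delimiters keep names with spaces or newlines intact between find and xargs.
    find "$1" -type f -print0 | xargs -0 grep -li -- "$2"
}
```

For example, files_containing directory_name word prints each matching file once, even when its name contains spaces.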

