Linux grep multiple patterns

Contents

How to run grep with multiple AND patterns?
9 Answers
git grep
ripgrep
How do I grep for multiple patterns with pattern having a pipe character?
13 Answers

How to run grep with multiple AND patterns?

I would like to get the multi pattern match with implicit AND between patterns, i.e. equivalent to running several greps in a sequence:

grep pattern1 | grep pattern2 | ...

grep pattern1 & pattern2 & pattern3

I would like to use a single grep because I am building the arguments dynamically, so everything has to fit in one string. Piping through filters is a feature of the shell, not of grep, so it does not count as a solution here. Don't confuse this question with:

This is an OR multi pattern match. I am looking for an AND pattern match.

If you're looking for the grep syntax for "find lines that contain foo and lines that contain bar", see using grep for multiple search patterns

9 Answers

To find the lines that match each and every one of a list of patterns, agrep (the original one, now shipped with glimpse, not the unrelated one in the TRE regexp library) can do it with this syntax:

With GNU grep , when built with PCRE support, you can do:
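The command itself did not survive in the text above. As a minimal sketch, assuming a GNU grep built with PCRE support ( -P ), an AND can be written as one lookahead per pattern, anchored at the start of the line. The helper name pcre_and is made up for illustration:

```shell
# Hypothetical helper: build one PCRE that requires every argument
# to match somewhere on the line, in any order.
pcre_and() {
  re='^'
  for p in "$@"; do
    # Each lookahead asserts "this pattern occurs somewhere on the line"
    # without consuming any input, so the order does not matter.
    re="${re}(?=.*(?:${p}))"
  done
  printf '%s\n' "$re"
}

# Usage, assuming a grep that accepts -P:
#   grep -P -- "$(pcre_and pattern1 pattern2 pattern3)" file
```

Because the result is a single string, it suits the case where the arguments are built dynamically.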

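(With a regexp flavour that has a conjunction operator & , such as the ast augmented regexps, the patterns need .* on both sides, because & matches strings that match both operands exactly: a&b would never match, as there's no string that can be both a and b at the same time.)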

If the patterns don’t overlap, you may also be able to do:

grep -e 'pattern1.*pattern2' -e 'pattern2.*pattern1'

The best portable way is probably with awk, as already mentioned ( awk '/pattern1/ && /pattern2/' ); sed and perl can do the same:

sed -e '/pattern1/!d' -e '/pattern2/!d'

perl -ne 'print if /pattern1/ && /pattern2/'

Please beware that all those will have different regular expression syntaxes.
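As a quick check that the awk and sed forms select the same lines, here is a small sketch. The sample input and the patterns foo and bar are made up for illustration:

```shell
# Made-up sample input: only two lines contain both "foo" and "bar".
sample='foo bar
foo only
bar then foo
neither'

# awk: print lines for which both regexps match.
awk_out=$(printf '%s\n' "$sample" | awk '/foo/ && /bar/')

# sed: delete every line that fails either regexp; what is left matched both.
sed_out=$(printf '%s\n' "$sample" | sed -e '/foo/!d' -e '/bar/!d')

printf '%s\n' "$awk_out"
```

With plain patterns like these, both commands print the same two lines; with real regular expressions, keep the syntax differences in mind.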

@Techiee, or just awk '/p1/ && /p2/ {n++}; END {print 0+n}'

@ChamindaBandara, you ran that with GNU grep instead of ast grep . GNU grep has no support for ast augmented regexp. It does have an undocumented -X option, but that’s for something unrelated, it’s to specify the regexp flavour (matcher) like in grep -X perl being the same as grep -P .

@DanielKaplan, from your recent question, I suspect you're looking for something different from what this Q&A is about. Here we're trying to find lines that match all patterns, while you may be trying to find files for which all patterns are matched by any line (there are several Q&As here covering that). I've edited the answer to maybe make that more obvious.

You didn't specify the grep version, which is important. Some regexp engines allow multiple patterns to be combined with AND using '&', but this is a non-standard and non-portable feature. GNU grep, at least, doesn't support it.

OTOH you can simply replace grep with sed, awk, perl, etc. (listed in order of increasing weight). With awk, the command would look like

awk '/regexp1/ && /regexp2/ && /regexp3/ { print; }'

and it can easily be constructed on the command line.
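One way to construct such a command from a dynamic list of patterns is sketched below. The helper name awk_and is hypothetical; it only joins its arguments into an awk condition and escapes any "/" so the regex delimiters stay intact:

```shell
# Hypothetical helper: turn its arguments into an awk condition
# of the form  /regexp1/ && /regexp2/ && /regexp3/
awk_and() {
  cond=''
  for p in "$@"; do
    # A literal "/" would end the regex early, so escape it first.
    p=$(printf '%s\n' "$p" | sed 's:/:\\/:g')
    # Append " && " only when cond already holds an earlier term.
    cond="${cond:+$cond && }/$p/"
  done
  printf '%s\n' "$cond"
}

# Usage (at least one pattern is assumed):
#   awk "$(awk_and regexp1 regexp2 regexp3)" file
```

A condition without an action prints the matching line, so the explicit { print; } can be omitted.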

Just remember that awk uses EREs, i.e. the equivalent of grep -E , as opposed to the BREs that plain grep uses.

awk's regexes are called EREs, but in fact they're a bit idiosyncratic. Here are probably more details than anyone cares for: wiki.alpinelinux.org/wiki/Regex

Thank you, grep 2.7.3 (openSUSE). I upvoted you, but I will keep question open for a while, maybe there is some trick for grep (not that I dislike awk — simply knowing more is better).

grep pattern1 | grep pattern2 | ...

I would like to use single grep because I am building arguments dynamically, so everything has to fit in one string

It’s actually possible to build the pipeline dynamically (without resorting to eval ):

# Executes: grep "$1" | grep "$2" | grep "$3" | ...
function chained-grep {
    local pattern="$1"
    if [[ -z "$pattern" ]]; then
        cat
        return
    fi

    shift
    grep -- "$pattern" | chained-grep "$@"
}

cat something | chained-grep all patterns must match order but matter dont

It’s probably not a very efficient solution though.

Use either chained-grep() or function chained-grep but not function chained-grep() : unix.stackexchange.com/questions/73750/…

Can you describe what the trick is? Can you add it to the answer (without "Edit:", "Update:", or similar) by editing it?

The important part here is that the shell allows recursion, which makes this possible. Note the keyword local in front of the variable, which must be unique for each level of the recursion. Also note that local is not POSIX, so using the shebang #!/bin/sh may not be safe; see details here: unix.stackexchange.com/a/493743/20336
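For a plain #!/bin/sh script, the same recursion can be sketched without local by working only on the positional parameters. This is a variation, not the answer's own code; the name chained_grep uses an underscore because a hyphen is not a portable function name, and it stops when no arguments are left rather than on an empty pattern:

```shell
# POSIX sh sketch: no "local", no bashisms.
chained_grep() {
  if [ "$#" -eq 0 ]; then
    # No patterns left: pass the remaining lines through unchanged.
    cat
    return
  fi
  # Filter on the first pattern, then recurse on the rest.
  # The { ...; } group runs as part of the pipeline, so the shift
  # does not disturb the "$1" already used by grep.
  grep -e "$1" | { shift; chained_grep "$@"; }
}

# Usage: some_command | chained_grep pattern1 pattern2 pattern3
```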

git grep

Here is the syntax using git grep combining multiple patterns using Boolean expressions:

git grep --no-index -e pattern1 --and -e pattern2 --and -e pattern3

The above command will print lines matching all the patterns at once.

--no-index Search files in the current directory that is not managed by Git.

Check man git-grep for help.

This will only match a file if a line is found that matches all patterns, not files where the patterns match individual lines. Use --all-match and --or instead of --and for that.

If patterns contains one pattern per line, you can do something like this:

Or this matches substrings instead of regular expressions:

To print all instead of no lines of the input in the case that patterns is empty, replace NR==FNR with FILENAME==ARGV[1] , or with ARGIND==1 in gawk .
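The commands for this answer are missing above. A minimal sketch consistent with the NR==FNR idiom it mentions, assuming the patterns file is passed as the first argument and the text to filter arrives on standard input (the function names are made up):

```shell
# While reading the first file (NR==FNR), store each line as an array key.
# For the remaining input ("-" means stdin), skip the line as soon as
# one stored pattern fails to match; the final "1" prints what is left.
grep_all_from_file() {
  awk 'NR==FNR { pats[$0]; next }
       { for (p in pats) if ($0 !~ p) next }
       1' "$1" -
}

# Substring variant: index() searches for a literal string, not a regex.
grep_all_fixed_from_file() {
  awk 'NR==FNR { pats[$0]; next }
       { for (p in pats) if (!index($0, p)) next }
       1' "$1" -
}

# Usage: grep_all_from_file patterns < input.txt
```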

These functions print the lines of STDIN which contain each string specified as an argument as a substring. ga stands for grep all and gai ignores case.
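The function definitions themselves are not shown above. One possible way to write ga and gai, as a sketch that may differ from the answer's original code, is to hand the strings to awk as arguments and remove them from ARGV so that awk still reads standard input:

```shell
# ga: print stdin lines that contain every argument as a literal substring.
ga() {
  awk 'BEGIN {
         # Copy the arguments into an array and delete them from ARGV,
         # otherwise awk would try to open them as input files.
         for (i = 1; i < ARGC; i++) { want[ARGV[i]]; delete ARGV[i] }
       }
       { for (s in want) if (!index($0, s)) next; print }' "$@"
}

# gai: same, but compare lower-cased copies to ignore case.
gai() {
  awk 'BEGIN {
         for (i = 1; i < ARGC; i++) { want[tolower(ARGV[i])]; delete ARGV[i] }
       }
       { line = tolower($0)
         for (s in want) if (!index(line, s)) next
         print }' "$@"
}

# Usage: some_command | ga foo bar
```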

Here’s my take, and this works for words in multiple lines:

Use find . -type f followed by as many
-exec grep -q 'first_word' {} \;
clauses as needed, and the last keyword with
-exec grep -l 'nth_word' {} \;

-q quiet / silent
-l show files with matches

The following returns a list of the files that contain both the words 'rabbit' and 'hole':
find . -type f -exec grep -q 'rabbit' {} \; -exec grep -l 'hole' {} \;
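A small self-contained sketch of how the chained -exec tests behave. The directory and file contents are made up for illustration; each -exec acts as a test, so find only reaches the last grep for files where the earlier ones succeeded:

```shell
# Build a throwaway directory with three made-up files.
demo_dir=$(mktemp -d)
printf 'rabbit\n'                  > "$demo_dir/a.txt"  # only "rabbit"
printf 'rabbit\nsome text\nhole\n' > "$demo_dir/b.txt"  # both, on different lines
printf 'hole\n'                    > "$demo_dir/c.txt"  # only "hole"

# grep -q exits 0 on a match, so the second -exec only runs for files
# that contain "rabbit"; grep -l then prints the name if "hole" is found.
matches=$(find "$demo_dir" -type f \
  -exec grep -q 'rabbit' {} \; \
  -exec grep -l 'hole' {} \;)

printf '%s\n' "$matches"
```

Only b.txt is listed, because it is the only file in which both words occur.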

If you look carefully, you just might learn that this is not the functionality that the question is asking for.

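To search multiple files for two patterns that may sit on different lines, use awk in paragraph mode ( RS="" ): each blank-line-separated block, or the whole file if it contains no blank lines, is treated as one record:

awk -v RS="" '/pattern1/&&/pattern2/' file1 ... filen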

Grep is all too often used where (IMO) awk would be better. I like this answer for exactly that reason, and of course awk can do further processing such as printing only fields 6 and 2 from the input.

This doesn’t actually address the OP’s question, but +1 ’cause I think it’s very useful for other related situations & reveals the strength of awk . if you had to choose awk or grep , I think it’s clear. Fortunately we don’t have to make this choice 🙂

ripgrep

Here is the example using rg :

rg -N '(?P<p1>.*pattern1.*)(?P<p2>.*pattern2.*)(?P<p3>.*pattern3.*)' file.txt

It’s one of the quickest grepping tools, since it’s built on top of Rust’s regex engine which uses finite automata, SIMD and aggressive literal optimizations to make searching very fast.

See also related feature request at GH-875.

To find all of the words (or patterns), you can run grep in a for loop. The main advantage here is searching from a list of regular expressions.

# File 'search_all_regex_and_error_if_missing.sh'

find_list="\
^a+$ \
^b+$ \
^h+$ \
^d+$ \
"

for item in $find_list; do
    if grep -E "$item" file_to_search_within.txt
    then
        echo "$item found in file."
    else
        echo "Error: $item not found in file. Exiting!"
        exit 1
    fi
done

Now let’s run it on this file:

hhhhhhhhhh
aaaaaaa
bbbbbbbbb
ababbabaabbaaa
ccccccc
dsfsdf
bbbb
cccdd
aa
caa

$ ./search_all_regex_and_error_if_missing.sh
aaaaaaa
aa
^a+$ found in file.
bbbbbbbbb
bbbb
^b+$ found in file.
hhhhhhhhhh
^h+$ found in file.
Error: ^d+$ not found in file. Exiting!


How do I grep for multiple patterns with pattern having a pipe character?

but the shell interprets the | as a pipe and complains when bar isn’t an executable. How can I grep for multiple patterns in the same set of files?

13 Answers

First, you need to protect the pattern from expansion by the shell. The easiest way to do that is to put single quotes around it. Single quotes prevent expansion of anything between them (including backslashes); the only thing you can’t do then is have single quotes in the pattern.

(Also note the -- end-of-options marker. It stops some grep implementations, including GNU grep, from treating a file called -foo-.txt , for instance (which the shell could produce when expanding *.txt ), as an option, even though it follows a non-option argument here.)

If you do need a single quote, you can write it as '\'' (end string literal, literal quote, open string literal).
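A short sketch of both quoting points, using made-up sample input: the '\'' sequence puts a literal single quote into the pattern, and single quotes keep the shell from treating | as a pipe.

```shell
# Made-up sample input.
sample="it's here
its here
foo|bar literal"

# 'it'\''s' is three pieces glued together: 'it' + \' + 's'  ->  it's
quoted=$(printf '%s\n' "$sample" | grep -- 'it'\''s')

# Inside single quotes the shell leaves | alone; in the default (basic)
# syntax grep then treats it as an ordinary character.
pipe=$(printf '%s\n' "$sample" | grep -- 'foo|bar')

printf '%s\n' "$quoted" "$pipe"
```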

Second, grep supports at least¹ two syntaxes for patterns. The old, default syntax (basic regular expressions) doesn’t support the alternation ( | ) operator, though some versions have it as an extension, but written with a backslash.

The portable way is to use the newer syntax, extended regular expressions. You need to pass the -E option to grep to select it (formerly that was done with the separate egrep command²).

Another possibility when you’re just looking for any of several patterns (as opposed to building a complex pattern using disjunction) is to pass multiple patterns to grep . You can do this by preceding each pattern with the -e option.

Or put patterns on several lines:

Or store those patterns in a file, one per line and run
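The commands for these variants are not shown above. The sketch below runs the four forms side by side on made-up files (the directory, file names and contents are assumptions for illustration); on this input they select the same lines:

```shell
# Throwaway files for the demonstration.
work=$(mktemp -d)
printf 'foo 1\nbaz 2\nbar 3\n' > "$work/a.txt"
printf 'foo\nbar\n' > "$work/patterns"

# 1. Extended regular expression with alternation.
ere=$(grep -E 'foo|bar' "$work/a.txt")

# 2. One -e option per pattern.
multi_e=$(grep -e foo -e bar "$work/a.txt")

# 3. Patterns separated by a newline inside one quoted argument.
newline=$(grep 'foo
bar' "$work/a.txt")

# 4. Patterns read from a file, one per line.
from_file=$(grep -f "$work/patterns" "$work/a.txt")

printf '%s\n' "$ere"
```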

Note that if *.txt expands to a single file, grep won't prefix matching lines with its name as it does when there is more than one file. To work around that, with some grep implementations like GNU grep , you can use the -H option, or with any implementation, you can pass /dev/null as an extra argument.
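The /dev/null workaround can be sketched as follows, with a made-up single file standing in for the result of *.txt :

```shell
# One throwaway file, as if *.txt had expanded to a single name.
tmp=$(mktemp -d)
printf 'foo\n' > "$tmp/only.txt"

# With one file operand, grep prints just the matching line.
without=$(grep foo "$tmp/only.txt")

# /dev/null counts as a second (empty) file, so grep switches to
# "several files" mode and prefixes each match with the file name.
with_name=$(grep foo "$tmp/only.txt" /dev/null)

printf '%s\n' "$without" "$with_name"
```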

¹ Some grep implementations support even more, such as Perl-compatible ones with -P , or augmented ones with -X , -K for ksh wildcards.

² while egrep has been deprecated by POSIX and is sometimes no longer found on some systems, on some other systems like Solaris when the POSIX or GNU utilities have not been installed, then egrep is your only option as its /bin/grep supports none of -e , -f , -E , \| or multi-line patterns

As a sidenote — when the patterns are fixed, you should really get into the habit of fgrep or grep -F , for small patterns the difference will be negligible but as they get longer, the benefits start to show.

@TC1 Whether grep -F has an actual performance benefit depends on the grep implementation: some of them apply the same algorithm anyway, so that -F makes a difference only to the time spent parsing the pattern and not to the time searching. GNU grep isn’t faster with -F , for example (it also has a bug that makes grep -F slower in multibyte locales — the same constant pattern with grep is actually significantly faster!). On the other hand BusyBox grep does benefit a lot from -F on large files.

Perhaps it should be mentioned that for more complicated patterns where alternation is only to be for a part of the regular expression, it can be grouped with \( and \) (the escaping is for the default "basic regular expressions") (?).

Note that egrep predates grep -E . It is not GNU specific (it certainly has nothing to do with Linux). Actually, you’ll still find systems like Solaris where the default grep still doesn’t support -E .

grep "foo\|bar" *.txt
grep -E "foo|bar" *.txt

selectively citing the man page of gnu-grep:

 -E, --extended-regexp
        Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)

 Matching Control

 -e PATTERN, --regexp=PATTERN
        Use PATTERN as the pattern. This can be used to specify multiple search patterns, or to protect a pattern beginning with a hyphen (-). (-e is specified by POSIX.)

 grep understands two different versions of regular expression syntax: “basic” and “extended.” In GNU grep, there is no difference in available functionality using either syntax. In other implementations, basic regular expressions are less powerful. The following description applies to extended regular expressions; differences for basic regular expressions are summarized afterwards.

In the beginning I didn’t read further, so I didn’t recognize the subtle differences:

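Basic vs Extended Regular Expressions

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).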
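To see the difference in practice, here is a small sketch on made-up input: in the default basic syntax a bare + is an ordinary character, while with -E it is a repetition operator.

```shell
# Made-up sample: one line of repeated letters, one with a literal plus sign.
sample='aa
a+'

# Basic syntax: "a+" means the letter a followed by a literal "+".
bre=$(printf '%s\n' "$sample" | grep 'a+')

# Extended syntax: "a+" means one or more "a", which both lines contain.
ere=$(printf '%s\n' "$sample" | grep -E 'a+')

printf 'BRE:\n%s\nERE:\n%s\n' "$bre" "$ere"
```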

 I always used egrep and unnecessary parentheses, because I learned from examples. Now I learned something new. 🙂