
Look for multiple patterns in files

This is a really trite one, but seems to be an evergreen. How do I look for multiple patterns? This usually means one of two things: select the lines that contain /PAT1/ or /PAT2/ or /PAT3/, or select the lines that contain /PAT1/ and /PAT2/ and /PAT3/ (in that order or not).
These two tasks can be solved in many ways; let's see the most common ones.

/PAT1/ or /PAT2/ or /PAT3/

This matches lines that contain any one of a number of patterns (three in this example). It can be done with many tools.

grep
grep -E 'PAT1|PAT2|PAT3' file.txt
grep -e 'PAT1' -e 'PAT2' -e 'PAT3' file.txt

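With -E, the patterns must be written in ERE syntax. Without it, as in the second command, grep uses BRE syntax; there, each -e simply introduces one pattern, and a line is selected if it matches any of them.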
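To make the difference concrete, here is a small sketch run against a throwaway file. The file contents and the variable names are made up for illustration. It shows the two "or" forms selecting the same lines, and then the same pattern text, a+b, meaning different things in ERE and BRE.

```shell
# Sample data, invented for this illustration.
sample=$(mktemp)
printf '%s\n' 'apple pie' 'banana split' 'cherry tart' 'aab' 'a+b' > "$sample"

# Both commands select the lines containing "apple" or "cherry".
ere_or=$(grep -E 'apple|cherry' "$sample")
bre_or=$(grep -e 'apple' -e 'cherry' "$sample")

# In ERE, + is a repetition operator; in BRE it is an ordinary character.
ere_plus=$(grep -E 'a+b' "$sample")   # one or more "a", then "b": selects aab
bre_plus=$(grep 'a+b' "$sample")      # the literal text "a+b": selects a+b

printf '%s\n' "$ere_or" "$ere_plus" "$bre_plus"
rm -f "$sample"
```

So when moving a pattern from one form to the other, any +, ?, |, parentheses or braces in it should be checked.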

awk
awk '/PAT1|PAT2|PAT3/' file.txt
awk '/PAT1/ || /PAT2/ || /PAT3/' file.txt

Awk uses ERE syntax for its patterns.

sed

One way with sed:

sed '/PAT1/b; /PAT2/b; /PAT3/b; d' file.txt

Sed normally uses BRE syntax.
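The one-liner relies on b without a label, which jumps to the end of the script, where sed prints the line by default; only lines that match none of the patterns reach the final d. As a rough sketch, here is the same script written one command per line and run on an invented sample file (the data and variable names are assumptions for the example), checked against the grep form.

```shell
# Sample data, invented for this illustration.
sample=$(mktemp)
printf '%s\n' 'apple pie' 'banana split' 'cherry tart' 'plain bread' > "$sample"

# Each matching address branches to the end of the script (the line is
# then printed); a line that matches nothing falls through to d.
sed_or=$(sed '
  /apple/b
  /banana/b
  /cherry/b
  d
' "$sample")

# The grep equivalent, for comparison.
grep_or=$(grep -e 'apple' -e 'banana' -e 'cherry' "$sample")

printf '%s\n' "$sed_or"
rm -f "$sample"
```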

perl
perl -ne 'print if /PAT1/ or /PAT2/ or /PAT3/' file.txt

Perl patterns use Perl's own regular expression syntax, the one PCRE is modeled on (basically, a much richer superset of ERE).

/PAT1/ and /PAT2/ and /PAT3/

This selects lines that match all the patterns. Order may or may not matter. Again, it can be done with many tools.

grep

If order matters, one can simply do:

grep 'PAT1.*PAT2.*PAT3' file.txt

(or whatever order one wants).
If order does not matter, then with grep all the possible combinations have to be considered:

grep -e 'PAT1.*PAT2.*PAT3' -e 'PAT1.*PAT3.*PAT2' -e 'PAT2.*PAT1.*PAT3' -e 'PAT2.*PAT3.*PAT1' -e 'PAT3.*PAT1.*PAT2' -e 'PAT3.*PAT2.*PAT1' file.txt

Obviously, this method is inefficient and does not scale: n patterns need n! alternatives (6 for three patterns, 24 for four). This can be improved a little at the price of using multiple processes:

grep 'PAT1' file.txt | grep 'PAT2' | grep 'PAT3'

awk

Awk scales better:

awk '/PAT1/ && /PAT2/ && /PAT3/' file.txt   # order does not matter

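This also finds lines where the matches overlap, which the 'PAT1.*PAT2' approach with grep misses: for example, with the patterns abc and bcd, the line abcd is selected.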
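The overlap case is easy to check on a tiny file. The sketch below uses two made-up patterns, abc and bcd, and an invented sample file; it compares the awk test with the grep alternatives and with the grep pipeline.

```shell
# Sample data, invented for this illustration.
sample=$(mktemp)
printf '%s\n' 'abcd' 'abc then bcd' 'bcd then abc' 'abc only' > "$sample"

# Each pattern is tested on its own, so the overlapping line abcd is selected.
awk_and=$(awk '/abc/ && /bcd/' "$sample")

# With .* the second pattern must start after the first one ends,
# so abcd is not selected.
grep_perms=$(grep -e 'abc.*bcd' -e 'bcd.*abc' "$sample")

# The pipeline tests the patterns independently, like awk.
grep_pipe=$(grep 'abc' "$sample" | grep 'bcd')

printf 'awk:\n%s\ngrep alternatives:\n%s\n' "$awk_and" "$grep_perms"
rm -f "$sample"
```

Whether the overlapping line should count is a matter of what one is looking for; the point is only that the two approaches are not strictly equivalent.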

sed
sed '/PAT1/!d; /PAT2/!d; /PAT3/!d' file.txt

This looks a bit weird, but the logic is simple: as soon as the line fails to match one of the patterns, we delete it. The lines that "survive" to the end of the script, and are thus printed, are exactly those that match all the patterns.
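Here is the same sed script spread over several lines and run on an invented sample file (the data and variable names are assumptions for the example), with the awk version alongside as a cross-check.

```shell
# Sample data, invented for this illustration.
sample=$(mktemp)
printf '%s\n' 'red green blue' 'blue red' 'green blue red' 'red green' > "$sample"

# Each command deletes the line if its pattern is missing; a line that
# gets past all three is printed by default at the end of the script.
sed_and=$(sed '
  /red/!d
  /green/!d
  /blue/!d
' "$sample")

# The awk equivalent, for comparison.
awk_and=$(awk '/red/ && /green/ && /blue/' "$sample")

printf '%s\n' "$sed_and"
rm -f "$sample"
```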

perl

This is basically the same as the awk solution:

perl -ne 'print if /PAT1/ and /PAT2/ and /PAT3/' file.txt

One Comment

  1. lotto says:

    very nice, thank you very much
