How can I search for a multiline pattern in a file?


Linux Problem Overview


I needed to find all the files that contained a specific string pattern. The first solution that comes to mind is using find piped with xargs grep:

find . -iname '*.py' | xargs grep -e 'YOUR_PATTERN'

But if I need to find patterns that span more than one line, I'm stuck, because vanilla grep can't match across line boundaries.

Linux Solutions


Solution 1 - Linux

Why don't you go for awk:

awk '/Start pattern/,/End pattern/' filename
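To see what the range pattern prints, here is a small sketch on a made-up sample file (the file contents are an assumption for illustration only). awk prints every line from one matching the start pattern through the next one matching the end pattern, inclusive:

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf 'header\nStart pattern\nmiddle\nEnd pattern\nfooter\n' > "$sample"

# The range pattern selects the start line, the end line, and everything between.
block=$(awk '/Start pattern/,/End pattern/' "$sample")
printf '%s\n' "$block"

rm -f "$sample"
```

Note that if the start pattern matches but the end pattern never does, the range stays open and awk prints through the end of the file.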

Solution 2 - Linux

Here is the example using GNU grep:

grep -Pzo '_name.*\n.*_description'

> -z/--null-data Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline.

As long as the file contains no NUL bytes, this has the effect of treating the whole file as one large line, so \n in the pattern can match the line breaks inside it.
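A minimal sketch of how this behaves in practice, assuming GNU grep built with PCRE support (-P) and a made-up Python file. With -z, each match printed by -o is terminated by a NUL byte rather than a newline, so piping through tr makes the output readable:

```shell
# Hypothetical sample file, created only for this demonstration.
py_file=$(mktemp)
printf "class A:\n    _name = 'a'\n    _description = 'first'\n    other = 1\n" > "$py_file"

# -z: records end at NUL, so the whole file is read as one "line"
# -o: print only the matched part; with -z each match ends in a NUL,
#     which tr converts to a newline for display
found=$(grep -Pzo '_name.*\n.*_description' "$py_file" | tr '\0' '\n')
printf '%s\n' "$found"

rm -f "$py_file"
```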

Solution 3 - Linux

So I discovered pcregrep which stands for Perl Compatible Regular Expressions GREP.

> the -M option makes it possible to search for patterns that span line boundaries.

For example, you need to find files where the '_name' variable is followed on the next line by the '_description' variable:

find . -iname '*.py' | xargs pcregrep -M '_name.*\n.*_description'

Tip: you need to include the line break character in your pattern. Depending on your platform, it could be '\n', '\r', '\r\n', ...

Solution 4 - Linux

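grep -P also uses libpcre, but is much more widely installed. On its own, grep still reads the input one line at a time, so -z is needed for a pattern to span lines (-o prints only the matched part). To find a complete title section of an HTML document, even if it spans multiple lines, you can use this:

grep -Pzo '(?s)<title>.*</title>' example.html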

Since the PCRE project implements Perl's regular expression syntax, the Perl documentation can be used as a reference.

Solution 5 - Linux

Here is a more useful example:

pcregrep -Mi "<title>(.*\n){0,5}</title>" afile.html

It finds the title tag in an HTML file even if it spans up to 5 lines.

Here is an example of unlimited lines:

pcregrep -Mi "(?s)<title>.*</title>" example.html 

Solution 6 - Linux

With silver searcher:

ag 'abc.*(\n|.)*efg'

Speed optimizations of silver searcher could possibly shine here.

Solution 7 - Linux

@Marcin: a non-greedy awk example, which stops after the first end pattern:

awk '{if ($0 ~ /Start pattern/) {triggered=1;}if (triggered) {print; if ($0 ~ /End pattern/) { exit;}}}' filename
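Since the question is about finding which files contain the pattern, the same idea can be adapted to print file names instead of the matched lines. This is a rough sketch under assumptions: the file names are made up, and "contains the pattern" is taken to mean a start line followed at some point by an end line in the same file:

```shell
# Hypothetical sample files, created only for this demonstration.
work=$(mktemp -d)
printf 'x\nStart pattern\ny\nEnd pattern\n' > "$work/both.txt"
printf 'Start pattern\ny\n' > "$work/start_only.txt"
printf 'End pattern\nStart pattern\n' > "$work/reversed.txt"

files_with_block=$(awk '
  FNR == 1        { inside = 0; reported = 0 }  # new file: reset state
  reported        { next }                       # file already listed, skip the rest
  /Start pattern/ { inside = 1 }                 # start seen
  inside && /End pattern/ { print FILENAME; reported = 1 }
' "$work/both.txt" "$work/start_only.txt" "$work/reversed.txt")
printf '%s\n' "$files_with_block"

rm -rf "$work"
```

Resetting the flags when FNR is 1 keeps an unclosed start in one file from leaking into the next.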

Solution 8 - Linux

You can use the grep alternative sift here (disclaimer: I am the author).

It supports multiline matching and limiting the search to specific file types out of the box:

sift -m --files '*.py' 'YOUR_PATTERN'
(search all *.py files for the specified multiline regex pattern)

It is available for all major operating systems. Take a look at the samples page to see how it can be used to extract multiline values from an XML file.

Solution 9 - Linux

Solution 10 - Linux

perl -ne 'print if (/begin pattern/../end pattern/)' filename

Solution 11 - Linux

Using ex/vi editor and globstar option (syntax similar to awk and sed):

ex +"/aaa/,/bbb/p" -R -scq! file.txt

where aaa is your starting point, and bbb is your ending text.

To search recursively, try:

ex +"/aaa/,/bbb/p" -scq! **/*.py

Note: To enable the ** syntax in Bash 4 or later, run shopt -s globstar; zsh supports ** without it.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Oli | View Question on Stack Overflow
Solution 1 - Linux | Amit | View Answer on Stack Overflow
Solution 2 - Linux | ayaz | View Answer on Stack Overflow
Solution 3 - Linux | Oli | View Answer on Stack Overflow
Solution 4 - Linux | bukzor | View Answer on Stack Overflow
Solution 5 - Linux | Oli | View Answer on Stack Overflow
Solution 6 - Linux | Shwaydogg | View Answer on Stack Overflow
Solution 7 - Linux | Martin | View Answer on Stack Overflow
Solution 8 - Linux | svent | View Answer on Stack Overflow
Solution 9 - Linux | albfan | View Answer on Stack Overflow
Solution 10 - Linux | pbal | View Answer on Stack Overflow
Solution 11 - Linux | kenorb | View Answer on Stack Overflow