How can I search for a multiline pattern in a file?
Linux Problem Overview
I needed to find all the files that contained a specific string pattern. The first solution that comes to mind is using find piped with xargs grep:
find . -iname '*.py' | xargs grep -e 'YOUR_PATTERN'
But if I need to find patterns that span more than one line, I'm stuck, because vanilla grep can't match multiline patterns.
Linux Solutions
Solution 1 - Linux
Why don't you go for awk:
awk '/Start pattern/,/End pattern/' filename
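To see what the range pattern prints, here is a minimal sketch run against a throwaway file. The file name sample.txt and its contents are made up for illustration; only the awk command itself comes from the answer above.

```shell
# Build a small sample file in a temporary directory (hypothetical content).
tmpdir=$(mktemp -d)
printf '%s\n' 'header' 'Start pattern' 'body line' 'End pattern' 'footer' > "$tmpdir/sample.txt"

# Print every line from a match of the start regex through the next
# match of the end regex, both inclusive.
awk_range=$(awk '/Start pattern/,/End pattern/' "$tmpdir/sample.txt")
printf '%s\n' "$awk_range"

rm -rf "$tmpdir"
```

This prints the three lines from "Start pattern" through "End pattern" and skips "header" and "footer". Note that awk still works line by line here: each of the two regexes is matched against a single line, so this selects a block of lines rather than matching one regex across a line break.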
Solution 2 - Linux
Here is an example using GNU grep:
grep -Pzo '_name.*\n.*_description'
> -z / --null-data
> Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline.
This has the effect of treating the whole file as one large line, provided the file contains no NUL bytes.
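A minimal sketch of the command in action, assuming GNU grep built with PCRE support. The file model.py and its contents are hypothetical. Because -z makes grep end each printed match with a NUL byte instead of a newline, the sketch pipes the output through tr to make it readable.

```shell
# Hypothetical sample file with _name directly followed by _description.
tmpdir=$(mktemp -d)
printf '%s\n' 'class Partner:' "    _name = 'res.partner'" "    _description = 'Contact'" > "$tmpdir/model.py"

# -P: Perl-compatible regex, -z: NUL-separated records (whole file = one record),
# -o: print only the matched text.
# tr turns the NUL that terminates each match into a newline for display.
grep_match=$(grep -Pzo '_name.*\n.*_description' "$tmpdir/model.py" | tr '\0' '\n')
printf '%s\n' "$grep_match"

rm -rf "$tmpdir"
```

The match starts at _name on one line and ends at _description on the next; without -o, grep would print the whole record, which with -z is the entire file.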
Solution 3 - Linux
So I discovered pcregrep which stands for Perl Compatible Regular Expressions GREP.
> the -M option makes it possible to search for patterns that span line boundaries.
For example, you need to find files where the '_name' variable is followed on the next line by the '_description' variable:
find . -iname '*.py' | xargs pcregrep -M '_name.*\n.*_description'
Tip: you need to include the line break character in your pattern. Depending on your platform, it could be '\n', '\r', '\r\n', ...
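The line break tip can be illustrated with a small sketch. It uses GNU grep -Pz rather than pcregrep (an assumption made only so the example runs where pcregrep is not installed), and the files unix.txt and dos.txt are made up. A literal '\n' right after the text misses Windows line endings, while '\r?\n' accepts both.

```shell
# Two hypothetical files with the same text but different line endings.
tmpdir=$(mktemp -d)
printf 'foo\nbar\n'     > "$tmpdir/unix.txt"
printf 'foo\r\nbar\r\n' > "$tmpdir/dos.txt"

strict_hits=""
tolerant_hits=""
for f in unix.txt dos.txt; do
  # Strict: "foo" must be followed immediately by LF.
  if grep -Pzq 'foo\nbar' "$tmpdir/$f"; then
    strict_hits="${strict_hits:+$strict_hits }$f"
  fi
  # Tolerant: an optional CR may precede the LF.
  if grep -Pzq 'foo\r?\nbar' "$tmpdir/$f"; then
    tolerant_hits="${tolerant_hits:+$tolerant_hits }$f"
  fi
done

printf 'strict:   %s\n' "$strict_hits"
printf 'tolerant: %s\n' "$tolerant_hits"

rm -rf "$tmpdir"
```

The strict pattern only finds unix.txt, while the tolerant one finds both files. A pattern such as '_name.*\n' is less sensitive to this, because '.*' can absorb the trailing '\r'.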
Solution 4 - Linux
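grep -P also uses libpcre, but is much more widely installed. To find a complete title section of an HTML document, even if it spans multiple lines, you can use this (the -z flag is needed so that grep reads the whole file as one record; without it, grep still matches line by line and (?s) cannot reach across a line break):
grep -Pzo '(?s)<title>.*?</title>' example.html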
Since the PCRE project implements the Perl standard, use the Perl regular expression documentation (perlre) for reference.
Solution 5 - Linux
Here is a more useful example:
pcregrep -Mi "<title>(.*\n){0,5}</title>" afile.html
It finds the title tag in an HTML file even if it spans up to 5 lines.
Here is an example that allows an unlimited number of lines:
pcregrep -Mi "(?s)<title>.*</title>" example.html
Solution 6 - Linux
With The Silver Searcher (ag):
ag 'abc.*(\n|.)*efg'
The Silver Searcher's speed optimizations could pay off here.
Solution 7 - Linux
@Marcin: a non-greedy awk example (it stops after the first end pattern):
awk '{if ($0 ~ /Start pattern/) {triggered=1;}if (triggered) {print; if ($0 ~ /End pattern/) { exit;}}}' filename
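The difference between this version and the plain range pattern only shows up when the file contains more than one block. A minimal sketch, using a made-up file blocks.txt with two Start/End blocks:

```shell
# Hypothetical file containing two Start..End blocks.
tmpdir=$(mktemp -d)
printf '%s\n' 'Start pattern' 'first block' 'End pattern' 'between' \
              'Start pattern' 'second block' 'End pattern' > "$tmpdir/blocks.txt"

# Range form: prints every Start..End block in the file.
all_blocks=$(awk '/Start pattern/,/End pattern/' "$tmpdir/blocks.txt")

# Flag-and-exit form: stops reading as soon as the first End pattern is printed.
first_block=$(awk '{if ($0 ~ /Start pattern/) {triggered=1;}if (triggered) {print; if ($0 ~ /End pattern/) { exit;}}}' "$tmpdir/blocks.txt")

printf 'range form:\n%s\n\n' "$all_blocks"
printf 'exit form:\n%s\n' "$first_block"

rm -rf "$tmpdir"
```

The range form prints both blocks (six lines), while the exit form prints only the first block (three lines). The early exit also means awk does not have to read the rest of a large file.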
Solution 8 - Linux
You can use the grep alternative sift here (disclaimer: I am the author).
It supports multiline matching and limiting the search to specific file types out of the box:
sift -m --files '*.py' 'YOUR_PATTERN'
(search all *.py files for the specified multiline regex pattern)
It is available for all major operating systems. Take a look at the samples page to see how it can be used to extract multiline values from an XML file.
Solution 9 - Linux
This answer might be useful:
https://stackoverflow.com/questions/3717772/regex-grep-for-multi-line-search-needed/7167115#7167115
To search recursively, you can use grep's -R (recursive) and --include (glob pattern) flags.
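As a sketch of how -R and --include can be combined with the multiline flags from Solution 2, assuming GNU grep with PCRE support: the directory layout and file contents below are invented for the example, and -l is added so that only the names of matching files are listed.

```shell
# Hypothetical tree: one matching .py file, one non-matching .py file,
# and one .txt file that matches but should be filtered out by --include.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/pkg"
printf '%s\n' "_name = 'a'" "_description = 'b'"    > "$tmpdir/pkg/model.py"
printf '%s\n' "_name = 'a'" '' "_description = 'b'" > "$tmpdir/pkg/other.py"
printf '%s\n' "_name = 'a'" "_description = 'b'"    > "$tmpdir/pkg/notes.txt"

# -R: recurse, --include: only consider files matching the glob,
# -P -z: multiline-capable PCRE search, -l: list matching file names only.
recursive_hits=$(cd "$tmpdir" && grep -RPzl --include='*.py' '_name.*\n.*_description' .)
printf '%s\n' "$recursive_hits"

rm -rf "$tmpdir"
```

Only ./pkg/model.py is listed: other.py has a blank line between the two variables, and notes.txt is excluded by the glob. This replaces the find | xargs pipeline from the question with a single grep call.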
Solution 10 - Linux
perl -ne 'print if (/begin pattern/../end pattern/)' filename
Solution 11 - Linux
Using the ex/vi editor and the globstar option (syntax similar to awk and sed):
ex +"/aaa/,/bbb/p" -R -scq! file.txt
where aaa is your starting point, and bbb is your ending text.
To search recursively, try:
ex +"/aaa/,/bbb/p" -scq! **/*.py
Note: To enable the ** syntax in Bash 4 or later, run shopt -s globstar (zsh supports ** by default).