Regex lookahead for 'not followed by' in grep

Tags: Regex, Grep, Regex Lookarounds

Regex Problem Overview


I am attempting to grep for all instances of Ui\. not followed by Line, or even just the letter L.

What is the proper way to write a regex for finding all instances of a particular string NOT followed by another string?

Using lookaheads

grep "Ui\.(?!L)" *
bash: !L: event not found


grep "Ui\.(?!(Line))" *
nothing

Regex Solutions


Solution 1 - Regex

Negative lookahead, which is what you're after, requires a more powerful tool than the standard grep. You need a PCRE-enabled grep.

If you have GNU grep, the current version supports options -P or --perl-regexp and you can then use the regex you wanted.
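A minimal sketch of the -P approach, assuming a GNU grep that was built with PCRE support. The sample file and its contents are made up for illustration; the pattern is the one from the question, with Line as the excluded suffix.

```shell
# Hypothetical sample data, made up for illustration.
sample=$(mktemp)
printf '%s\n' 'Ui.Line(1)' 'Ui.Label' 'Ui.Button' 'Ui.' > "$sample"

# -P switches GNU grep to Perl-compatible regular expressions.
# Single quotes stop an interactive bash from treating ! as history expansion.
not_line=$(grep -P 'Ui\.(?!Line)' "$sample")
printf '%s\n' "$not_line"

rm -f "$sample"
```

With this sample, every line except Ui.Line(1) should be printed, including the bare Ui. at the end of a line.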

If you don't have (a sufficiently recent version of) GNU grep, then consider getting ack.

Solution 2 - Regex

The answer to part of your problem is here, and ack would behave the same way: https://stackoverflow.com/questions/8385020/ack-negative-lookahead-giving-errors

You are using double-quotes for grep, which permits bash to "interpret ! as history expand command."

You need to wrap your pattern in SINGLE-QUOTES: grep 'Ui\.(?!L)' *

However, see @JonathanLeffler's answer to address the issues with negative lookaheads in standard grep!

Solution 3 - Regex

You probably can't perform standard negative lookaheads using grep, but you can usually get equivalent behaviour with the "inverse" switch -v. With it you construct a regex for the complement of what you want to match and pipe the output through two greps.

For the regex in question you might do something like:

grep 'Ui\.' * | grep -v 'Ui\.L'
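One thing worth checking with this approach is that -v filters whole lines rather than individual matches. The sketch below, on made-up sample data, compares the two-stage pipeline with a single expression that uses a negated character class. A line containing both Ui.Line and another Ui. occurrence is dropped by the pipeline but kept by the single expression.

```shell
# Hypothetical sample data, made up for illustration.
sample=$(mktemp)
printf '%s\n' 'Ui.Line' 'Ui.Button' 'Ui.Line + Ui.Button' > "$sample"

# Two-stage filter: keep lines containing "Ui.", then drop every line
# that contains "Ui.L" anywhere.
piped=$(grep 'Ui\.' "$sample" | grep -v 'Ui\.L')

# Single expression: "Ui." followed by a non-L character or the end of the line.
single=$(grep -E 'Ui\.([^L]|$)' "$sample")

printf 'piped:\n%s\n' "$piped"
printf 'single:\n%s\n' "$single"

rm -f "$sample"
```

Whether the dropped line matters depends on the data. If each line holds at most one Ui. occurrence, the two approaches select the same lines.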

Solution 4 - Regex

If you need to use a regex implementation that doesn't support negative lookaheads and you don't mind matching extra character(s)*, then you can use negated character classes [^L], alternation |, and the end of string anchor $.

In your case grep 'Ui\.\([^L]\|$\)' * does the job.

  • Ui\. matches the string you're interested in

  • \([^L]\|$\) matches any single character other than L or it matches the end of the line: [^L] or $.

If you want to exclude more than just one character, then you just need to throw more alternation and negation at it. To find a not followed by bc:

grep 'a\(\([^b]\|$\)\|\(b\([^c]\|$\)\)\)' *

This matches either a followed by a character that is not b or by the end of the line (a, then [^b] or $), or a followed by b, which is in turn followed by a character that is not c or by the end of the line (a, then b, then [^c] or $).
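As a quick check of the "a not followed by bc" expression, here is a sketch that uses the same pattern translated to extended syntax (grep -E), which drops the backslashes before the groups and the alternation. The sample lines are made up for illustration.

```shell
# Hypothetical sample data, made up for illustration.
sample=$(mktemp)
printf '%s\n' 'abc' 'abd' 'ab' 'a' 'xbc' > "$sample"

# ERE translation of the pattern above:
#   a, then (not-b or end of line), or a, then b, then (not-c or end of line)
not_bc=$(grep -E 'a(([^b]|$)|(b([^c]|$)))' "$sample")
printf '%s\n' "$not_bc"

rm -f "$sample"
```

With this sample, abd, ab and a are selected, while abc (a followed by bc) and xbc (no a at all) are not.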

This kind of expression gets unwieldy and error-prone even with a short string. You could write something to generate the expressions for you, but it would probably be easier to use a regex implementation that supports negative lookaheads.

*If your implementation supports non-capturing groups, then you can avoid capturing the extra character(s), although they are still part of the overall match.

Solution 5 - Regex

At least for the case of not wanting an 'L' character after the "Ui." you don't really need PCRE.

    grep -E 'Ui\.($|[^L])' *

Here I've made sure to match the special case of the "Ui." at the end of the line.
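The difference from a true lookahead is that the character after the dot is consumed as part of the match. The sketch below makes this visible with -o, which prints only the matched text. It assumes a grep that supports -o (GNU and BSD grep do), and the sample data is made up for illustration.

```shell
# Hypothetical sample data, made up for illustration.
sample=$(mktemp)
printf '%s\n' 'Ui.Line' 'Ui.Button' 'Ui.' > "$sample"

# -o prints only the matched part of each line, so the extra character
# picked up by [^L] shows in the output.
consumed=$(grep -oE 'Ui\.($|[^L])' "$sample")
printf '%s\n' "$consumed"

rm -f "$sample"
```

For Ui.Button the match is Ui.B rather than Ui., which only matters if you use the matched text itself (for example with -o or in a substitution) rather than just selecting lines.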

Solution 6 - Regex

If your grep doesn't support -P or --perl-regexp and you can install a PCRE-enabled grep such as pcregrep, then you don't need any command-line option (unlike with GNU grep) to make it accept Perl-compatible regular expressions. You just run:

pcregrep 'Ui\.(?!Line)'

You don't need another nested group around Line as in your example Ui\.(?!(Line)); the lookahead group by itself is sufficient, as shown above.

Here is another example of a negative assertion: when you have a list of lines returned by ipset, each showing a packet count in the middle of the line, and you don't want the lines with zero packets, you just run:

ipset list | pcregrep "packets(?! 0 )"

If you like Perl-compatible regular expressions and have perl, but don't have pcregrep or a grep that supports --perl-regexp, you can use one-line perl scripts that work the same way as grep:

perl -e 'while (<>) {if (/Ui\.(?!Line)/){print;};}'

Perl reads stdin the same way grep does, e.g.

ipset list | perl -e "while (<>) {if (/packets(?! 0 )/){print;};}"

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Lee Quarella | View Question on Stackoverflow
Solution 1 - Regex | Jonathan Leffler | View Answer on Stackoverflow
Solution 2 - Regex | NHDaly | View Answer on Stackoverflow
Solution 3 - Regex | Karel Tucek | View Answer on Stackoverflow
Solution 4 - Regex | dougcosine | View Answer on Stackoverflow
Solution 5 - Regex | Doug Robinson | View Answer on Stackoverflow
Solution 6 - Regex | Maxim Masiutin | View Answer on Stackoverflow