How to do a non-greedy match in grep?

RegexShellCommand LineGrepRegex Greedy

Regex Problem Overview


I want to grep the shortest match and the pattern should be something like:

<car ... model=BMW ...>
...
...
...
</car>

... means any character and the input is multiple lines.

Regex Solutions


Solution 1 - Regex

You're looking for a non-greedy (or lazy) match. To get a non-greedy match in regular expressions you need to use the modifier ? after the quantifier. For example you can change .* to .*?.

By default grep doesn't support non-greedy modifiers, but you can use grep -P to use the Perl syntax.

Solution 2 - Regex

Actualy the .*? only works in perl. I am not sure what the equivalent grep extended regexp syntax would be. Fortunately you can use perl syntax with grep so grep -P would work but grep -E which is same as egrep would not work (it would be greedy).

See also: http://blog.vinceliu.com/2008/02/non-greedy-regular-expression-matching.html

Solution 3 - Regex

My grep that works after trying out stuff in this thread:

echo "hi how are you " | grep -shoP ".*? "

Just make sure you append a space to each one of your lines

(Mine was a line by line search to spit out words)

Solution 4 - Regex

grep

For non-greedy match in grep you could use a negated character class. In other words, try to avoid wildcards.

For example, to fetch all links to jpeg files from the page content, you'd use:

grep -o '"[^" ]\+.jpg"'

To deal with multiple line, pipe the input through xargs first. For performance, use ripgrep.

Solution 5 - Regex

Sorry I am 9 years late, but this might work for the viewers in 2020.

So suppose you have a line like "Hello my name is Jello". Now you want to find the words that start with 'H' and end with 'o', with any number of characters in between. And we don't want lines we just want words. So for that we can use the expression:

grep "H[^ ]*o" file

This will return all the words. The way this works is that: It will allow all the characters instead of space character in between, this way we can avoid multiple words in the same line.

Now you can replace the space character with any other character you want. Suppose the initial line was "Hello-my-name-is-Jello", then you can get words using the expression:

grep "H[^-]*o" file

Solution 6 - Regex

The short answer is using the next regular expression:

(?s)<car .*? model=BMW .*?>.*?</car>
  • (?s) - this makes a match across multiline
  • .*? - matches any character, a number of times in a lazy way (minimal match)

A (little) more complicated answer is:

(?s)<([a-z\-_0-9]+?) .*? model=BMW .*?>.*?</\1>

This will makes possible to match car1 and car2 in the following text

<car1 ... model=BMW ...>
...
...
...
</car1>
<car2 ... model=BMW ...>
...
...
...
</car2>
  • (..) represents a capturing group
  • \1 in this context matches the sametext as most recently matched by capturing group number 1

Solution 7 - Regex

I know that its a bit of a dead post but I just noticed that this works. It removed both clean-up and cleanup from my output.

> grep -v -e 'clean\-\?up'
> grep --version grep (GNU grep) 2.20

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionsykerView Question on Stackoverflow
Solution 1 - RegexMark ByersView Answer on Stackoverflow
Solution 2 - RegexJohn SmithView Answer on Stackoverflow
Solution 3 - RegexjonzView Answer on Stackoverflow
Solution 4 - RegexkenorbView Answer on Stackoverflow
Solution 5 - Regexmr.1n5an_eView Answer on Stackoverflow
Solution 6 - RegexjmcView Answer on Stackoverflow
Solution 7 - Regexuser200850View Answer on Stackoverflow