Python non-greedy regexes
Python Problem Overview
How do I make a Python regex like "(.*)" such that, given "a (b) c (d) e", Python matches "b" instead of "b) c (d"?
I know that I can use "[^)]" instead of ".", but I'm looking for a more general solution that keeps my regex a little cleaner. Is there any way to tell Python "hey, match this as soon as possible"?
Python Solutions
Solution 1 - Python
You seek the all-powerful *?
From the docs, Greedy versus Non-Greedy:
> the non-greedy qualifiers *?, +?, ??, or {m,n}? [...] match as little text as possible.
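The four lazy qualifiers from that quote can be compared with their greedy counterparts side by side. This is a small illustrative sketch using the standard re module; the input "aaaa" is just a made-up example:

```python
import re

text = "aaaa"

# Each greedy quantifier paired with its non-greedy (lazy) counterpart.
pairs = [
    (r"a*", r"a*?"),
    (r"a+", r"a+?"),
    (r"a?", r"a??"),
    (r"a{2,3}", r"a{2,3}?"),
]

results = {}
for greedy, lazy in pairs:
    # re.match anchors at the start of the string, so the two forms differ
    # only in how many characters they consume.
    results[greedy] = re.match(greedy, text).group()
    results[lazy] = re.match(lazy, text).group()
    print(f"{greedy} -> {results[greedy]!r}, {lazy} -> {results[lazy]!r}")
```

The lazy forms still have to satisfy the rest of the pattern; they only stop as early as the overall match allows, which is why {2,3}? still takes two characters.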
Solution 2 - Python
>>> import re
>>> x = "a (b) c (d) e"
>>> re.search(r"\(.*\)", x).group()
'(b) c (d)'
>>> re.search(r"\(.*?\)", x).group()
'(b)'
> The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behavior isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.
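Running the documentation's own example confirms the behaviour it describes. This only illustrates the quantifiers; it is not a suggestion to parse HTML with regexes:

```python
import re

html = "<H1>title</H1>"

greedy = re.match(r"<.*>", html).group()   # .* runs on to the LAST '>'
lazy = re.match(r"<.*?>", html).group()    # .*? stops at the FIRST '>'
tags = re.findall(r"<.*?>", html)          # lazy pattern, every match

print(greedy)  # <H1>title</H1>
print(lazy)    # <H1>
print(tags)    # ['<H1>', '</H1>']
```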
Solution 3 - Python
Would not \\(.*?\\) work? That is the non-greedy syntax.
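The doubled backslashes are how that pattern is written in an ordinary (non-raw) Python string literal. As a quick check, assuming the same input as the question, it is exactly the same pattern as the raw-string form r"\(.*?\)" used in the other answers:

```python
import re

# In a normal string literal "\\" is one backslash, so "\\(" reaches the
# regex engine as \( (a literal parenthesis), same as the raw string r"\(".
plain = "\\(.*?\\)"
raw = r"\(.*?\)"

print(plain == raw)                               # True
print(re.search(plain, "a (b) c (d) e").group())  # (b)
```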
Solution 4 - Python
Do you want it to match "(b)"? Do as Zitrax and Paolo have suggested. Do you want it to match "b"? Put a capturing group inside the parentheses and read group(1):
>>> import re
>>> x = "a (b) c (d) e"
>>> re.search(r"\((.*?)\)", x).group(1)
'b'
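If every parenthesized item is wanted rather than just the first, the same lazy pattern works with re.findall and re.finditer. A short sketch on the question's input:

```python
import re

x = "a (b) c (d) e"

# With exactly one capturing group, findall returns that group's text
# for every non-overlapping match, not the whole "(b)".
contents = re.findall(r"\((.*?)\)", x)
print(contents)  # ['b', 'd']

# finditer yields Match objects, which also report where each group sits.
for m in re.finditer(r"\((.*?)\)", x):
    print(m.group(1), m.span(1))
```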
Solution 5 - Python
Using an ungreedy match is a good start, but I'd also suggest that you reconsider any use of .*. What about this?
match = re.search(r"\([^)]*\)", x)  # match.group() == '(b)'
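On the question's input the lazy pattern and the negated character class agree. One concrete difference, sketched below with a made-up multi-line input, is that "." does not match a newline unless re.DOTALL is set, while [^)] does:

```python
import re

x = "a (b) c (d) e"
lazy = re.search(r"\(.*?\)", x).group()
negated = re.search(r"\([^)]*\)", x).group()
print(lazy, negated)  # (b) (b)

# Parenthesized text spanning a line break (hypothetical input).
y = "a (line one\nline two) b"
print(re.search(r"\(.*?\)", y))               # None: '.' stops at '\n'
print(repr(re.search(r"\([^)]*\)", y).group()))  # '(line one\nline two)'
```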
Solution 6 - Python
As the others have said, using the ? modifier on the * quantifier will solve your immediate problem. But be careful: you are starting to stray into areas where regexes stop working and you need a parser instead. For instance, the string "(foo (bar)) baz" will cause you problems, because \(.*?\) stops at the first closing parenthesis and matches the unbalanced "(foo (bar)".
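A minimal sketch of what "a parser instead" can mean for this case: a hand-written scanner that counts nesting depth. The function name top_level_groups is hypothetical (not part of re or any library), and it assumes only round parentheses matter:

```python
import re

s = "(foo (bar)) baz"

# The lazy regex stops at the first ')', leaving the outer group unbalanced.
print(re.search(r"\(.*?\)", s).group())  # (foo (bar)


def top_level_groups(text):
    """Hypothetical helper: return each outermost balanced (...) group.

    Tracks nesting depth instead of using a regex. A ')' with no open '('
    is ignored, and a '(' that is never closed produces no group.
    """
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i  # remember where the outermost group begins
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups


print(top_level_groups(s))                # ['(foo (bar))']
print(top_level_groups("a (b) c (d) e"))  # ['(b)', '(d)']
```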
Solution 7 - Python
To start with, I do not suggest using "*" in regexes. Yes, I know, it is the most commonly used repetition quantifier, but it is nevertheless often a bad idea: while it matches any number of repetitions of the preceding item, "any" includes 0, and an empty match is usually something you want to reject, not accept. Instead, I suggest using the + quantifier, which matches one or more repetitions. What's more, from what I can see, you are dealing with fixed-length parenthesized expressions, so you can probably use the {m,n} syntax to specify the allowed length directly.
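The difference between * and +, and the effect of a {m,n} bound, can be checked quickly. This sketch assumes, purely for illustration, that the parenthesized contents are 1 to 3 word characters; the inputs "f()" and "x (toolong) (ok)" are made up:

```python
import re

x = "a (b) c (d) e"

# '*' also accepts zero characters, so empty parentheses count as a match;
# '+' requires at least one character between them.
empty_star = re.search(r"\(.*?\)", "f()")
empty_plus = re.search(r"\(.+?\)", "f()")
print(empty_star.group())  # ()
print(empty_plus)          # None

# Assumed length range: 1 to 3 word characters inside the parentheses.
bounded = re.compile(r"\(\w{1,3}\)")
print(bounded.findall(x))                   # ['(b)', '(d)']
print(bounded.findall("x (toolong) (ok)"))  # ['(ok)']
```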
However, if you really do need non-greedy repetition, I suggest consulting the all-powerful ?. When placed immediately after any repetition quantifier (*, +, ?, or {m,n}), it forces that part of the regex to match as little text as possible.
That being said, I would be very careful with the ?, as it, like the Sonic Screwdriver in Doctor Who, has a tendency to do, how should I put it, "slightly" undesired things if not carefully calibrated. For example, given an input with nested parentheses such as ((1)), \(.*?\) would identify ((1) (note the missing second right parenthesis) as a match.
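A quick way to reproduce the ((1) result, assuming the pattern in question is the lazy \(.*?\) from the earlier answers and an input with nested parentheses such as ((1)). Excluding both parentheses from the repeated part is one option if the innermost group is what you want:

```python
import re

nested = "((1))"

# The match starts at the first '(' and stops at the first ')', so it
# swallows the inner '(' and the result is unbalanced.
lazy_match = re.search(r"\(.*?\)", nested).group()
print(lazy_match)   # ((1)

# [^()]* cannot cross another '(' or ')', so the match lands on the
# innermost group instead.
inner_match = re.search(r"\([^()]*\)", nested).group()
print(inner_match)  # (1)
```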