Regex exactly n OR m times
Java Problem Overview
Consider the following regular expression, where X is any regex.
X{n}|X{m}
This regex would test for X occurring exactly n or m times.
Is there a regex quantifier that can test for X occurring exactly n or m times?
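For reference, the alternation from the question can be exercised in Java roughly as follows. This is only a sketch: the class name ExactlyNOrM and the helper exactly are made up for illustration, and X is wrapped in a non-capturing group on the assumption that it may be longer than a single token.

```java
import java.util.regex.Pattern;

class ExactlyNOrM {
    // Hypothetical helper: builds (?:X){n}|(?:X){m}. The non-capturing group
    // makes the quantifier apply to all of X, not just its last token.
    static Pattern exactly(String x, int n, int m) {
        String group = "(?:" + x + ")";
        return Pattern.compile(group + "{" + n + "}|" + group + "{" + m + "}");
    }

    public static void main(String[] args) {
        Pattern p = exactly("ab", 2, 4);
        // matches() requires the whole input to match the pattern.
        System.out.println(p.matcher("abab").matches());     // true  (2 repeats)
        System.out.println(p.matcher("ababab").matches());   // false (3 repeats)
        System.out.println(p.matcher("abababab").matches()); // true  (4 repeats)
    }
}
```

Note that matches() anchors the pattern to the whole input; with find() the same pattern can also match inside a longer run of X.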
Java Solutions
Solution 1 - Java
There is no single quantifier that means "exactly m or n times". The way you are doing it is fine.
An alternative is:
X{m}(X{k})?
where m < n and k is the value of n-m.
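A minimal Java sketch of this alternative, assuming m < n. The class name OptionalTail and the helper exactly are illustrative only, and the optional tail uses a non-capturing group instead of the capturing group shown above.

```java
import java.util.regex.Pattern;

class OptionalTail {
    // Hypothetical helper: builds (?:X){m}(?:(?:X){k})? with k = n - m.
    static Pattern exactly(String x, int m, int n) {
        if (m >= n) {
            throw new IllegalArgumentException("requires m < n");
        }
        String group = "(?:" + x + ")";
        int k = n - m; // size of the optional tail
        return Pattern.compile(group + "{" + m + "}(?:" + group + "{" + k + "})?");
    }

    public static void main(String[] args) {
        Pattern p = exactly("x", 3, 5);
        System.out.println(p.matcher("xxx").matches());   // true  (m = 3)
        System.out.println(p.matcher("xxxx").matches());  // false (tail needs 2 more)
        System.out.println(p.matcher("xxxxx").matches()); // true  (n = 5)
    }
}
```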
Solution 2 - Java
Here is the complete list of quantifiers (ref. http://www.regular-expressions.info/reference.html):
- ?, ?? - 0 or 1 occurrences (?? is lazy, ? is greedy)
- *, *? - any number of occurrences
- +, +? - at least one occurrence
- {n} - exactly n occurrences
- {n,m} - n to m occurrences, inclusive
- {n,m}? - n to m occurrences, lazy
- {n,}, {n,}? - at least n occurrences
To get "exactly N or M", you need to write the quantified regex twice, unless m,n are special:
- X{n,m} if m = n+1
- (?:X{n}){1,2} if m = 2n
- ...
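The two special cases listed above can be checked with a short Java sketch. The concrete values (n = 3, so m = 4 and m = 6) and the class and field names are chosen purely for illustration.

```java
import java.util.regex.Pattern;

class SpecialCases {
    // m = n + 1: a plain range quantifier covers exactly n or m (here 3 or 4).
    static final Pattern N_OR_N_PLUS_1 = Pattern.compile("x{3,4}");

    // m = 2n: repeat the block of n once or twice (here 3 or 6).
    static final Pattern N_OR_2N = Pattern.compile("(?:x{3}){1,2}");

    public static void main(String[] args) {
        System.out.println(N_OR_N_PLUS_1.matcher("xxxx").matches());  // true
        System.out.println(N_OR_N_PLUS_1.matcher("xxxxx").matches()); // false
        System.out.println(N_OR_2N.matcher("xxx").matches());         // true
        System.out.println(N_OR_2N.matcher("xxxxx").matches());       // false
        System.out.println(N_OR_2N.matcher("xxxxxx").matches());      // true
    }
}
```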
Solution 3 - Java
No, there is no such quantifier. But I'd restructure it to /X{m}(X{n-m})?/ (with m < n)
to prevent problems in backtracking.
Solution 4 - Java
TL;DR: (?<=[^x]|^)(x{n}|x{m})(?:[^x]|$)
It looks like you want "x n times" or "x m times"; I think a literal translation to regex would be (x{n}|x{m}).
Like this: https://regex101.com/r/vH7yL5/1
Or, in a case where you can have a sequence of more than m "x"s (assuming m > n), you can add 'following no "x"' and 'followed by no "x"', translating to [^x](x{n}|x{m})[^x],
but that would assume that there is always a character before and after your "x"s, as you can see here: https://regex101.com/r/bB2vH2/1
You can change it to (?:[^x]|^)(x{n}|x{m})(?:[^x]|$),
translating to "following no 'x' or following line start" and "followed by no 'x' or followed by line end". But still, it won't match two sequences with only one character between them (because the first match would consume the character after it, and the second would need that same character before it), as you can see here: https://regex101.com/r/oC5oJ4/1
Finally, to match sequences that are only one character apart, you can add a positive lookahead (?=) on the "no 'x' after" or a positive lookbehind (?<=) on the "no 'x' before", like this: https://regex101.com/r/mC4uX3/1
(?<=[^x]|^)(x{n}|x{m})(?:[^x]|$)
This way you will match only the exact number of 'x's you want.
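The final expression can be tried with java.util.regex along these lines. This is a sketch rather than a definitive implementation: the class ExactRunFinder, the helper findRuns and the sample input are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExactRunFinder {
    // Hypothetical helper: returns every run of 'x' whose length is exactly n or m.
    static List<String> findRuns(String input, int n, int m) {
        Pattern p = Pattern.compile(
            "(?<=[^x]|^)(x{" + n + "}|x{" + m + "})(?:[^x]|$)");
        Matcher matcher = p.matcher(input);
        List<String> runs = new ArrayList<>();
        while (matcher.find()) {
            // Group 1 holds only the x's; the trailing separator is consumed
            // by the match but stays outside the group.
            runs.add(matcher.group(1));
        }
        return runs;
    }

    public static void main(String[] args) {
        // Runs of length 2, 4, 3, 5 and 2, each separated by a single space.
        System.out.println(findRuns("xx xxxx xxx xxxxx xx", 2, 4)); // [xx, xxxx, xx]
    }
}
```

The first two runs are only one character apart and are both found, because the lookbehind inspects the separator without consuming it.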
Solution 5 - Java
Very old post, but I'd like to contribute something that might be of help. I've tried it exactly the way stated in the question and it does work, but there's a catch: the order of the quantities matters. Consider this:
#[a-f0-9]{6}|#[a-f0-9]{3}
This will find all occurrences of hex colour codes (they're either 3 or 6 digits long). But when I flip it around like this
#[a-f0-9]{3}|#[a-f0-9]{6}
it will only find the 3 digit ones or the first 3 digits of the 6 digit ones. This does make sense and a Regex pro might spot this right away, but for many this might be a peculiar behaviour. There are some advanced Regex features that might avoid this trap regardless of the order, but not everyone is knee-deep into Regex patterns.
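The ordering effect is easy to reproduce in Java. The sketch below uses a hypothetical helper findAll and a made-up sample string; only the two patterns come from the text above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrder {
    // Hypothetical helper: collects every match of regex found in input.
    static List<String> findAll(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "#a1b2c3 #fff";
        // Longer alternative first: the 6-digit code is matched in full.
        System.out.println(findAll("#[a-f0-9]{6}|#[a-f0-9]{3}", text)); // [#a1b2c3, #fff]
        // Shorter alternative first: it wins as soon as 3 digits match.
        System.out.println(findAll("#[a-f0-9]{3}|#[a-f0-9]{6}", text)); // [#a1b, #fff]
    }
}
```

The engine tries alternatives left to right and stops at the first one that succeeds, so the shorter alternative shadows the longer one when it comes first.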
Solution 6 - Java
Taking a look at Enhardened's answer, they state that their penultimate expression won't match sequences with only one character between them. There is an easy way to fix this without using lookahead/lookbehind, and that's to replace the start/end anchors (^ and $) with the word-boundary anchor (\b). This lets you match against word boundaries, which include the start and end of the input when x is a word character. As such, the appropriate expression should be:
(?:[^x]|\b)(x{n}|x{m})(?:[^x]|\b)
As you can see here: https://regex101.com/r/oC5oJ4/2.
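A rough Java sketch of the word-boundary variant follows. The class WordBoundaryRuns, the helper findRuns and the sample inputs are assumptions for illustration. The second call shows a limitation: when the single separating character is itself a word character, there is no boundary between it and the next x, so the second run is not found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryRuns {
    // Hypothetical helper: runs of 'x' of length n or m, using \b in place of ^ and $.
    static List<String> findRuns(String input, int n, int m) {
        Pattern p = Pattern.compile(
            "(?:[^x]|\\b)(x{" + n + "}|x{" + m + "})(?:[^x]|\\b)");
        Matcher matcher = p.matcher(input);
        List<String> runs = new ArrayList<>();
        while (matcher.find()) {
            runs.add(matcher.group(1)); // group 1 holds only the x's
        }
        return runs;
    }

    public static void main(String[] args) {
        // Space-separated runs: a word boundary exists before each run.
        System.out.println(findRuns("xx xxxx xxx xxxxx xx", 2, 4)); // [xx, xxxx, xx]
        // 'a' is a word character, so there is no \b between 'a' and 'x'.
        System.out.println(findRuns("xxaxx", 2, 4));                // [xx]
    }
}
```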