Regex exactly n OR m times

Java Problem Overview

Consider the following regular expression, where X is any regex.

X{n}|X{m}

This regex would test for X occurring exactly n or m times.

Is there a single regex quantifier that tests for X occurring exactly n or m times?

Java Solutions

Solution 1 - Java

There is no single quantifier that means "exactly m or n times". The way you are doing it is fine.

An alternative is:

X{m}(X{k})?

where m < n and k is the value of n-m.
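A minimal Java sketch that checks the two forms accept the same strings when the whole input must match (`Matcher.matches()`). It assumes X is the hypothetical literal `ab` (wrapped in `(?:...)` so the quantifier applies to all of it), m = 2 and n = 5, so k = 3. `String.repeat` needs Java 11 or later.

```java
import java.util.regex.Pattern;

class ExactlyMOrN {
    // Hypothetical stand-in for X, grouped so {..} quantifies the whole thing.
    static final String X = "(?:ab)";

    // X{n}|X{m}, the form from the question.
    static Pattern alternation(int m, int n) {
        return Pattern.compile(X + "{" + n + "}|" + X + "{" + m + "}");
    }

    // X{m}(X{k})? with k = n - m, assuming m < n.
    static Pattern factored(int m, int n) {
        int k = n - m;
        return Pattern.compile(X + "{" + m + "}(" + X + "{" + k + "})?");
    }

    public static void main(String[] args) {
        int m = 2, n = 5;
        for (int reps = 0; reps <= 7; reps++) {
            String input = "ab".repeat(reps);
            // matches() requires the entire input to match the pattern.
            boolean a = alternation(m, n).matcher(input).matches();
            boolean b = factored(m, n).matcher(input).matches();
            System.out.println(reps + " repetitions: " + a + " / " + b);
        }
    }
}
```

Both columns print `true` only for 2 and 5 repetitions.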

Solution 2 - Java

Here is the complete list of quantifiers (ref. http://www.regular-expressions.info/reference.html):

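?, ?? - 0 or 1 occurrences (? is greedy, ?? is lazy)
*, *? - any number of occurrences
+, +? - at least one occurrence
{n} - exactly n occurrences
{n,m} - n to m occurrences, inclusive
{n,m}? - n to m occurrences, lazy
{n,}, {n,}? - at least n occurrences
?+, *+, ++, {n,m}+, {n,}+ - possessive forms (supported by Java): like the greedy forms, but they never give back characters on backtracking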
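A small Java sketch of the greedy/lazy distinction in the list, using a hypothetical helper `firstMatch` that returns the first substring found by `Matcher.find()` (or null if there is none). Only the standard `java.util.regex` API is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String input = "aaaaa";
        System.out.println(firstMatch("a{3}", input));    // exactly 3: aaa
        System.out.println(firstMatch("a{2,4}", input));  // greedy: aaaa
        System.out.println(firstMatch("a{2,4}?", input)); // lazy: aa
        System.out.println(firstMatch("a{2,}", input));   // greedy, no upper bound: aaaaa
        System.out.println(firstMatch("a{2,}?", input));  // lazy: aa
        System.out.println(firstMatch("a?", input));      // greedy 0 or 1: a
        System.out.println(firstMatch("a??", input));     // lazy 0 or 1: empty string
    }
}
```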

To get "exactly n or m", you need to write the quantified regex twice, unless m and n are related in a special way:

X{n,m} if m = n+1
(?:X{n}){1,2} if m = 2n
...
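The two special cases can be spot-checked with a short Java sketch. `agreeUpTo` is a hypothetical helper that compares two patterns with `matches()` on inputs made of 0 to `max` copies of a sample X (here the literal `ab`); it is a test over a finite range, not a proof. `String.repeat` needs Java 11 or later.

```java
import java.util.regex.Pattern;

class SpecialCases {
    // True if both regexes give the same matches() result on "ab" repeated 0..max times.
    static boolean agreeUpTo(String regexA, String regexB, int max) {
        Pattern a = Pattern.compile(regexA);
        Pattern b = Pattern.compile(regexB);
        for (int reps = 0; reps <= max; reps++) {
            String input = "ab".repeat(reps);
            if (a.matcher(input).matches() != b.matcher(input).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // m = n + 1 (n = 3, m = 4): X{3}|X{4} versus X{3,4}
        System.out.println(agreeUpTo("(?:ab){3}|(?:ab){4}", "(?:ab){3,4}", 10));        // true
        // m = 2n (n = 3, m = 6): X{3}|X{6} versus (?:X{3}){1,2}
        System.out.println(agreeUpTo("(?:ab){3}|(?:ab){6}", "(?:(?:ab){3}){1,2}", 10)); // true
        // Counter-example: X{3,6} also accepts 4 and 5 repetitions
        System.out.println(agreeUpTo("(?:ab){3}|(?:ab){6}", "(?:ab){3,6}", 10));        // false
    }
}
```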

Solution 3 - Java

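No, there is no such quantifier. But I'd restructure it to /X{m}(X{n-m})?/ (assuming m < n) to avoid redundant backtracking: with the plain alternation, if the rest of the pattern fails after the first alternative, the engine starts the second alternative from scratch and re-matches the repetitions of X the two alternatives share.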
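Under `matches()` the factored form accepts the same strings as the alternation. One side effect that is easy to overlook shows up with `find()`: the greedy `?` always tries the longer count first, whereas an alternation tries its branches from left to right. This Java sketch assumes X = `a`, m = 2 and n = 5; `firstMatch` is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FactoredFind {
    // Returns the first substring found by find(), or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String input = "aaaaa";
        // Alternation with the shorter count first: the first branch wins.
        System.out.println(firstMatch("a{2}|a{5}", input));     // aa
        // Alternation with the longer count first.
        System.out.println(firstMatch("a{5}|a{2}", input));     // aaaaa
        // Factored form (shared a{2} prefix plus optional a{3}): greedy ? takes the longer match.
        System.out.println(firstMatch("a{2}(?:a{3})?", input)); // aaaaa
    }
}
```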

Solution 4 - Java

TL;DR: (?<=[^x]|^)(x{n}|x{m})(?:[^x]|$)

It looks like you want "x n times" or "x m times". A literal translation to regex would be (x{n}|x{m}), like this: https://regex101.com/r/vH7yL5/1

Or, if the input can contain a run of more than m "x"s (assuming m > n), you can add "preceded by no 'x'" and "followed by no 'x'", which translates to [^x](x{n}|x{m})[^x]. However, that assumes there is always a character before and after your "x"s, as you can see here: https://regex101.com/r/bB2vH2/1

You can change it to (?:[^x]|^)(x{n}|x{m})(?:[^x]|$), which translates to "preceded by a non-'x' character or the line start" and "followed by a non-'x' character or the line end". But it still won't match two runs separated by only one character: the first match consumes that separating character, so the second match has no character left to consume in front of it, as you can see here: https://regex101.com/r/oC5oJ4/1

Finally, to match runs separated by a single character, turn one side into a lookaround so that it checks the character without consuming it: a positive lookahead (?=...) on the "no 'x' after" part, or a positive lookbehind (?<=...) on the "no 'x' before" part, like this: https://regex101.com/r/mC4uX3/1

(?<=[^x]|^)(x{n}|x{m})(?:[^x]|$)

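This way you match only runs of exactly the number of 'x's you want. The run itself is in capture group 1; the overall match also includes the trailing non-'x' character, if there is one.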
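The progression in this answer can be checked in Java. This is a sketch assuming n = 2 and m = 4; `runs` is a hypothetical helper that collects capture group 1 of every `find()` match. Note that by default Java's ^ and $ match only at the start and end of the whole input, not at every line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExactRuns {
    // Collects capture group 1 (the run of x's) of every match found by find().
    static List<String> runs(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        int n = 2, m = 4; // assumed example counts
        String counts = "(x{" + n + "}|x{" + m + "})";
        String literal = counts;
        String penultimate = "(?:[^x]|^)" + counts + "(?:[^x]|$)";
        String lookbehind = "(?<=[^x]|^)" + counts + "(?:[^x]|$)";

        String mixed = "xx-xxx-xxxx-x-xx";
        System.out.println(runs(literal, mixed));    // [xx, xx, xx, xx, xx]
        System.out.println(runs(lookbehind, mixed)); // [xx, xxxx, xx]

        String oneApart = "xx-xx";
        System.out.println(runs(penultimate, oneApart)); // [xx]
        System.out.println(runs(lookbehind, oneApart));  // [xx, xx]
    }
}
```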

Solution 5 - Java

This is a very old post, but I'd like to contribute something that might help. I tried it exactly the way stated in the question and it does work, but there's a catch: the order of the alternatives matters. Consider this:

#[a-f0-9]{6}|#[a-f0-9]{3}

This will find all occurrences of hex colour codes (they're either 3 or 6 digits long). But when I flip it around like this

#[a-f0-9]{3}|#[a-f0-9]{6}

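it will only find the 3-digit codes, or the first 3 digits of the 6-digit ones. This happens because the alternatives are tried from left to right and the first one that succeeds wins: #[a-f0-9]{3} already matches the start of a 6-digit code, so #[a-f0-9]{6} is never tried there. A regex pro might spot this right away, but many people will find it surprising. Features such as word boundaries (\b) or lookarounds can avoid this trap regardless of the order, but not everyone is knee-deep in regex patterns.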
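A minimal Java sketch reproducing the effect with `Matcher.find()`; `findAll` is a hypothetical helper and the input string is an assumed example. The third pattern adds a trailing `\b`, which is one possible way (an assumption, not the only fix) to make the order irrelevant: a 3-digit match followed by more hex digits has no word boundary after it, so the 6-digit branch gets tried.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HexColours {
    // Collects every substring found by repeated find() calls.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "#abc #aabbcc";
        // 6-digit alternative first: both codes are found whole.
        System.out.println(findAll("#[a-f0-9]{6}|#[a-f0-9]{3}", input));       // [#abc, #aabbcc]
        // 3-digit alternative first: it also wins on the 6-digit code.
        System.out.println(findAll("#[a-f0-9]{3}|#[a-f0-9]{6}", input));       // [#abc, #aab]
        // Trailing \b: the 3-digit branch fails inside "#aabbcc", so order no longer matters.
        System.out.println(findAll("#[a-f0-9]{3}\\b|#[a-f0-9]{6}\\b", input)); // [#abc, #aabbcc]
    }
}
```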

Solution 6 - Java

Looking at Enhardened's answer, they state that their penultimate expression won't match runs separated by only one character. There is an easy way to fix this without lookahead or lookbehind: replace the start/end anchors (^ and $) with the word-boundary assertion \b, which can also match at the start and end of the input. The expression then becomes:

(?:[^x]|\b)(x{n}|x{m})(?:[^x]|\b)

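You can see it here: https://regex101.com/r/oC5oJ4/2. Note that \b only helps when x is a word character and the separator is not: in xxaxx the first match consumes the a, and because there is no word boundary between a and x, the second run is missed. The lookbehind expression from Enhardened's answer does not have this limitation.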
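A minimal Java check of the \b expression next to Enhardened's lookbehind expression, assuming n = 2 and m = 4. `runs` is a hypothetical helper that collects capture group 1 of every `find()` match. The first input separates the two runs with a space, the second with a letter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryRuns {
    // Collects capture group 1 (the run of x's) of every match found by find().
    static List<String> runs(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        int n = 2, m = 4; // assumed example counts
        String counts = "(x{" + n + "}|x{" + m + "})";
        String boundary = "(?:[^x]|\\b)" + counts + "(?:[^x]|\\b)";
        String lookbehind = "(?<=[^x]|^)" + counts + "(?:[^x]|$)";

        System.out.println(runs(boundary, "xx xx"));   // [xx, xx]
        System.out.println(runs(boundary, "xxaxx"));   // [xx]
        System.out.println(runs(lookbehind, "xxaxx")); // [xx, xx]
    }
}
```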

Content Type	Original Author	Original Content on Stackoverflow
Question	FThompson	View Question on Stackoverflow
Solution 1 - Java	Mark Byers	View Answer on Stackoverflow
Solution 2 - Java	John Dvorak	View Answer on Stackoverflow
Solution 3 - Java	Bergi	View Answer on Stackoverflow
Solution 4 - Java	Enhardened	View Answer on Stackoverflow
Solution 5 - Java	DanDan	View Answer on Stackoverflow
Solution 6 - Java	rozza2058	View Answer on Stackoverflow