How can I count the number of matches for a regex?

Java, Regex

Java Problem Overview


Let's say I have a string which contains this:

HelloxxxHelloxxxHello

I compile a pattern to look for 'Hello':

Pattern pattern = Pattern.compile("Hello");
Matcher matcher = pattern.matcher("HelloxxxHelloxxxHello");

It should find three matches. How can I get a count of how many matches there were?

I've tried various loops and matcher.groupCount(), but neither worked.

Java Solutions


Solution 1 - Java

matcher.find() does not find all matches, only the next match.

Solution for Java 9+

long matches = matcher.results().count();
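For context, a complete program built around that one-liner might look like the sketch below. The class name ResultsCount and the helper countMatches are illustrative names, not part of the original answer, and the code assumes Java 9 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResultsCount {
    // Hypothetical helper: counts the non-overlapping matches of regex in input.
    static long countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        // results() yields one MatchResult per successive match (Java 9+).
        return matcher.results().count();
    }

    public static void main(String[] args) {
        System.out.println(countMatches("Hello", "HelloxxxHelloxxxHello")); // prints 3
    }
}
```

Note that count() returns a long, so the result needs a cast if an int is required.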

Solution for Java 8 and older

Call find() in a loop and count how many times it succeeds. (From Java 9 on, the results() approach above is nicer.)

int count = 0;
while (matcher.find())
    count++;

Note that matcher.groupCount() is something completely different: it returns the number of capturing groups in the pattern, not the number of matches.
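A small sketch may make the groupCount() point concrete. The class GroupCountDemo and the helper capturingGroups are made-up names for illustration; the value depends only on the parentheses in the pattern, not on the input string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Hypothetical helper: reports the number of capturing groups in the pattern.
    static int capturingGroups(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        // groupCount() does not run a search; it only inspects the pattern.
        return matcher.groupCount();
    }

    public static void main(String[] args) {
        String input = "HelloxxxHelloxxxHello";
        System.out.println(capturingGroups("Hello", input));     // prints 0
        System.out.println(capturingGroups("(H)(ello)", input)); // prints 2
    }
}
```

Neither value is the 3 the question is after, which is why groupCount() does not help here.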

Complete example:

import java.util.regex.*;

class Test {
    public static void main(String[] args) {
        String hello = "HelloxxxHelloxxxHello";
        Pattern pattern = Pattern.compile("Hello");
        Matcher matcher = pattern.matcher(hello);
        
        int count = 0;
        while (matcher.find())
            count++;
        
        System.out.println(count);    // prints 3
    }
}

Handling overlapping matches

When counting matches of aa in aaaa the above snippet will give you 2.

aaaa
aa
  aa

To get 3 matches, i.e. this behavior:

aaaa
aa
 aa
  aa

You have to search for a match at index <start of last match> + 1 as follows:

String hello = "aaaa";
Pattern pattern = Pattern.compile("aa");
Matcher matcher = pattern.matcher(hello);

int count = 0;
int i = 0;
while (matcher.find(i)) {
    count++;
    i = matcher.start() + 1;
}

System.out.println(count);    // prints 3
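A different technique, not used in the answer above, is to wrap the regex in a zero-width lookahead so that each match consumes no characters and the ordinary find() loop visits every starting position. The sketch below assumes the regex does not match the empty string; OverlapCount and countOverlapping are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlapCount {
    // Hypothetical helper: counts the positions at which regex can match,
    // so overlapping occurrences are all counted.
    static int countOverlapping(String regex, String input) {
        // (?=...) asserts that regex matches here without consuming input.
        Matcher matcher = Pattern.compile("(?=" + regex + ")").matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOverlapping("aa", "aaaa")); // prints 3
    }
}
```

This works on Java 8 as well. The explicit find(i) loop may still be preferable when the matched text itself is needed, since a lookahead match is always empty.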

Solution 2 - Java

This should work for matches that might overlap:

import java.util.regex.*;

class OverlappingMatches {
    public static void main(String[] args) {
        String input = "aaaaaaaa";
        String regex = "aa";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);
        int from = 0;
        int count = 0;
        // Restart each search one character after the start of the previous match.
        while (matcher.find(from)) {
            count++;
            from = matcher.start() + 1;
        }
        System.out.println(count);    // prints 7
    }
}

Solution 3 - Java

From Java 9 on, you can count the elements of the stream returned by Matcher.results():

long matches = matcher.results().count();

Solution 4 - Java

If you want to use Java 8 streams and are allergic to while loops, you could try this:

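// Requires: import java.util.regex.*; import java.util.stream.Stream;
public static int countPattern(String references, Pattern referencePattern) {
    Matcher matcher = referencePattern.matcher(references);
    // Each candidate i triggers one find() call; the first i for which find()
    // fails equals the number of matches found so far. This relies on the
    // stream being sequential and on the lambda's side effect on matcher.
    return Stream.iterate(0, i -> i + 1)
            .filter(i -> !matcher.find())
            .findFirst()
            .get();
}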

Disclaimer: this only counts disjoint (non-overlapping) matches.

Example:

public static void main(String[] args) {
    Pattern referencePattern = Pattern.compile("PASSENGER:\\d+");
    System.out.println(countPattern("[ \"PASSENGER:1\", \"PASSENGER:2\", \"AIR:1\", \"AIR:2\", \"FOP:2\" ]", referencePattern));
    System.out.println(countPattern("[ \"AIR:1\", \"AIR:2\", \"FOP:2\" ]", referencePattern));
    System.out.println(countPattern("[ \"AIR:1\", \"AIR:2\", \"FOP:2\", \"PASSENGER:1\" ]", referencePattern));
    System.out.println(countPattern("[  ]", referencePattern));
}

This prints out:

2
0
1
0

This is a solution for non-disjoint (overlapping) matches with streams:

public static int countPattern(String references, Pattern referencePattern) {
    return StreamSupport.stream(Spliterators.spliteratorUnknownSize(
            new Iterator<Integer>() {
                Matcher matcher = referencePattern.matcher(references);
                int from = 0;

                @Override
                public boolean hasNext() {
                    return matcher.find(from);
                }

                @Override
                public Integer next() {
                    from = matcher.start() + 1;
                    return 1;
                }
            },
            Spliterator.IMMUTABLE), false).reduce(0, (a, c) -> a + c);
}

Solution 5 - Java

Use the code below to count the matches that the regex finds in your input:

    Pattern p = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL); // "regex" is your predefined regex
    Matcher m = p.matcher(input); // "input" is the string to match the pattern against
    int count = 0;
    // Only find() drives the count. Calling matches() first would try to match
    // the entire input as a single match and could change the result.
    while (m.find())
        count++;

This is general-purpose code rather than a specific solution, so tailor it to your needs (the flags, for instance, are optional).

Please feel free to correct me if there is any mistake.
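To see why only find() should drive the count, the sketch below compares a plain find() loop with one that calls matches() first. The class and method names are illustrative, and the pattern a|aa is chosen only because it makes the difference visible.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVersusMatches {
    // Counts matches using find() only.
    static int countWithFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Calls matches() first, which tries to match the whole input at once,
    // and then continues with find() from where that match ended.
    static int countWithMatchesFirst(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        if (m.matches()) {
            count++;
        }
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWithFind("a|aa", "aa"));         // prints 2
        System.out.println(countWithMatchesFirst("a|aa", "aa")); // prints 1
    }
}
```

With find() alone, each "a" is matched separately. With matches() first, the whole input "aa" is consumed as one match and nothing is left for find().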

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Tony | View Question on Stackoverflow
Solution 1 - Java | aioobe | View Answer on Stackoverflow
Solution 2 - Java | Mary-Anne Wolf | View Answer on Stackoverflow
Solution 3 - Java | vương trọng hồ | View Answer on Stackoverflow
Solution 4 - Java | gil.fernandes | View Answer on Stackoverflow
Solution 5 - Java | sayed amir | View Answer on Stackoverflow