What regex can match sequences of the same character?

Regex, Perl

Regex Problem Overview


A friend asked me this and I was stumped: Is there a way to craft a regular expression that matches a sequence of the same character? E.g., match on 'aaa', 'bbb', but not 'abc'?

m|\w{2,3}| 

Wouldn't do the trick as it would match 'abc'.

m|a{2,3}| 

Wouldn't do the trick as it wouldn't match 'bbb', 'ccc', etc.
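To see concretely why both attempts fall short, here is a minimal sketch in JavaScript (not the Perl of the question; the variable names are made up for illustration) that runs them against the three sample strings.

```javascript
// The two attempts from the question, checked against the sample strings.
const samples = ['aaa', 'bbb', 'abc'];

// \w{2,3} accepts any 2-3 word characters, so it cannot tell 'abc' from 'aaa'.
const anyWordChars = /\w{2,3}/;
// a{2,3} only ever looks for the letter 'a'.
const onlyA = /a{2,3}/;

const attemptResults = samples.map((s) => ({
  input: s,
  anyWordChars: anyWordChars.test(s),
  onlyA: onlyA.test(s),
}));

console.log(attemptResults);
```

The first pattern reports a match for all three strings, and the second only for 'aaa', which is exactly the problem described above.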

Regex Solutions


Solution 1 - Regex

Sure thing! Grouping and references are your friends:

(.)\1+

This matches two or more consecutive occurrences of the same character. To restrict it to word characters only, use \w instead of ., i.e.:

(\w)\1+
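As a quick check, here is a small JavaScript sketch of both patterns (the variable names are illustrative only). The group captures one character and \1 demands that same character again, one or more times.

```javascript
// Capture one character, then require the same character at least once more.
const anyRun = /(.)\1+/;
const wordRun = /(\w)\1+/;

for (const s of ['aaa', 'bbb', 'abc']) {
  console.log(s, anyRun.test(s), wordRun.test(s));
}

// match() returns the whole run in [0] and the captured character in [1].
const found = 'abcccd'.match(anyRun);
console.log(found[0], found[1]);
```

Here 'aaa' and 'bbb' match both patterns and 'abc' matches neither; for 'abcccd' the run is 'ccc' and the captured character is 'c'.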

Solution 2 - Regex

Note that in Perl 5.10 we have alternative notations for backreferences as well.

use 5.010;  # enables say and the \g{...} / \k<...> syntax

foreach (qw(aaa bbb abc)) {
  say;  # print the current word
  say ' original'   if /(\w)\1+/;
  say ' new way'    if /(\w)\g{1}+/;
  say ' relative'   if /(\w)\g{-1}+/;
  say ' named (\g)' if /(?'char'\w)\g{char}+/;
  say ' named (\k)' if /(?<char>\w)\k<char>+/;
}
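The named form carries over to other engines too. As a rough sketch, assuming a JavaScript runtime with ES2018 named groups (the \g{...} forms above are Perl notation and are not used here), the same idea looks like this:

```javascript
// Named capture group plus a named backreference (\k<name>).
const namedRun = /(?<char>\w)\k<char>+/;

const hit = 'xyzzzy'.match(namedRun);
console.log(hit[0]);          // the whole run
console.log(hit.groups.char); // the character that repeats

// No character repeats in 'abc', so match() returns null.
console.log('abc'.match(namedRun));
```

Naming the group mostly pays off in longer patterns, where counting parentheses to find the right number gets error-prone.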

Solution 3 - Regex

This will match more than \w would, like @@@:

/(.)\1+/
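To make the difference visible, here is a short JavaScript sketch (names are illustrative) that collects every run in a string with each pattern. Note that . does not match a newline by default, so runs of newlines are not covered.

```javascript
const text = 'aa--b@@@';

// With the g flag, matchAll() yields every non-overlapping run.
const runsAny = [...text.matchAll(/(.)\1+/g)].map((m) => m[0]);
const runsWord = [...text.matchAll(/(\w)\1+/g)].map((m) => m[0]);

console.log(runsAny);  // runs of any character
console.log(runsWord); // runs of word characters only
```

The dot version picks up 'aa', '--' and '@@@', while the \w version only finds 'aa'.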

Solution 4 - Regex

Answering my own question, but got it:

m|(\w)\1+|

Solution 5 - Regex

This is what backreferences are for.

m/(\w)\1\1/

will do the trick when you want three in a row: it matches a word character followed by two more copies of itself.
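One thing to keep in mind is that this pattern is unanchored, so it also succeeds inside longer runs. A small JavaScript sketch (the anchored variant is my own addition, not part of the answer):

```javascript
// A word character followed by two more copies of itself.
const triple = /(\w)\1\1/;

console.log(triple.test('aa'));   // too short
console.log(triple.test('aaa'));
console.log(triple.test('aaaa')); // matches the first three characters
console.log('aaaa'.match(triple)[0]);

// Assumption: anchors added to accept only strings that are exactly three long.
const exactlyThree = /^(\w)\1\1$/;
console.log(exactlyThree.test('aaaa'));
```

So 'aa' fails, 'aaa' and 'aaaa' pass the unanchored version, and only the anchored one rejects 'aaaa'.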

Solution 6 - Regex

This is also possible using pure regular expressions (i.e. those that describe regular languages -- not Perl regexps). Unfortunately, it means a regexp whose length is proportional to the size of the alphabet, e.g.:

(a* + b* + ... + z*)

Where a...z are the symbols in the finite alphabet and + denotes alternation (union), written | in most regex dialects.

So Perl regexps, although a superset of pure regular expressions, definitely have their advantages even when you just want to use them for pure regular expressions!
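To illustrate how the backreference-free version grows with the alphabet, here is a sketch in JavaScript that builds it mechanically. The helper name buildRunPattern and the minLength parameter are hypothetical, and requiring at least two characters (rather than the a* above) is my assumption to line it up with the original question.

```javascript
// Hypothetical helper: builds a pattern without backreferences,
// with one alternative per symbol of a finite alphabet.
function buildRunPattern(alphabet, minLength = 2) {
  const branches = [...alphabet].map((ch) => {
    // Escape characters that are special in a regular expression.
    const escaped = ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    return `${escaped}{${minLength},}`;
  });
  // Anchored so the whole string must be a single run.
  return new RegExp(`^(?:${branches.join('|')})$`);
}

const pureRun = buildRunPattern('abcdefghijklmnopqrstuvwxyz');
console.log(pureRun.test('aaa'), pureRun.test('abc'));
console.log(pureRun.source.length); // grows with the alphabet size
```

The pattern gets one branch per symbol, which is the "length proportional to the size of the alphabet" cost mentioned above; the backreference version stays the same length whatever the alphabet.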

Solution 7 - Regex

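For the same character 3 times in a row:

  • /(.)\1\1/
  • /(.)\1{2}/

For 2 characters:

  • /(.)\1/

For an unknown number of the same character:

  • /(.)\1*/

Note that \1* also matches a single character on its own; use \1+ to require at least one repeat.

PS: I use JavaScript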
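For reference, here is how these behave in JavaScript when the backreference is written with a backslash (\1). This is only a sketch with made-up variable names.

```javascript
const threeSame = /(.)\1\1/;     // same character three times
const threeCounted = /(.)\1{2}/; // the same thing, written with a counted repeat
const twoSame = /(.)\1/;         // same character twice
const oneOrMore = /(.)\1*/;      // one character plus any number of repeats

console.log(threeSame.test('baaa'), threeCounted.test('baaa'));
console.log(twoSame.test('abb'), twoSame.test('abc'));

// \1* allows zero repeats, so a lone character already counts as a match.
console.log('abc'.match(oneOrMore)[0]);
console.log('aab'.match(oneOrMore)[0]);
```

Because \1* accepts zero repeats, the last pattern matches just 'a' in 'abc'; it only becomes a "repeated character" test once the quantifier is + or a count.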

Solution 8 - Regex

".*(.)\\1{2,}.*"

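Matches any string that contains a run of three or more identical characters: the captured character plus at least two repeats. For two or more, use \\1+ in place of \\1{2,}.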
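The count is easy to get off by one, since the captured character itself is part of the run. Here is a sketch in JavaScript, assuming the pattern above is a Java-style string literal (hence the doubled backslash) used for a whole-string match, which the ^ and $ anchors imitate:

```javascript
// (.) is the first character of the run; \1{2,} adds at least two more.
const threeOrMore = /^.*(.)\1{2,}.*$/;
// (.) plus \1+ needs only one more, so two in a row is enough.
const twoOrMore = /^.*(.)\1+.*$/;

for (const s of ['abc', 'abbc', 'abbbc']) {
  console.log(s, threeOrMore.test(s), twoOrMore.test(s));
}
```

'abbc' only passes the \1+ version, while 'abbbc' passes both.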

Solution 9 - Regex

If you are using Java and want to find consecutive duplicate characters in a given string, here is the code:

public class Test {
    public static void main(String[] args) {
        String s = "abbc";
        // matches() must match the whole string, hence the leading and trailing .*
        if (s.matches(".*([a-zA-Z])\\1+.*")) {
            System.out.println("Duplicate found!");
        } else {
            System.out.println("Duplicate not found!");
        }
    }
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Bill | View Question on Stackoverflow
Solution 1 - Regex | David Hanak | View Answer on Stackoverflow
Solution 2 - Regex | oylenshpeegul | View Answer on Stackoverflow
Solution 3 - Regex | gpojd | View Answer on Stackoverflow
Solution 4 - Regex | Bill | View Answer on Stackoverflow
Solution 5 - Regex | friedo | View Answer on Stackoverflow
Solution 6 - Regex | Edmund | View Answer on Stackoverflow
Solution 7 - Regex | divyam ojas | View Answer on Stackoverflow
Solution 8 - Regex | Roman Shulha | View Answer on Stackoverflow
Solution 9 - Regex | Nishad | View Answer on Stackoverflow