Is there a Perl shortcut to count the number of matches in a string?


Arrays Problem Overview


Suppose I have:

my $string = "one.two.three.four";

How should I play with context to get the number of times the pattern found a match (3)? Can this be done using a one-liner?

I tried this:

my ($number) = scalar($string=~/\./gi);

I thought that by putting parentheses around $number, I'd force array context, and by the use of scalar, I'd get the count. However, all I get is 1.

Arrays Solutions


Solution 1 - Arrays

That puts the regex itself in scalar context, which isn't what you want. Instead, put the regex in list context (so that it returns all of the matches) and then evaluate that list assignment in scalar context, which yields the number of matches.

 my $number = () = $string =~ /\./gi;
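To see what the context change does, here is a minimal sketch that runs the question's attempt next to the = () = form. The variable names $as_scalar and $as_count are only illustrative, and the pos() reset is there because a scalar-context /g match leaves a search position behind.

```perl
use strict;
use warnings;

my $string = "one.two.three.four";

# scalar() forces the match itself into scalar context, so /g only
# reports whether the next match succeeded (true, i.e. 1).
my ($as_scalar) = scalar($string =~ /\./g);

# Clear the /g search position left behind by the scalar-context match,
# otherwise the next /g match would start after the first dot.
pos($string) = undef;

# Assigning to the empty list () puts the match in list context; the
# list assignment is then evaluated in scalar context, which yields
# the number of elements produced on its right-hand side.
my $as_count = () = $string =~ /\./g;

print "scalar context: $as_scalar\n";   # 1
print "count idiom:    $as_count\n";    # 3
```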

Solution 2 - Arrays

I think the clearest way to describe this would be to avoid the instant-cast to scalar. First assign to an array, and then use that array in scalar context. That's basically what the = () = idiom will do, but without the (rarely used) idiom:

my $string = "one.two.three.four";
my @count = $string =~ /\./g;
print scalar @count;

Solution 3 - Arrays

Also, see perlfaq4:

>There are a number of ways, with varying efficiency. If you want a count of a certain single character (X) within a string, you can use the tr/// function like so:
>
>$string = "ThisXlineXhasXsomeXx'sXinXit";
>$count = ($string =~ tr/X//);
>print "There are $count X characters in the string";

>This is fine if you are just looking for a single character. However, if you are trying to count multiple character substrings within a larger string, tr/// won't work. What you can do is wrap a while() loop around a global pattern match. For example, let's count negative integers:
>
>$string = "-9 55 48 -2 23 -76 4 14 -44";
>while ($string =~ /-\d+/g) { $count++ }
>print "There are $count negative numbers in the string";

>Another version uses a global match in list context, then assigns the result to a scalar, producing a count of the number of matches.
>
>$count = () = $string =~ /-\d+/g;
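For anyone who wants to run the three perlfaq4 variants side by side under strict, here is a rough sketch. The variable names ($line, $numbers, $x_count and so on) are my own and not from the FAQ.

```perl
use strict;
use warnings;

my $line = "ThisXlineXhasXsomeXx'sXinXit";

# tr/// with an empty replacement list and no /d modifier leaves the
# string unchanged and returns how many characters it matched.
my $x_count = ($line =~ tr/X//);

my $numbers = "-9 55 48 -2 23 -76 4 14 -44";

# Scalar-context /g match: each iteration finds the next match.
my $loop_count = 0;
$loop_count++ while $numbers =~ /-\d+/g;

# List-context /g match counted through the empty-list assignment.
my $list_count = () = $numbers =~ /-\d+/g;

print "$x_count $loop_count $list_count\n";   # 6 4 4
```

Note that tr/// is case-sensitive here, so the lowercase x in "x's" is not counted.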

Solution 4 - Arrays

Is the following code a one-liner?

print $string =~ s/\./\./g;
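A small sketch of why this works: s///g returns the number of substitutions it performed, and replacing each dot with a dot leaves the text the same. Working on a copy ($copy) and the || 0 fallback are additions of mine, not part of the answer above.

```perl
use strict;
use warnings;

my $string = "one.two.three.four";

# Substitute on a copy so the original is never written to.
my $copy  = $string;
my $count = ($copy =~ s/\./\./g);      # number of substitutions made

# When nothing matches, s/// in scalar context returns false (an empty
# string that is also numeric zero), so fall back to a plain 0.
my $commas = ($copy =~ s/,/,/g) || 0;

print "$count $commas\n";   # 3 0
```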

Solution 5 - Arrays

Try this:

my $string = "one.two.three.four";
my ($number) = scalar( @{[ $string=~/\./gi ]} );

It returns 3 for me. The anonymous array constructor [ ... ] evaluates the regular expression in list context, and @{ ... } dereferences the resulting array reference, so scalar() sees an array and returns its number of elements.

Solution 6 - Arrays

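I noticed that if your regular expression contains capture groups joined by an OR (e.g. /(K..K)|(V.AK)/gi), a global match in list context returns every group for every match. The group that did not take part in a given match comes back as undef, and those undefined elements are included in the count at the end.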

For example:

my $seq = "TSYCSKSNKRCRRKYGDDDDWWRSQYTTYCSCYTGKSGKTKGGDSCDAYYEAYGKSGKTKGGRNNR";
my $regex = '(K..K)|(V.AK)';
my $count = () = $seq =~ /$regex/gi;
print "$count\n";

This gives a count of 6: three matches, each returning two capture groups.

I found the solution in this post: https://stackoverflow.com/questions/11122977/how-do-i-remove-all-undefs-from-array

my $seq = "TSYCSKSNKRCRRKYGDDDDWWRSQYTTYCSCYTGKSGKTKGGDSCDAYYEAYGKSGKTKGGRNNR";
my $regex = '(K..K)|(V.AK)';
my @count = $seq =~ /$regex/gi;
@count = grep defined, @count; 
my $count = scalar @count;
print "$count\n";

Which then gives the correct answer of three.
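As an alternative to filtering out the undefs afterwards, the groups can be made non-capturing with (?: ... ), so that each match contributes exactly one element. This is only a sketch of that variation, not the approach from the linked post. The names $with_captures and $without_captures are illustrative.

```perl
use strict;
use warnings;

my $seq = "TSYCSKSNKRCRRKYGDDDDWWRSQYTTYCSCYTGKSGKTKGGDSCDAYYEAYGKSGKTKGGRNNR";

# Two capture groups: every match returns two elements, one of them undef.
my $with_captures = () = $seq =~ /(K..K)|(V.AK)/gi;

# Non-capturing groups: every match returns just the matched text.
my $without_captures = () = $seq =~ /(?:K..K)|(?:V.AK)/gi;

print "$with_captures $without_captures\n";   # 6 3
```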

Solution 7 - Arrays

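Another way:

my $string = "one.two.three.four";
# A limit of -1 keeps trailing empty fields, so a dot at the end of the string is still counted.
my @s = split /\./, $string, -1;
print scalar @s - 1;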
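One caveat with the split approach: by default split drops trailing empty fields, so a string that ends with the separator is undercounted unless a negative limit is passed. A quick sketch with a made-up input ($trailing) to show the difference:

```perl
use strict;
use warnings;

my $trailing = "one.two.";

my @default = split /\./, $trailing;        # ("one", "two")
my @keep    = split /\./, $trailing, -1;    # ("one", "two", "")

my $by_default = @default - 1;              # misses the final dot
my $by_keep    = @keep - 1;                 # counts both dots
my $by_match   = () = $trailing =~ /\./g;   # counts both dots

print "$by_default $by_keep $by_match\n";   # 1 2 2
```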

Solution 8 - Arrays

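Friedo's method is: $a = () = $b =~ $c.

A substitution offers another route: s///g returns the number of replacements it made, so the count can be assigned directly. Note that this relies on s///; a plain match in the same position would not yield a count.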

my ($matchcount) = $text =~ s/$findregex/ /gi;

You could then wrap this up in a function, getMatchCount(), that works on its own copy of the string, and not worry about it destroying the string that was passed in.

On the other hand, you can replace each match with itself, which may be a bit more computation, but does not result in altering the string.

my ($matchcount) = $text =~ s/($findregex)/$1/gi;
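A possible shape for such a wrapper, as a sketch only. The name get_match_count and its argument order are hypothetical. Because the arguments are copied out of @_, the substitution only ever changes the local copy.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the matches of $regex in $text.
sub get_match_count {
    my ($text, $regex) = @_;        # $text is a copy of the caller's string
    my $count = ($text =~ s/$regex/ /g);
    return $count || 0;             # s/// returns false when nothing matched
}

my $text = "one.two.three.four";
my $n    = get_match_count($text, qr/\./);

print "$n\n";      # 3
print "$text\n";   # one.two.three.four (unchanged)
```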

Solution 9 - Arrays

my $count = 0;
my $pos = -1;
while (($pos = index($string, $match, $pos+1)) > -1) {
  $count++;
}

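I checked this with the Benchmark module, and it's pretty fast. Note that index() looks for a literal substring ($match), not a regular expression, and that stepping forward by one position counts overlapping occurrences too.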
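Since the loop advances one position at a time, it counts overlapping occurrences, which a plain /g match does not. A small sketch with made-up inputs to show the difference. The lookahead form is an extra variation of mine for getting the overlapping count from a regex.

```perl
use strict;
use warnings;

my $string = "aaaa";
my $match  = "aa";

# index() loop from the answer above: finds "aa" at offsets 0, 1 and 2.
my $count = 0;
my $pos   = -1;
while (($pos = index($string, $match, $pos + 1)) > -1) {
    $count++;
}

# A /g match resumes after the end of each match: offsets 0 and 2 only.
# \Q...\E quotes the substring so it is matched literally.
my $regex_count = () = $string =~ /\Q$match\E/g;

# A zero-width lookahead consumes nothing, so overlaps are counted again.
my $overlap_count = () = $string =~ /(?=\Q$match\E)/g;

print "$count $regex_count $overlap_count\n";   # 3 2 3
```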

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Geo | View Question on Stackoverflow
Solution 1 - Arrays | friedo | View Answer on Stackoverflow
Solution 2 - Arrays | Robert P | View Answer on Stackoverflow
Solution 3 - Arrays | Robert P | View Answer on Stackoverflow
Solution 4 - Arrays | Mike | View Answer on Stackoverflow
Solution 5 - Arrays | PP. | View Answer on Stackoverflow
Solution 6 - Arrays | Alastair Skeffington | View Answer on Stackoverflow
Solution 7 - Arrays | ghostdog74 | View Answer on Stackoverflow
Solution 8 - Arrays | HoldOffHunger | View Answer on Stackoverflow
Solution 9 - Arrays | Tim Cadell | View Answer on Stackoverflow