How can I store regex captures in an array in Perl?


Regex Problem Overview


Is it possible to store all matches for a regular expression into an array?

I know I can use ($1,...,$n) = m/expr/g;, but it seems as though that can only be used if you know the number of matches you are looking for. I have tried my @array = m/expr/g;, but that doesn't seem to work.

Regex Solutions


Solution 1 - Regex

If you're doing a global match (/g) then the regex in list context will return all of the captured matches. Simply do:

my @matches = ( $str =~ /pa(tt)ern/g );

For example, this command:

perl -le '@m = ( "foo12gfd2bgbg654" =~ /(\d+)/g ); print for @m'

Gives the output:

12
2
654

Solution 2 - Regex

See the perlop manual page (perldoc perlop), under "Matching in List Context":

> If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1 , $2 , $3 ...)

> The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.

You can simply grab all the matches by assigning to an array, or otherwise performing the evaluation in list context:

my @matches = ($string =~ m/word/g);
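The behaviours quoted above can be compared side by side. This is only a minimal sketch; the sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $str = "a1 b2 c3";    # hypothetical sample input

# No /g: list context returns the capture groups of the first match only.
my @first = ( $str =~ /([a-z])(\d)/ );

# /g without parentheses: returns every whole match.
my @whole = ( $str =~ /[a-z]\d/g );

# /g with parentheses: returns every capture of every match, flattened.
my @flat = ( $str =~ /([a-z])(\d)/g );

print "first: @first\n";    # first: a 1
print "whole: @whole\n";    # whole: a1 b2 c3
print "flat:  @flat\n";     # flat:  a 1 b 2 c 3
```

Note that with several capture groups and /g the result is one flat list, not a list of per-match groups.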

Solution 3 - Regex

Sometimes you need to collect all matches globally, keeping the capture groups of each match together, like PHP's preg_match_all does. If that is your case, you can write something like:

# a dummy example
my $subject = 'Philip Fry Bender Rodriguez Turanga Leela';
my @matches;
push @matches, [$1, $2] while $subject =~ /(\w+) (\w+)/g;

use Data::Dumper;
print Dumper(\@matches);

It prints:

$VAR1 = [
          [
            'Philip',
            'Fry'
          ],
          [
            'Bender',
            'Rodriguez'
          ],
          [
            'Turanga',
            'Leela'
          ]
        ];

Solution 4 - Regex

I think this example is self-explanatory. Note the /g modifier in the first regex:

use Data::Dumper;

$string = "one two three four";

@res = $string =~ m/(\w+)/g;
print Dumper(@res); # @res = ("one", "two", "three", "four")

@res = $string =~ m/(\w+) (\w+)/;
print Dumper(@res); # @res = ("one", "two")

Remember that the left-hand side of the assignment must be in list context, which means you have to surround scalar variables with parentheses:

($one, $two) = $string =~ m/(\w+) (\w+)/;
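To illustrate why the parentheses matter, here is a small sketch that runs the same kind of match in scalar and in list context. The variable names $ok, $first and $count are made up for this example.

```perl
use strict;
use warnings;

my $string = "one two three four";

# Scalar context: the match only reports success or failure.
my $ok = $string =~ m/(\w+) (\w+)/;

# List context: parentheses on the left-hand side yield the captures.
my ($first) = $string =~ m/(\w+) (\w+)/;

# Assigning to an empty list forces list context; the outer scalar
# assignment then receives the number of matches.
my $count = () = $string =~ m/\w+/g;

print "ok=$ok first=$first count=$count\n";    # ok=1 first=one count=4
```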

Solution 5 - Regex

> Is it possible to store all matches for a regular expression into an array?

Yes, in Perl 5.25.7, the variable @{^CAPTURE} was added, which holds "the contents of the capture buffers, if any, of the last successful pattern match". This means it contains ($1, $2, ...) even if the number of capture groups is unknown.
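A minimal sketch of how this might look, assuming a Perl of at least 5.26 (the first stable release after 5.25.7). The sample string and the name @capture are made up for illustration.

```perl
use v5.26;    # @{^CAPTURE} is not available in earlier stable releases
use strict;
use warnings;

my $subject = "2024-05-17";    # hypothetical sample input
my @capture;

if ( $subject =~ /(\d+)-(\d+)-(\d+)/ ) {
    # Copy the captures right away; the next successful match replaces them.
    @capture = @{^CAPTURE};
}

print scalar(@capture), " groups: @capture\n";
```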

Before Perl 5.25.7 (since 5.6.0) you could build the same array using @- and @+, as suggested by Jacques in his answer (Solution 6). You would have to do something like this:

    my @capture = ();
    for (my $i = 1; $i < @+; $i++) {
        push @capture, substr $subject, $-[$i], $+[$i] - $-[$i];
    }
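The loop above assumes that $subject has just been matched successfully and that every group took part in the match. A runnable sketch with those pieces filled in might look like this; the sample string and the optional third group are assumptions added to show the unmatched-group case.

```perl
use strict;
use warnings;

my $subject = "key=value";    # hypothetical sample input
my @capture;

if ( $subject =~ /(\w+)=(\w+)(;)?/ ) {
    for ( my $i = 1; $i < @+; $i++ ) {
        # A group that did not take part in the match has undefined offsets.
        push @capture, defined $-[$i]
            ? substr( $subject, $-[$i], $+[$i] - $-[$i] )
            : undef;
    }
}

print join( ", ", map { defined $_ ? $_ : "undef" } @capture ), "\n";
# prints: key, value, undef
```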

Solution 6 - Regex

I am surprised this is not already mentioned here, but the Perl documentation describes the standard variables @- and @+, which hold the start and end offsets of each submatch. To quote the documentation for @-:

> This array holds the offsets of the beginnings of the last successful submatches in the currently active dynamic scope.

So, to get the value of the first capture, one would write:

print substr( $str, $-[1], $+[1] - $-[1] ), "\n"; # equivalent to $1

As a side note, there is also the standard variable %- which is very nifty, because it not only contains named captures, but also allows for duplicate names to be stored in an array.

Using the example provided in the documentation:

/(?<A>1)(?<B>2)(?<A>3)(?<B>4)/

would yield a hash with entries such as:

$-{A}[0] : '1'
$-{A}[1] : '3'
$-{B}[0] : '2'
$-{B}[1] : '4'
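As a rough, runnable version of that example: the input string "1234" and the hash name %by_name are assumptions chosen so that the documented pattern matches.

```perl
use strict;
use warnings;

my %by_name;

if ( "1234" =~ /(?<A>1)(?<B>2)(?<A>3)(?<B>4)/ ) {
    for my $name ( keys %- ) {
        # Copy the values; %- reflects only the last successful match.
        $by_name{$name} = [ @{ $-{$name} } ];
    }
}

for my $name ( sort keys %by_name ) {
    print "$name: @{ $by_name{$name} }\n";
}
# prints:
# A: 1 3
# B: 2 4
```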

Solution 7 - Regex

Note that if you know the number of capturing groups you need per match, you can use this simple approach, which I present as an example (of 2 capturing groups.)

Suppose you have some 'data' like

my $mess = <<'IS_YOURS';
Richard   	Rich
April        	May
Harmony		        Ha\rm
Winter           Win
Faith     Hope
William  		Will
Aurora     Dawn
Joy  
IS_YOURS

With the following regex

my $oven = qr'^(\w+)\h+(\w+)$'ma;  # skip the /a modifier if using perl < 5.14

I can capture all 12 values (6 pairs, not 8: the Harmony line is skipped because \w does not match the backslash in Ha\rm, and the Joy line has no second word) in the @box below.

my @box = $mess =~ m[$oven]g;

If I want to "hash out" the details of the box I could just do:

my %hash = @box;

Or I could have skipped the box entirely:

my %hash = $mess =~ m[$oven]g;

Note that %hash contains the following. Order is lost and dupe keys (if any had existed) are squashed:

(
          'April'   => 'May',
          'Richard' => 'Rich',
          'Winter'  => 'Win',
          'William' => 'Will', 
          'Faith'   => 'Hope',
          'Aurora'  => 'Dawn'
);
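If order or duplicate keys matter, one option is to keep the pairs in an array instead of a hash, in the spirit of Solution 3. This is only a sketch; the shortened sample data (with a deliberately repeated first name) and the name @pairs are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical data with a duplicate key ("April").
my $mess = "April May\nFaith Hope\nApril Showers\n";

my @pairs;
while ( $mess =~ /^(\w+)\h+(\w+)$/mg ) {
    push @pairs, [ $1, $2 ];    # one array ref per matched line
}

print "$_->[0] => $_->[1]\n" for @pairs;
# prints:
# April => May
# Faith => Hope
# April => Showers
```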

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
|---|---|---|
| Question | cskwrd | View Question on Stackoverflow |
| Solution 1 - Regex | friedo | View Answer on Stackoverflow |
| Solution 2 - Regex | Ether | View Answer on Stackoverflow |
| Solution 3 - Regex | codeholic | View Answer on Stackoverflow |
| Solution 4 - Regex | Flimm | View Answer on Stackoverflow |
| Solution 5 - Regex | Viktor Söderqvist | View Answer on Stackoverflow |
| Solution 6 - Regex | Jacques | View Answer on Stackoverflow |
| Solution 7 - Regex | YenForYang | View Answer on Stackoverflow |