How to capture an arbitrary number of groups in JavaScript Regexp?

JavaScript, Regex, Repeat, Capturing Group

Javascript Problem Overview


I would expect this line of JavaScript:

"foo bar baz".match(/^(\s*\w+)+$/)

to return something like:

["foo bar baz", "foo", " bar", " baz"]

but instead it returns only the last captured match:

["foo bar baz", " baz"]

Is there a way to get all the captured matches?

Javascript Solutions


Solution 1 - Javascript

When you repeat a capturing group, most regex flavors keep only the last capture; each earlier capture is overwritten. Some flavors, e.g. .NET, expose all intermediate captures, but JavaScript does not.

That is, in JavaScript, a pattern with N capturing groups yields at most N captured strings per match, even if some of those groups were repeated.
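
A quick way to check this behavior is to run the pattern from the question and inspect the result. This is only a minimal sketch; the variable name `result` is chosen here for illustration.

```javascript
// The pattern has one capturing group, so the result holds the full match
// at index 0 plus exactly one captured string at index 1.
const result = "foo bar baz".match(/^(\s*\w+)+$/);

console.log(result.length); // 2
console.log(result[0]);     // "foo bar baz"
console.log(result[1]);     // " baz" (only the last repetition survives)
```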

So generally speaking, depending on what you need to do:

  • If it's an option, split on delimiters instead
  • Instead of matching /(pattern)+/, maybe match /pattern/g, perhaps in an exec loop
    • Do note that these two aren't exactly equivalent, but it may be an option
  • Do multilevel matching:
    • Capture the repeated group in one match
    • Then run another regex to break that match apart
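
The three options in the list above could be sketched roughly as follows, applied to the string from the question. The variable names (`bySplit`, `byExec`, `byTwoPass`) are hypothetical, and the exact patterns would depend on what counts as a delimiter in your input.

```javascript
const input = "foo bar baz";

// Option 1: split on the delimiter (here, runs of whitespace).
const bySplit = input.split(/\s+/);

// Option 2: match the unrepeated pattern globally in an exec loop.
const wordPattern = /\s*\w+/g;
const byExec = [];
let m;
while ((m = wordPattern.exec(input)) !== null) {
  byExec.push(m[0]);
}

// Option 3: multilevel matching. Capture the whole repeated run once,
// then run a second regex to break that capture apart.
const whole = /^((?:\s*\w+)+)$/.exec(input);
const byTwoPass = whole ? whole[1].match(/\s*\w+/g) : [];

console.log(bySplit);   // ["foo", "bar", "baz"]
console.log(byExec);    // ["foo", " bar", " baz"]
console.log(byTwoPass); // ["foo", " bar", " baz"]
```

Note that options 2 and 3 keep the leading whitespace of each piece, as the expected output in the question does, while splitting discards it.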

Example

Here's an example of matching <some;words;here> in a text, using an exec loop, and then splitting on ; to get individual words (see also on ideone.com):

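var text = "a;b;<c;d;e;f>;g;h;i;<no no no>;j;k;<xx;yy;zz>";

// Group 1 captures the whole list; group 2 is the repeated ";word" part.
var r = /<(\w+(;\w+)*)>/g;

var match;
while ((match = r.exec(text)) !== null) {
  // Split the captured list on ";" to get the individual words.
  console.log(match[1].split(";").join(","));
}
// c,d,e,f
// xx,yy,zz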

The pattern used is:

      _2__
     /    \
<(\w+(;\w+)*)>
 \__________/
      1

This matches <word>, <word;another>, <word;another;please>, etc. Group 2 is repeated to capture any number of words, but it can only keep the last capture. The entire list of words is captured by group 1; this string is then split on the semicolon delimiter.
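
To see what each group holds after a single match, the two captures can be inspected directly. This is a small sketch; `listPattern`, `groups`, and `single` are illustrative names, not part of the original answer.

```javascript
const listPattern = /<(\w+(;\w+)*)>/;

const groups = listPattern.exec("<c;d;e;f>");
console.log(groups[1]);            // "c;d;e;f" (the whole list)
console.log(groups[2]);            // ";f" (only the last repetition of group 2)
console.log(groups[1].split(";")); // ["c", "d", "e", "f"]

// With a single word, group 2 never participates and is undefined.
const single = listPattern.exec("<word>");
console.log(single[1]); // "word"
console.log(single[2]); // undefined
```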

Solution 2 - Javascript

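How about this? "foo bar baz".match(/(\w+)+/g)

With the g flag, match returns every full match and ignores the capturing groups, so this yields ["foo", "bar", "baz"].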

Solution 3 - Javascript

Unless you have a more complicated requirement for how you're splitting your strings, you can split the string and then prepend the original string to the resulting array:

var data = "foo bar baz";
var pieces = data.split(' ');
pieces.unshift(data);

Solution 4 - Javascript

Try using the 'g' flag:

"foo bar baz".match(/\w+/g)

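With the g flag, match returns only the full matches and drops any capturing groups. If the groups of each match are needed as well, String.prototype.matchAll (ES2020) is one possible alternative. The sketch below assumes an environment that supports it; the pattern and the names `matches`, `words`, and `offsets` are illustrative.

```javascript
// matchAll requires the g flag and yields one full match object per match,
// including its capturing groups and its index in the input.
const matches = [..."foo bar baz".matchAll(/(\s*)(\w+)/g)];

const words = matches.map((match) => match[2]);
const offsets = matches.map((match) => match.index);

console.log(words);   // ["foo", "bar", "baz"]
console.log(offsets); // [0, 3, 7]
```
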
Solution 5 - Javascript

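You can make the group optional instead of repeating it: replace the + quantifier with ? and match globally, so that each word comes back as a separate match. Note that a ? placed directly after a group means "zero or one"; it only makes matching lazy when it follows another quantifier, as in +? or *?.

REGEX: (\s*\w+)? (with the global flag)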

RESULT:

Match 1: foo

Match 2: bar

Match 3: baz
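
Run in JavaScript with the g flag, this pattern can be checked as sketched below. Because the group is optional, the pattern can also match the empty string, so the raw result includes leading whitespace and a trailing empty match; the cleanup step and the names `raw` and `cleaned` are assumptions added here for illustration.

```javascript
const raw = "foo bar baz".match(/(\s*\w+)?/g);
console.log(raw); // ["foo", " bar", " baz", ""]

// Drop the empty match and trim the leading whitespace from each piece.
const cleaned = raw.filter((piece) => piece !== "").map((piece) => piece.trim());
console.log(cleaned); // ["foo", "bar", "baz"]
```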

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | disc0dancer | View Question on Stackoverflow
Solution 1 - Javascript | polygenelubricants | View Answer on Stackoverflow
Solution 2 - Javascript | meder omuraliev | View Answer on Stackoverflow
Solution 3 - Javascript | g.d.d.c | View Answer on Stackoverflow
Solution 4 - Javascript | Jet | View Answer on Stackoverflow
Solution 5 - Javascript | AbramYu | View Answer on Stackoverflow