JavaScript regular expressions and sub-matches

Javascript Problem Overview


Why do JavaScript sub-matches stop working when the g modifier is set?

var text = 'test test test test';

var result = text.match(/t(e)(s)t/);
// Result: ["test", "e", "s"]

The above works fine, result[1] is "e" and result[2] is "s".

var result = text.match(/t(e)(s)t/g);
// Result: ["test", "test", "test", "test"]

The above ignores my capturing groups. Is the following the only valid solution?

var result = text.match(/test/g);
for (var i = 0; i < result.length; i++) {
    console.log(result[i].match(/t(e)(s)t/));
}
/* Result:
["test", "e", "s"]
["test", "e", "s"]
["test", "e", "s"]
["test", "e", "s"]
*/

EDIT:

I am back again to happily tell you that, 10 years later, you can now do this (.matchAll has been added to the spec):

let result = [...text.matchAll(/t(e)(s)t/g)];

Javascript Solutions


Solution 1 - Javascript

Using String's match() function won't return captured groups if the global modifier is set, as you found out.

In this case, you would want to use a RegExp object and call its exec() function. Without the global modifier, String's match() returns the same result as RegExp's exec(). With the global modifier set, match() returns only the full matches and drops the captured groups, while exec() still returns them.

Another catch to remember is that exec() doesn't return the matches in one big array. Each call returns the next match, and once there are no more matches it returns null.

So, for example, you could do something like this:

var pattern = /t(e)(s)t/g;  // Alternatively, "new RegExp('t(e)(s)t', 'g');"
var match;    

while (match = pattern.exec(text)) {
    // Do something with the match (["test", "e", "s"]) here...
}
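If you would rather have all the matches in one array, the loop above can be wrapped in a small helper. This is only a sketch: collectMatches is a hypothetical name, not a built-in, and the guards for a missing g flag and for zero-length matches are additions that the original loop does not have.

```javascript
// Hypothetical helper (not a built-in): gathers every exec() result into an array.
function collectMatches(pattern, text) {
    if (!pattern.global) {
        // Without the g flag, exec() never advances lastIndex, so the loop would not end.
        throw new TypeError('collectMatches expects a pattern with the g flag');
    }
    pattern.lastIndex = 0; // start from the beginning regardless of earlier calls
    var matches = [];
    var match;
    while ((match = pattern.exec(text)) !== null) {
        if (match[0] === '') {
            pattern.lastIndex++; // step past zero-length matches to avoid looping forever
        }
        matches.push(match);
    }
    return matches;
}

var text = 'test test test test';
var all = collectMatches(/t(e)(s)t/g, text);
console.log(all.length);           // 4
console.log(all[0][1], all[0][2]); // e s
```

Each element of the returned array is a regular exec() result, so all[n][1] and all[n][2] hold the captured groups and all[n].index holds the position of the match.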

Another thing to note is that RegExp.prototype.exec() and RegExp.prototype.test() both run the regular expression against the provided string, starting at the pattern's lastIndex, and return the first result found from there. When the global modifier is set, every subsequent call steps through the result set, updating the pattern's lastIndex property to the position just after the current match.

Here's an example. Remember that there are 4 matches in the example text and pattern, and that pattern.lastIndex starts at 0:

pattern.test(text); // pattern.lastIndex = 4
pattern.test(text); // pattern.lastIndex = 9
pattern.exec(text); // pattern.lastIndex = 14
pattern.exec(text); // pattern.lastIndex = 19

// if we were to call pattern.exec(text) again it would return null and reset the pattern.lastIndex to 0
var match; // declared up front: a var declaration is not valid inside a while condition
while ((match = pattern.exec(text)) !== null) {
    // never gets run because we already traversed the string
    console.log(match);
}

pattern.test(text); // pattern.lastIndex = 4
pattern.test(text); // pattern.lastIndex = 9

// however we can reset the lastIndex and it will give us the ability to traverse the string from the start again or any specific position in the string
pattern.lastIndex = 0;

while ((match = pattern.exec(text)) !== null) {
    // outputs all matches
    console.log(match);
}
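To watch lastIndex move, here is a minimal, self-contained sketch that records its value after each call. The variable names seen and fromMiddle are only for illustration.

```javascript
var text = 'test test test test';
var pattern = /t(e)(s)t/g;

// Record lastIndex after each call; the fifth call finds nothing and resets it to 0.
var seen = [];
for (var i = 0; i < 5; i++) {
    pattern.exec(text);
    seen.push(pattern.lastIndex);
}
console.log(seen); // [ 4, 9, 14, 19, 0 ]

// Setting lastIndex by hand makes the next search start at that position.
pattern.lastIndex = 10;
var fromMiddle = pattern.exec(text);
console.log(fromMiddle.index); // 10
```

Because this state lives on the RegExp object itself, reusing one global pattern across different strings can give surprising results unless lastIndex is reset first.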

You can find information on how to use RegExp objects on MDN, specifically in the documentation for the exec() function.

Solution 2 - Javascript

I am surprised to see that I am the first person to answer this question with the answer I was looking for 10 years ago (the answer did not exist yet). I also was hoping that the actual spec writers would have answered it before me ;).

.matchAll was standardized in ES2020 and is available in current browsers and in Node.js.

In modern JavaScript we can now accomplish this by just doing the following.

let result = [...text.matchAll(/t(e)(s)t/g)];
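As a rough illustration of what comes back, the sketch below pulls the index and the captured groups out of each match, and shows that matchAll insists on the g flag. The names summary and threw are just for this example.

```javascript
const text = 'test test test test';

// Each item yielded by matchAll has the same shape as an exec() result.
const summary = Array.from(text.matchAll(/t(e)(s)t/g), (m) => ({
    index: m.index,
    groups: m.slice(1) // drop the full match, keep only the captured groups
}));
console.log(summary[1]); // { index: 5, groups: [ 'e', 's' ] }

// matchAll requires the g flag; passing a non-global RegExp throws a TypeError.
let threw = false;
try {
    text.matchAll(/t(e)(s)t/);
} catch (err) {
    threw = err instanceof TypeError;
}
console.log(threw); // true
```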

.matchAll spec

.matchAll docs

I now maintain an isomorphic JavaScript library that helps with a lot of this type of string parsing. You can check it out here: string-saw. It makes .matchAll easier to use with named capture groups.

An example would be

saw(text).matchAll(/t(e)(s)t/g)

Which outputs a more user-friendly array of matches, and if you want to get fancy you can throw in named capture groups and get an array of objects.
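For comparison, and without assuming anything about string-saw's own output format, an array of objects can also be built with plain .matchAll and named capture groups. The group names vowel and consonant are made up for this sketch.

```javascript
const text = 'test test test test';

// The group names "vowel" and "consonant" are invented for this example.
const named = [...text.matchAll(/t(?<vowel>e)(?<consonant>s)t/g)]
    .map((m) => ({ ...m.groups })); // copy the named groups into a plain object

console.log(named.length); // 4
console.log(named[0]);     // { vowel: 'e', consonant: 's' }
```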

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Chad Scira | View Question on Stackoverflow
Solution 1 - Javascript | hbw | View Answer on Stackoverflow
Solution 2 - Javascript | Chad Scira | View Answer on Stackoverflow