RegEx to extract all matches from string using RegExp.exec

Javascript, Regex, Regex Group, Taskwarrior

Javascript Problem Overview


I'm trying to parse the following kind of string:

[key:"val" key2:"val2"]

where there are arbitrary key:"val" pairs inside. I want to grab the key name and the value. For those curious, I'm trying to parse the database format of Taskwarrior.

Here is my test string:

[description:"aoeu" uuid:"123sth"]

which is meant to show that a key or value can contain anything except a space, that there are no spaces around the colons, and that values are always in double quotes.

In node, this is my output:

[deuteronomy][gatlin][~]$ node
> var re = /^\[(?:(.+?):"(.+?)"\s*)+\]$/g
> re.exec('[description:"aoeu" uuid:"123sth"]');
[ '[description:"aoeu" uuid:"123sth"]',  'uuid',  '123sth',  index: 0,  input: '[description:"aoeu" uuid:"123sth"]' ]

But description:"aoeu" also matches this pattern. How can I get all matches back?

Javascript Solutions


Solution 1 - Javascript

Continue calling re.exec(s) in a loop to obtain all the matches:

var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';
var m;

do {
    m = re.exec(s);
    if (m) {
        console.log(m[1], m[2]);
    }
} while (m);

Try it with this JSFiddle: https://jsfiddle.net/7yS2V/
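As a minimal sketch of how the loop above could feed a lookup table, the following collects every pair into a plain object. The helper name `parsePairs` is hypothetical (it is not part of the original answer or of any library), and it assumes that keys never repeat within one line.

```javascript
// Hypothetical helper: collect all key:"value" pairs from one line into an object.
function parsePairs(line) {
  // Same pattern as in the answer; created per call so lastIndex always starts at 0.
  const re = /\s*([^[:]+):"([^"]+)"/g;
  const pairs = {};
  let m;
  while ((m = re.exec(line)) !== null) {
    pairs[m[1]] = m[2]; // m[1] is the key, m[2] is the value
  }
  return pairs;
}

const parsed = parsePairs('[description:"aoeu" uuid:"123sth"]');
console.log(parsed); // { description: 'aoeu', uuid: '123sth' }
```

Creating the regex inside the function avoids sharing lastIndex state between calls.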

Solution 2 - Javascript

str.match(pattern), if pattern has the global flag g, will return all the matches as an array.

For example:

const str = 'All of us except @Emran, @Raju and @Noman were there';
console.log(
  str.match(/@\w*/g)
);
// Will log ["@Emran", "@Raju", "@Noman"]
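One caveat for the original question: with the g flag, match returns only the full matches and drops the capture groups. The sketch below illustrates this on the question's test string; the pattern is a simplified variant chosen for this example, not the asker's regex.

```javascript
const s = '[description:"aoeu" uuid:"123sth"]';

// With g: an array of full matches only, no capture groups.
const full = s.match(/([^\s[:]+):"([^"]+)"/g);
console.log(full); // [ 'description:"aoeu"', 'uuid:"123sth"' ]

// Without g: only the first match, but with its capture groups.
const first = s.match(/([^\s[:]+):"([^"]+)"/);
console.log(first[1], first[2]); // description aoeu
```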

Solution 3 - Javascript

To loop through all matches, you can use the replace function:

var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';

s.replace(re, function(match, g1, g2) { console.log(g1, g2); });
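The callback also receives the match offset after the captured groups, so the same trick can collect structured results instead of logging them. This is only a sketch: the array name `found` and the object shape are assumptions made for illustration.

```javascript
const re = /\s*([^[:]+):"([^"]+)"/g;
const s = '[description:"aoeu" uuid:"123sth"]';

const found = [];
s.replace(re, (match, key, value, offset) => {
  found.push({ key, value, offset });
  return match; // the return value is irrelevant because the result string is discarded
});

console.log(found);
// [ { key: 'description', value: 'aoeu', offset: 1 },
//   { key: 'uuid', value: '123sth', offset: 19 } ]
```

The second offset is 19 rather than 20 because the leading \s* in the pattern consumes the space before uuid.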

Solution 4 - Javascript

This is a solution

var s = '[description:"aoeu" uuid:"123sth"]';

var re = /\s*([^[:]+):\"([^"]+)"/g;
var m;
while (m = re.exec(s)) {
  console.log(m[1], m[2]);
}

This is based on lawnsea's answer, but shorter.

Note that the g flag must be set so that the regex's lastIndex moves forward between calls. Without it, exec returns the first match every time and the loop never ends.
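To make the role of the flag visible, this sketch records where each match starts and where lastIndex points afterwards. The pattern is a simplified variant used only for this illustration.

```javascript
const s = '[description:"aoeu" uuid:"123sth"]';
const re = /([^\s[:]+):"([^"]+)"/g;

const spans = [];
let m;
while ((m = re.exec(s)) !== null) {
  spans.push([m.index, re.lastIndex]); // start of match, position where the next search begins
}
console.log(spans);        // [ [ 1, 19 ], [ 20, 33 ] ]
console.log(re.lastIndex); // 0 (reset after the failed final exec)

// Without g, lastIndex is ignored and every call finds the same first match.
const noFlag = /([^\s[:]+):"([^"]+)"/;
const firstCall = noFlag.exec(s).index;
const secondCall = noFlag.exec(s).index;
console.log(firstCall, secondCall); // 1 1
```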

Solution 5 - Javascript

str.match(/regex/g)

returns all matches as an array.

If, for some mysterious reason, you need the additional information that comes with exec, you could, as an alternative to the previous answers, use a recursive function instead of a loop (which also looks cooler :).

function findMatches(regex, str, matches = []) {
   const res = regex.exec(str)
   res && matches.push(res) && findMatches(regex, str, matches)
   return matches
}

// Usage
const matches = findMatches(/regex/g, str)

As stated in the comments before, it's important to have g at the end of the regex definition so that the pointer moves forward on each execution.

Solution 6 - Javascript

We are finally beginning to see a built-in matchAll function; see its documentation for the description and compatibility table. It looks like, as of May 2020, Chrome, Edge, Firefox, and Node.js (12+) support it, but not IE, Safari, or Opera. It seems to have been drafted in December 2018, so give it some time to reach all browsers, but I trust it will get there.

The built-in matchAll function is nice because it returns an iterable. It also returns capturing groups for every match! So you can do things like

// get the letters before and after "o"
const matches = "stackoverflow".matchAll(/(\w)o(\w)/g);

for (const match of matches) {
    console.log("letter before:" + match[1]);
    console.log("letter after:" + match[2]);
}

// the iterable can only be consumed once, so call matchAll again to turn the matches into an array
const arrayOfAllMatches = [..."stackoverflow".matchAll(/(\w)o(\w)/g)];

It also seems that every match object uses the same format as match() without the g flag. So each object is an array of the match and capturing groups, along with three additional properties: index, input, and groups. It looks like:

[<match>, <group1>, <group2>, ..., index: <match offset>, input: <original string>, groups: <named capture groups>]

For more information about matchAll there is also a Google developers page. There are also polyfills/shims available.
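Applied to the string from the question, matchAll could be combined with Object.fromEntries to build an object in one expression. This is a sketch under two assumptions: the runtime supports both functions, and the simplified pattern below is acceptable for the data.

```javascript
const s = '[description:"aoeu" uuid:"123sth"]';

// Map each match to a [key, value] entry, then turn the entries into an object.
const task = Object.fromEntries(
  Array.from(s.matchAll(/([^\s[:]+):"([^"]+)"/g), (m) => [m[1], m[2]])
);

console.log(task); // { description: 'aoeu', uuid: '123sth' }
```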

Solution 7 - Javascript

If you have ES2020

(Meaning your runtime, e.g. Chrome, Node.js, or Firefox, supports ECMAScript 2020 or later, the edition that standardized matchAll)

Use the new yourString.matchAll(/your-regex/g). The g flag is required; matchAll throws a TypeError without it.

If you don't have ES2020

If you have an older system, here's a function for easy copying and pasting
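One detail worth checking before relying on matchAll: it requires a global regex and throws a TypeError otherwise. The short sketch below demonstrates both cases; the variable names are chosen only for this example.

```javascript
let error = null;
try {
  'blah1 blah2'.matchAll(/blah/); // no g flag
} catch (e) {
  error = e;
}
console.log(error instanceof TypeError); // true

// With the g flag the call succeeds and yields one entry per match.
const indexes = Array.from('blah1 blah2'.matchAll(/blah/g), (m) => m.index);
console.log(indexes); // [ 0, 6 ]
```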

function findAll(regexPattern, sourceString) {
    let output = []
    let match
    // auto-add global flag while keeping others as-is
    let regexPatternWithGlobal = RegExp(regexPattern,[...new Set("g"+regexPattern.flags)].join(""))
    while (match = regexPatternWithGlobal.exec(sourceString)) {
        // get rid of the string copy
        delete match.input
        // store the match data
        output.push(match)
    } 
    return output
}

example usage:

console.log(   findAll(/blah/g,'blah1 blah2')   ) 

outputs:

[ [ 'blah', index: 0 ], [ 'blah', index: 6 ] ]

Solution 8 - Javascript

Based on Agus's function, but I prefer to return just the match values:

var bob = "&gt; bob &lt;";
function matchAll(str, regex) {
    var res = [];
    var m;
    if (regex.global) {
        while (m = regex.exec(str)) {
            res.push(m[1]);
        }
    } else {
        if (m = regex.exec(str)) {
            res.push(m[1]);
        }
    }
    return res;
}
var Amatch = matchAll(bob, /(&.*?;)/g);
console.log(Amatch);  // yields: ["&gt;", "&lt;"]

Solution 9 - Javascript

Iterables are nicer:

const matches = (text, pattern) => ({
  [Symbol.iterator]: function * () {
    const clone = new RegExp(pattern.source, pattern.flags);
    let match = null;
    do {
      match = clone.exec(text);
      if (match) {
        yield match;
      }
    } while (match);
  }
});

Usage in a loop:

for (const match of matches('abcdefabcdef', /ab/g)) {
  console.log(match);
}

Or if you want an array:

[ ...matches('abcdefabcdef', /ab/g) ]
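The reason for cloning the pattern is that exec mutates lastIndex on the regex it is called on. A small sketch of the effect follows, assuming a regex object that is shared with other code; the names `shared` and `clone` are illustrative.

```javascript
const shared = /ab/g;
shared.lastIndex = 6; // pretend some other code is halfway through its own scan

// A clone starts at lastIndex 0 and leaves the shared regex untouched.
const clone = new RegExp(shared.source, shared.flags);
const indices = [];
let m;
while ((m = clone.exec('abcdefabcdef')) !== null) {
  indices.push(m.index);
}

console.log(indices);          // [ 0, 6 ]
console.log(shared.lastIndex); // 6
```

Note that the loop only terminates when the pattern has the g flag, which is the same requirement as with a plain exec loop.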

Solution 10 - Javascript

Here is my function to get the matches:

function getAllMatches(regex, text) {
    if (regex.constructor !== RegExp) {
        throw new Error('not RegExp');
    }

    var res = [];
    var match = null;

    if (regex.global) {
        while (match = regex.exec(text)) {
            res.push(match);
        }
    }
    else {
        if (match = regex.exec(text)) {
            res.push(match);
        }
    }

    return res;
}

// Example:

var regex = /abc|def|ghi/g;
var res = getAllMatches(regex, 'abcdefghi');

res.forEach(function (item) {
    console.log(item[0]);
});

Solution 11 - Javascript

If you're able to use matchAll here's a trick:

Array.from takes a mapping function as its second argument, so instead of ending up with an array of awkward match results you can project each match to what you really need:

Array.from(str.matchAll(regexp), m => m[0]);

If you have named groups, e.g. /(?<firstName>[A-Z][a-z]+)/g, you could do this:

Array.from(str.matchAll(regexp), m => m.groups.firstName);
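For a concrete run, here is a self-contained sketch. The sample string and the pattern are made up for illustration; the group name in the pattern must match the property read from m.groups exactly, including case.

```javascript
const str = 'Alice met Bob';
const regexp = /(?<firstName>[A-Z][a-z]+)/g; // capitalised words only

const names = Array.from(str.matchAll(regexp), (m) => m.groups.firstName);
console.log(names); // [ 'Alice', 'Bob' ]
```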

Solution 12 - Javascript

Since ES2020, there's a simpler, better way of getting all the matches, together with information about the capture groups and their index:

const string = 'Mice like to dice rice';
const regex = /.ice/gu;
for(const match of string.matchAll(regex)) {
    console.log(match);
}

// ["Mice", index: 0, input: "Mice like to dice rice", groups: undefined]
// ["dice", index: 13, input: "Mice like to dice rice", groups: undefined]
// ["rice", index: 18, input: "Mice like to dice rice", groups: undefined]

It is currently supported in Chrome, Firefox, and Opera. Depending on when you read this, check a browser compatibility table to see its current support.

Solution 13 - Javascript

Use this...

var all_matches = your_string.match(re);
console.log(all_matches)

It will return an array of all matches, provided re has the g flag. But remember that it won't take groups into account; it will just return the full matches.

Solution 14 - Javascript

I would definitely recommend using the String.match() function and creating a relevant RegEx for it. My example uses a list of strings, which is often necessary when scanning user inputs for keywords and phrases.

    // 1) Define keywords
    var keywords = ['apple', 'orange', 'banana'];

    // 2) Create regex, pass "i" for case-insensitive and "g" for global search
    var regex = new RegExp("(" + keywords.join('|') + ")", "ig");
    // => /(apple|orange|banana)/gi

    // 3) Match it against any string to get all matches
    "Test string for ORANGE's or apples were mentioned".match(regex);
    // => ["ORANGE", "apple"]

Hope this helps!
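If the keywords can contain regex metacharacters, joining them directly may produce an invalid or unintended pattern. A minimal sketch of one way to guard against that is shown here; `escapeRegExp` is a hypothetical helper written for this example, not a built-in function, and the keywords and sentence are made up.

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const keywords = ['c++', 'node.js'];
const regex = new RegExp('(' + keywords.map(escapeRegExp).join('|') + ')', 'ig');

const hits = 'I use Node.js and C++ daily, not nodexjs'.match(regex);
console.log(hits); // [ 'Node.js', 'C++' ]
```

With escaping, the dot in node.js matches only a literal dot, so nodexjs is not reported.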

Solution 15 - Javascript

This isn't really going to help with your more complex issue but I'm posting this anyway because it is a simple solution for people that aren't doing a global search like you are.

I've simplified the regex in the answer to be clearer (this is not a solution to your exact problem).

var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);

// We only want the group matches in the array
function purify_regex(reResult){

  // Removes the Regex specific values and clones the array to prevent mutation
  let purifiedArray = [...reResult];

  // Removes the full match value at position 0
  purifiedArray.shift();

  // Returns a pure array without mutating the original regex result
  return purifiedArray;
}

// purifiedResult= ["description", "aoeu"]

That looks more verbose than it is because of the comments; this is what it looks like without them:

var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);

function purify_regex(reResult){
  let purifiedArray = [...reResult];
  purifiedArray.shift();
  return purifiedArray;
}

Note that any groups that do not match will be listed in the array as undefined values.
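As an illustration of that last point, the sketch below uses a pattern with an optional second group (an assumption made for this example, not the regex from the answer) and slice(1) as a shorter way to drop the full match.

```javascript
// The value part is optional here, so group 2 may not participate in the match.
const re = /^(\w+)(?::"([^"]*)")?$/;

const withValue = re.exec('description:"aoeu"').slice(1);
const withoutValue = re.exec('description').slice(1);

console.log(withValue);    // [ 'description', 'aoeu' ]
console.log(withoutValue); // [ 'description', undefined ]
```

slice(1) returns a plain new array, so the original exec result is not mutated.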

This solution uses the ES6 spread operator to purify the array of regex specific values. You will need to run your code through Babel if you want IE11 support.

Solution 16 - Javascript

Here's a one line solution without a while loop.

The order is preserved in the resulting list.

The potential downsides are

  1. It clones the regex for every match.
  2. The result is in a different form than in the other solutions, so you'll need to process it one more time.

let re = /\s*([^[:]+):\"([^"]+)"/g
let str = '[description:"aoeu" uuid:"123sth"]'

(str.match(re) || []).map(e => RegExp(re.source, re.flags).exec(e))

[ [ 'description:"aoeu"',
    'description',
    'aoeu',
    index: 0,
    input: 'description:"aoeu"',
    groups: undefined ],
  [ ' uuid:"123sth"',
    'uuid',
    '123sth',
    index: 0,
    input: ' uuid:"123sth"',
    groups: undefined ] ]

Solution 17 - Javascript

My guess is that if there are edge cases such as extra or missing spaces, this expression with fewer boundaries might also be an option:

^\s*\[\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*\]\s*$

If you wish to explore, simplify, or modify the expression, it is explained in the top-right panel of regex101.com, where you can also see how it would match against some sample inputs.


Test

const regex = /^\s*\[\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*\]\s*$/gm;
const str = `[description:"aoeu" uuid:"123sth"]
[description : "aoeu" uuid: "123sth"]
[ description : "aoeu" uuid: "123sth" ]
 [ description : "aoeu"   uuid : "123sth" ]
 [ description : "aoeu"uuid  : "123sth" ] `;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
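The lastIndex guard in the loop above matters only for patterns that can match the empty string. Here is a small sketch of that situation, using a deliberately permissive pattern chosen for illustration.

```javascript
const re = /\d*/g; // can match zero characters
const s = 'a12b';

const found = [];
let m;
while ((m = re.exec(s)) !== null) {
  // An empty match leaves lastIndex where it was; advance manually or loop forever.
  if (m.index === re.lastIndex) {
    re.lastIndex++;
  }
  if (m[0] !== '') {
    found.push(m[0]); // keep only the non-empty matches
  }
}

console.log(found); // [ '12' ]
```

The expression in this answer always consumes at least the brackets, so the guard is a precaution there rather than a necessity.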

RegEx Circuit

jex.im can visualize regular expressions such as this one as a diagram.

Solution 18 - Javascript

Here is my answer:

var str = '[me nombre es] : My name is. [Yo puedo] is the right word'; 

var reg = /\[(.*?)\]/g;

var a = str.match(reg);

// Strip the brackets from each match; note that this breaks if a match itself contains a comma
a = a.toString().replace(/[\[\]]/g, "").split(',');
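Because joining and splitting on commas would break a bracketed phrase that itself contains a comma, a capture group read through matchAll may be a safer variant. This is a sketch that assumes matchAll is available; the sample string is adapted for illustration.

```javascript
const str = '[me nombre es] : My name is. [Yo puedo, tal vez] is the right word';

// m[1] is the text between the brackets, so no string surgery is needed afterwards.
const inner = Array.from(str.matchAll(/\[(.*?)\]/g), (m) => m[1]);

console.log(inner); // [ 'me nombre es', 'Yo puedo, tal vez' ]
```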

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | gatlin | View Question on Stackoverflow
Solution 1 - Javascript | lawnsea | View Answer on Stackoverflow
Solution 2 - Javascript | Anis | View Answer on Stackoverflow
Solution 3 - Javascript | Christophe | View Answer on Stackoverflow
Solution 4 - Javascript | lovasoa | View Answer on Stackoverflow
Solution 5 - Javascript | eaorak | View Answer on Stackoverflow
Solution 6 - Javascript | woojoo666 | View Answer on Stackoverflow
Solution 7 - Javascript | Jeff Hykin | View Answer on Stackoverflow
Solution 8 - Javascript | bob | View Answer on Stackoverflow
Solution 9 - Javascript | sdgfsdh | View Answer on Stackoverflow
Solution 10 - Javascript | Agus Syahputra | View Answer on Stackoverflow
Solution 11 - Javascript | Simon_Weaver | View Answer on Stackoverflow
Solution 12 - Javascript | iuliu.net | View Answer on Stackoverflow
Solution 13 - Javascript | Subham Debnath | View Answer on Stackoverflow
Solution 14 - Javascript | Sebastian Scholl | View Answer on Stackoverflow
Solution 15 - Javascript | Daniel Tonon | View Answer on Stackoverflow
Solution 16 - Javascript | Jae Won Jang | View Answer on Stackoverflow
Solution 17 - Javascript | Emma | View Answer on Stackoverflow
Solution 18 - Javascript | daguang | View Answer on Stackoverflow