How can I match multiple occurrences with a regex in JavaScript similar to PHP's preg_match_all()?

JavascriptRegex

Javascript Problem Overview


I am trying to parse url-encoded strings that are made up of key=value pairs separated by either & or &.

The following will only match the first occurrence, breaking apart the keys and values into separate result elements:

var result = mystring.match(/(?:&|&)?([^=]+)=([^&]+)/)

The results for the string '1111342=Adam%20Franco&348572=Bob%20Jones' would be:

['1111342', 'Adam%20Franco']

Using the global flag, 'g', will match all occurrences, but only return the fully matched sub-strings, not the separated keys and values:

var result = mystring.match(/(?:&|&)?([^=]+)=([^&]+)/g)

The results for the string '1111342=Adam%20Franco&348572=Bob%20Jones' would be:

['1111342=Adam%20Franco', '&348572=Bob%20Jones']

While I could split the string on & and break apart each key/value pair individually, is there any way using JavaScript's regular expression support to match multiple occurrences of the pattern /(?:&|&)?([^=]+)=([^&]+)/ similar to PHP's preg_match_all() function?

I'm aiming for some way to get results with the sub-matches separated like:

[['1111342', '348572'], ['Adam%20Franco', 'Bob%20Jones']]

or

[['1111342', 'Adam%20Franco'], ['348572', 'Bob%20Jones']]

Javascript Solutions


Solution 1 - Javascript

Hoisted from the comments

> 2020 comment: rather than using regex, we now have URLSearchParams, which does all of this for us, so no custom code, let alone regex, are necessary anymore.
>
> – Mike 'Pomax' Kamermans

Browser support is listed here https://caniuse.com/#feat=urlsearchparams


I would suggest an alternative regex, using sub-groups to capture name and value of the parameters individually and re.exec():

function getUrlParams(url) {
  var re = /(?:\?|&(?:amp;)?)([^=&#]+)(?:=?([^&#]*))/g,
      match, params = {},
      decode = function (s) {return decodeURIComponent(s.replace(/\+/g, " "));};

  if (typeof url == "undefined") url = document.location.href;

  while (match = re.exec(url)) {
    params[decode(match[1])] = decode(match[2]);
  }
  return params;
}

var result = getUrlParams("http://maps.google.de/maps?f=q&source=s_q&hl=de&geocode=&q=Frankfurt+am+Main&sll=50.106047,8.679886&sspn=0.370369,0.833588&ie=UTF8&ll=50.116616,8.680573&spn=0.35972,0.833588&z=11&iwloc=addr");

result is an object:

{
f: "q"
geocode: ""
hl: "de"
ie: "UTF8"
iwloc: "addr"
ll: "50.116616,8.680573"
q: "Frankfurt am Main"
sll: "50.106047,8.679886"
source: "s_q"
spn: "0.35972,0.833588"
sspn: "0.370369,0.833588"
z: "11"
}

The regex breaks down as follows:

(?:            # non-capturing group
?|&         #   "?" or "&"
(?:amp;)?    #   (allow "&", for wrongly HTML-encoded URLs)
)              # end non-capturing group
(              # group 1
[^=&#]+      #   any character except "=", "&" or "#"; at least once
)              # end group 1 - this will be the parameter's name
(?:            # non-capturing group
=?           #   an "=", optional
(            #   group 2
[^&#]*     #     any character except "&" or "#"; any number of times
)            #   end group 2 - this will be the parameter's value
)              # end non-capturing group

Solution 2 - Javascript

You need to use the 'g' switch for a global search

var result = mystring.match(/(&|&)?([^=]+)=([^&]+)/g)

Solution 3 - Javascript

2020 edit

Use URLSearchParams, as this job no longer requires any kind of custom code. Browsers can do this for you with a single constructor:

const str = "1111342=Adam%20Franco&348572=Bob%20Jones";
const data = new URLSearchParams(str);
for (pair of data) console.log(pair)

yields

Array [ "1111342", "Adam Franco" ]
Array [ "348572", "Bob Jones" ]

So there is no reason to use regex for this anymore.

Original answer

If you don't want to rely on the "blind matching" that comes with running exec style matching, JavaScript does come with match-all functionality built in, but it's part of the replace function call, when using a "what to do with the capture groups" handling function:

var data = {};

var getKeyValue = function(fullPattern, group1, group2, group3) {
  data[group2] = group3;
};

mystring.replace(/(?:&|&)?([^=]+)=([^&]+)/g, getKeyValue);

done.

Instead of using the capture group handling function to actually return replacement strings (for replace handling, the first arg is the full pattern match, and subsequent args are individual capture groups) we simply take the groups 2 and 3 captures, and cache that pair.

So, rather than writing complicated parsing functions, remember that the "matchAll" function in JavaScript is simply "replace" with a replacement handler function, and much pattern matching efficiency can be had.

Solution 4 - Javascript

For capturing groups, I'm used to using preg_match_all in PHP and I've tried to replicate it's functionality here:

<script>

// Return all pattern matches with captured groups
RegExp.prototype.execAll = function(string) {
	var match = null;
	var matches = new Array();
	while (match = this.exec(string)) {
	    var matchArray = [];
		for (i in match) {
		    if (parseInt(i) == i) {
		        matchArray.push(match[i]);
		    }
		}
		matches.push(matchArray);
	}
	return matches;
}

// Example
var someTxt = 'abc123 def456 ghi890';
var results = /[a-z]+(\d+)/g.execAll(someTxt);

// Output
[["abc123", "123"],
 ["def456", "456"],
 ["ghi890", "890"]]

</script>

Solution 5 - Javascript

Set the g modifier for a global match:

/…/g

Solution 6 - Javascript

Source:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec

Finding successive matches

If your regular expression uses the "g" flag, you can use the exec() method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property (test() will also advance the lastIndex property). For example, assume you have this script:

var myRe = /ab*/g;
var str = 'abbcdefabh';
var myArray;
while ((myArray = myRe.exec(str)) !== null) {
  var msg = 'Found ' + myArray[0] + '. ';
  msg += 'Next match starts at ' + myRe.lastIndex;
  console.log(msg);
}

This script displays the following text:

Found abb. Next match starts at 3
Found ab. Next match starts at 912

Note: Do not place the regular expression literal (or RegExp constructor) within the while condition or it will create an infinite loop if there is a match due to the lastIndex property being reset upon each iteration. Also be sure that the global flag is set or a loop will occur here also.

Solution 7 - Javascript

Hеllo from 2020. Let me bring String.prototype.matchAll() to your attention:

let regexp = /(?:&|&amp;)?([^=]+)=([^&]+)/g;
let str = '1111342=Adam%20Franco&348572=Bob%20Jones';

for (let match of str.matchAll(regexp)) {
	let [full, key, value] = match;
	console.log(key + ' => ' + value);
}

Outputs:

1111342 => Adam%20Franco
348572 => Bob%20Jones

Solution 8 - Javascript

If someone (like me) needs Tomalak's method with array support (ie. multiple select), here it is:

function getUrlParams(url) {
  var re = /(?:\?|&(?:amp;)?)([^=&#]+)(?:=?([^&#]*))/g,
      match, params = {},
      decode = function (s) {return decodeURIComponent(s.replace(/\+/g, " "));};

  if (typeof url == "undefined") url = document.location.href;

  while (match = re.exec(url)) {
	if( params[decode(match[1])] ) {
		if( typeof params[decode(match[1])] != 'object' ) {
			params[decode(match[1])] = new Array( params[decode(match[1])], decode(match[2]) );
		} else {
			params[decode(match[1])].push(decode(match[2]));
		}
	}
	else
		params[decode(match[1])] = decode(match[2]);
  }
  return params;
}
var urlParams = getUrlParams(location.search);

input ?my=1&my=2&my=things

result 1,2,things (earlier returned only: things)

Solution 9 - Javascript

Just to stick with the proposed question as indicated by the title, you can actually iterate over each match in a string using String.prototype.replace(). For example the following does just that to get an array of all words based on a regular expression:

function getWords(str) {
  var arr = [];
  str.replace(/\w+/g, function(m) {
    arr.push(m);
  });
  return arr;
}

var words = getWords("Where in the world is Carmen Sandiego?");
// > ["Where", "in", "the", "world", "is", "Carmen", "Sandiego"]

If I wanted to get capture groups or even the index of each match I could do that too. The following shows how each match is returned with the entire match, the 1st capture group and the index:

function getWords(str) {
  var arr = [];
  str.replace(/\w+(?=(.*))/g, function(m, remaining, index) {
    arr.push({ match: m, remainder: remaining, index: index });
  });
  return arr;
}

var words = getWords("Where in the world is Carmen Sandiego?");

After running the above, words will be as follows:

[  {    "match": "Where",    "remainder": " in the world is Carmen Sandiego?",    "index": 0  },  {    "match": "in",    "remainder": " the world is Carmen Sandiego?",    "index": 6  },  {    "match": "the",    "remainder": " world is Carmen Sandiego?",    "index": 9  },  {    "match": "world",    "remainder": " is Carmen Sandiego?",    "index": 13  },  {    "match": "is",    "remainder": " Carmen Sandiego?",    "index": 19  },  {    "match": "Carmen",    "remainder": " Sandiego?",    "index": 22  },  {    "match": "Sandiego",    "remainder": "?",    "index": 29  }]

In order to match multiple occurrences similar to what is available in PHP with preg_match_all you can use this type of thinking to make your own or use something like YourJS.matchAll(). YourJS more or less defines this function as follows:

function matchAll(str, rgx) {
  var arr, extras, matches = [];
  str.replace(rgx.global ? rgx : new RegExp(rgx.source, (rgx + '').replace(/[\s\S]+\//g , 'g')), function() {
    matches.push(arr = [].slice.call(arguments));
    extras = arr.splice(-2);
    arr.index = extras[0];
    arr.input = extras[1];
  });
  return matches[0] ? matches : null;
}

Solution 10 - Javascript

If you can get away with using map this is a four-line-solution:

var mystring = '1111342=Adam%20Franco&348572=Bob%20Jones';

var result = mystring.match(/(&|&amp;)?([^=]+)=([^&]+)/g) || [];
result = result.map(function(i) {
  return i.match(/(&|&amp;)?([^=]+)=([^&]+)/);
});

console.log(result);

Ain't pretty, ain't efficient, but at least it is compact. ;)

Solution 11 - Javascript

Use window.URL:

> s = 'http://www.example.com/index.html?1111342=Adam%20Franco&348572=Bob%20Jones'
> u = new URL(s)
> Array.from(u.searchParams.entries())
[["1111342", "Adam Franco"], ["348572", "Bob Jones"]]

Solution 12 - Javascript

Well... I had a similar problem... I want an incremental / step search with RegExp (eg: start search... do some processing... continue search until last match)

After lots of internet search... like always (this is turning an habit now) I end up in StackOverflow and found the answer...

Whats is not referred and matters to mention is "lastIndex" I now understand why the RegExp object implements the "lastIndex" property

Solution 13 - Javascript

To capture several parameters using the same name, I modified the while loop in Tomalak's method like this:

  while (match = re.exec(url)) {
  	var pName = decode(match[1]);
  	var pValue = decode(match[2]);
  	params[pName] ? params[pName].push(pValue) : params[pName] = [pValue];
  }

input: ?firstname=george&lastname=bush&firstname=bill&lastname=clinton

returns: {firstname : ["george", "bill"], lastname : ["bush", "clinton"]}

Solution 14 - Javascript

Splitting it looks like the best option in to me:

'1111342=Adam%20Franco&348572=Bob%20Jones'.split('&').map(x => x.match(/(?:&|&amp;)?([^=]+)=([^&]+)/))

Solution 15 - Javascript

To avoid regex hell you could find your first match, chop off a chunk then attempt to find the next one on the substring. In C# this looks something like this, sorry I've not ported it over to JavaScript for you.

        long count = 0;
        var remainder = data;
        Match match = null;
        do
        {
            match = _rgx.Match(remainder);
            if (match.Success)
            {
                count++;
                remainder = remainder.Substring(match.Index + 1, remainder.Length - (match.Index+1));
            }
        } while (match.Success);
        return count;

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionAdam FrancoView Question on Stackoverflow
Solution 1 - JavascriptTomalakView Answer on Stackoverflow
Solution 2 - JavascriptmeouwView Answer on Stackoverflow
Solution 3 - JavascriptMike 'Pomax' KamermansView Answer on Stackoverflow
Solution 4 - JavascriptAram KocharyanView Answer on Stackoverflow
Solution 5 - JavascriptGumboView Answer on Stackoverflow
Solution 6 - JavascriptKIM TaegyoonView Answer on Stackoverflow
Solution 7 - JavascriptKlesunView Answer on Stackoverflow
Solution 8 - JavascriptfeduView Answer on Stackoverflow
Solution 9 - JavascriptChris WestView Answer on Stackoverflow
Solution 10 - JavascriptfboesView Answer on Stackoverflow
Solution 11 - JavascriptjnnnnnView Answer on Stackoverflow
Solution 12 - JavascriptZEEView Answer on Stackoverflow
Solution 13 - JavascriptivarView Answer on Stackoverflow
Solution 14 - JavascriptpguardiarioView Answer on Stackoverflow
Solution 15 - Javascriptandrew pateView Answer on Stackoverflow