RegExp.exec() returns NULL sporadically
Javascript Problem Overview
I am seriously going crazy over this and I've already spent a disproportionate amount of time trying to figure out what's going on here. So please give me a hand =)
I need to do some RegExp matching of strings in JavaScript. Unfortunately it behaves very strangely. This code:
var rx = /(cat|dog)/gi;
var w = new Array("I have a cat and a dog too.", "There once was a dog and a cat.", "I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.");
for (var i in w) {
    var m = null;
    m = rx.exec(w[i]);
    if (m) {
        document.writeln("<pre>" + i + "\nINPUT: " + w[i] + "\nMATCHES: " + m.slice(1) + "</pre>");
    } else {
        document.writeln("<pre>" + i + "\n'" + w[i] + "' FAILED.</pre>");
    }
}
It returns "cat" and "dog" for the first two elements, as it should, but then some exec() calls start returning null. I don't understand why.
I posted a Fiddle here, where you can run and edit the code.
And so far I've tried this in Chrome and Firefox.
Javascript Solutions
Solution 1 - Javascript
Oh, here it is. Because you're defining your regex as global, it matches cat first, and on the second pass of the loop it matches dog. So, basically, you just need to reset your regex (its internal pointer) as well. Cf. this:
var w = new Array("I have a cat and a dog too.", "I have a cat and a dog too.", "I have a cat and a dog too.", "I have a cat and a dog too.");
for (var i in w) {
    var rx = /(cat|dog)/gi;
    var m = null;
    m = rx.exec(w[i]);
    if (m) {
        document.writeln("<p>" + i + "<br/>INPUT: " + w[i] + "<br/>MATCHES: " + w[i].length + "</p>");
    } else {
        document.writeln("<p><b>" + i + "<br/>'" + w[i] + "' FAILED.</b><br/>" + w[i].length + "</p>");
    }
    document.writeln(m);
}
Solution 2 - Javascript
The regex object has a lastIndex property, which is updated when you run exec on a regex that has the g flag. So when you exec the regex on e.g. "I have a cat and a dog too.", lastIndex is set to 12. The next time you run exec on the same regex object, it starts looking from index 12. So you have to reset the lastIndex property between each run.
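To make the index arithmetic concrete, here is a minimal sketch that reuses the two sentences from the question and records lastIndex after each call. It uses console.log rather than document.writeln so that it also runs outside a browser; the variable names are only illustrative.

```javascript
// Sketch: watch lastIndex change across exec() calls on one global regex.
var rx = /(cat|dog)/gi;
var a = "I have a cat and a dog too.";
var b = "There once was a dog and a cat.";

var first = rx.exec(a);         // search starts at index 0, matches "cat" at index 9
var afterFirst = rx.lastIndex;  // 12, the position right after "cat"

var second = rx.exec(b);        // search starts at index 12, matches "dog" at index 17
var afterSecond = rx.lastIndex; // 20, the position right after "dog"

var third = rx.exec(a);         // search starts at index 20, past both words: null
var afterThird = rx.lastIndex;  // a failed exec() resets lastIndex to 0

console.log(first[0], afterFirst);
console.log(second[0], afterSecond);
console.log(third, afterThird);
```

Because a failed call resets lastIndex to 0, the call after a failure succeeds again, which is why the failures in the question look sporadic rather than permanent.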
Solution 3 - Javascript
Two things:
- As already mentioned, the regex needs to be reset when the g (global) flag is used. To do that, I recommend simply assigning 0 to the lastIndex property of the RegExp object. This performs better than destroying and recreating the object.
- Be careful when using the in keyword to walk an Array object, because it can lead to unexpected results with some libraries. Sometimes you should check the key with something like isNaN(i), or, if you know the array has no holes, use a classic for loop.
The code can be:
var rx = /(cat|dog)/gi;
var w = ["I have a cat and a dog too.", "There once was a dog and a cat.", "I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat.","I have a cat and a dog too.", "There once was a dog and a cat."];
for (var i in w)
    if (!isNaN(i)) // Optional: check that the key is an element index, in case the Array has some odd members.
    {
        var m = null;
        m = rx.exec(w[i]); // Run
        rx.lastIndex = 0; // Reset
        if (m)
        {
            document.writeln("<pre>" + i + "\nINPUT: " + w[i] + "\nMATCHES: " + m.slice(1) + "</pre>");
        } else {
            document.writeln("<pre>" + i + "\n'" + w[i] + "' FAILED.</pre>");
        }
    }
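The same idea can also be sketched with the classic for loop mentioned in the second point. This is only an illustration: firstMatch is a hypothetical helper name, the input list is shortened, and console.log stands in for document.writeln so that the snippet runs outside a browser.

```javascript
// Sketch: index-based loop plus a lastIndex reset before every exec() call.
// firstMatch is a hypothetical helper name, not part of the original answer.
var rx = /(cat|dog)/gi;

function firstMatch(regex, text) {
  regex.lastIndex = 0;      // always start searching at the beginning
  var m = regex.exec(text);
  return m ? m[1] : null;   // first capture group, or null if nothing matched
}

var sentences = [
  "I have a cat and a dog too.",
  "There once was a dog and a cat.",
  "I have a cat and a dog too.",
  "There once was a dog and a cat."
];

var results = [];
for (var i = 0; i < sentences.length; i++) { // visits only the numeric indices
  results.push(firstMatch(rx, sentences[i]));
}
console.log(results.join(", ")); // cat, dog, cat, dog
```

Resetting before the call rather than after it means the helper does not depend on what earlier code did with the same regex object.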
Solution 4 - Javascript
I had a similar problem using only the /g flag, and the solution proposed here did not work for me in Firefox 3.6.8. I got my script working with:
var myRegex = new RegExp("my string", "g");
I'm adding this in case someone else has the same problem I did with the above solution.
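Applied to the question's loop, this approach might look like the sketch below. It assumes the constructor is called inside the loop so that every string gets a fresh RegExp object whose lastIndex starts at 0. The pattern is borrowed from the question, not from this answer, and console.log replaces document.writeln so that the snippet runs outside a browser.

```javascript
// Sketch: build a new RegExp object for every string, so lastIndex starts at 0.
// The pattern "(cat|dog)" comes from the question; the inputs are shortened.
var inputs = [
  "I have a cat and a dog too.",
  "I have a cat and a dog too.",
  "I have a cat and a dog too."
];

var found = [];
for (var i = 0; i < inputs.length; i++) {
  var rx = new RegExp("(cat|dog)", "gi"); // fresh object on each pass
  var m = rx.exec(inputs[i]);
  found.push(m ? m[1] : null);
}
console.log(found.join(", ")); // cat, cat, cat
```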