Why do regex constructors need to be double escaped?

Javascript, Regex

Javascript Problem Overview


In the regex below, \s denotes a whitespace character. I imagined that the regex parser goes through the string, sees \, and knows that the next character is special.

But this is not the case, as double escapes are required.

Why is this?

var res = new RegExp('(\\s|^)' + foo).test(moo);

Is there a concrete example of how a single escape could be mis-interpreted as something else?

Javascript Solutions


Solution 1 - Javascript

You are constructing the regular expression by passing a string to the RegExp constructor.

\ is an escape character in string literals.

The \ is consumed by the string literal parsing…

const foo = "foo";
// '\s' is not a recognised string escape, so the backslash is simply dropped
const string = '(\s|^)' + foo;
console.log(string); // (s|^)foo

… so the data you pass to the RegExp compiler is a plain s and not \s.

You need to escape the \ (that is, write \\) so that it reaches the RegExp constructor as data instead of acting as an escape character itself.
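The difference also shows up in what the two patterns actually match. The following is a minimal sketch; the variable names are illustrative and not part of the original answer.

```javascript
// Single escape: the string literal drops the backslash, leaving "(s|^)foo".
const single = '(\s|^)' + 'foo';
// Double escape: the string contains a real backslash, giving "(\s|^)foo".
const double = '(\\s|^)' + 'foo';

console.log(single.length); // 8
console.log(double.length); // 9

const singleRe = new RegExp(single);
const doubleRe = new RegExp(double);

// "foo" preceded by a space: only the double-escaped pattern treats it as whitespace.
console.log(singleRe.test('bar foo')); // false
console.log(doubleRe.test('bar foo')); // true

// The single-escaped pattern instead looks for a literal "s" before "foo".
console.log(singleRe.test('barsfoo')); // true
```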

Solution 2 - Javascript

Inside the code where you create a string, the backslash is first of all a JavaScript escape character. Escape sequences such as \t, \n and \" are translated into the characters they represent (tab, newline, quote), and those characters become part of the string. A double backslash represents a single backslash in the actual string, so if you want a backslash in the string, you have to escape it.

So when you generate a string by saying var someString = '(\\s|^)', what you're really doing is creating an actual string with the value (\s|^).
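This also gives a concrete case where a single escape is misinterpreted rather than just dropped: \b is valid in both layers but means different things. In a string literal it is the backspace character (U+0008), while in a regular expression it is a word boundary. A small sketch, with illustrative variable names:

```javascript
// '\b' in a string literal is one backspace character (U+0008).
const wrong = '\bfoo\b';
// '\\b' is a backslash followed by "b", which the regex engine reads as a word boundary.
const right = '\\bfoo\\b';

console.log(wrong.length); // 5: backspace, f, o, o, backspace
console.log(right.length); // 7: \, b, f, o, o, \, b

const wrongRe = new RegExp(wrong);
const rightRe = new RegExp(right);

console.log(wrongRe.test('a foo b')); // false: the text contains no backspace characters
console.log(rightRe.test('a foo b')); // true: "foo" sits between word boundaries
```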

Solution 3 - Javascript

The RegExp constructor needs a string that contains \s, which in JavaScript is written as the literal "\\s".

Here's an example to illustrate why "\s" is not enough:

alert("One backslash:          \s\nDouble backslashes: \\s");

Note how the extra \ before \s changes the output: the first line ends in s, the second in \s.

Solution 4 - Javascript

As has been said, inside a string literal a backslash introduces an escape sequence rather than standing for a literal backslash character. The RegExp constructor, however, often needs literal backslash characters in the string passed to it, so in most cases the code should contain \\s to produce a literal backslash followed by s.

A problem is that double-escaping metacharacters is tedious. There is one way to pass a string to new RegExp without having to double escape them: use the String.raw template tag, an ES6 feature, which allows you to write a string that will be parsed by the interpreter verbatim, without any parsing of escape sequences. For example:

console.log('\\'.length);           // length 1: an escaped backslash
console.log(`\\`.length);           // length 1: an escaped backslash
console.log(String.raw`\\`.length); // length 2: no escaping in String.raw!

So, if you wish to keep your code readable and you have many backslashes, you can use String.raw and type only one backslash wherever the pattern requires one:

const sentence = 'foo bar baz';
const regex = new RegExp(String.raw`\bfoo\sbar\sbaz\b`);
console.log(regex.test(sentence));

But there's a better option. Generally, there's not much good reason to use new RegExp unless you need to dynamically create a regular expression from existing variables. Otherwise, you should use regex literals instead, which do not require double-escaping of metacharacters, and do not require writing out String.raw to keep the pattern readable:

const sentence = 'foo bar baz';
const regex = /\bfoo\sbar\sbaz\b/;
console.log(regex.test(sentence));
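To check that the three spellings really describe the same pattern, one can compare the source property of the resulting objects. This is only a sketch; the variable names are assumptions made for illustration.

```javascript
// The same pattern written three ways.
const fromLiteral = /\bfoo\sbar\sbaz\b/;
const fromString = new RegExp('\\bfoo\\sbar\\sbaz\\b');      // double-escaped string
const fromRaw = new RegExp(String.raw`\bfoo\sbar\sbaz\b`);   // raw template, single backslashes

console.log(fromLiteral.source);                       // \bfoo\sbar\sbaz\b
console.log(fromString.source === fromLiteral.source); // true
console.log(fromRaw.source === fromLiteral.source);    // true
```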

It is best to use new RegExp only when the pattern must be created on the fly, as in the following snippet:

const sentence = 'foo bar baz';
const wordToFind = 'foo'; // from user input

const regex = new RegExp(String.raw`\b${wordToFind}\b`);
console.log(regex.test(sentence));
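One caveat when the interpolated value really comes from user input: any regex metacharacters in it (such as . or parentheses) are interpreted as part of the pattern. A common approach is to backslash-escape them before building the expression. The sketch below uses a hypothetical helper named escapeRegExp, which is not part of the original answer.

```javascript
// Hypothetical helper: backslash-escape every character that is special in a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = 'a.c'; // stands in for a value typed by a user

const unescaped = new RegExp(String.raw`\b${userInput}\b`);
const escaped = new RegExp(String.raw`\b${escapeRegExp(userInput)}\b`);

console.log(unescaped.test('abc')); // true: the "." matched any character
console.log(escaped.test('abc'));   // false: the "." must now be a literal dot
console.log(escaped.test('a.c'));   // true
```

String.raw leaves the interpolated value untouched, so the escaping has to be applied to the value itself before it is placed in the template.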

Solution 5 - Javascript

\ is used in strings to escape special characters. If you want a backslash in your string (e.g. for the \ in \s), you have to escape it with another backslash. So \ becomes \\.

EDIT: Even had to do it here, because \\ in my answer turned to \.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Smurfette | View Question on Stackoverflow
Solution 1 - Javascript | Quentin | View Answer on Stackoverflow
Solution 2 - Javascript | Joe Enos | View Answer on Stackoverflow
Solution 3 - Javascript | Cristian Lupascu | View Answer on Stackoverflow
Solution 4 - Javascript | CertainPerformance | View Answer on Stackoverflow
Solution 5 - Javascript | schlicht | View Answer on Stackoverflow