Is this C++11 regex error me or the compiler?


C++ Problem Overview


OK, this isn't the original program I had this problem in, but I duplicated it in a much smaller one. Very simple problem.

main.cpp:

#include <cstdio>
#include <regex>
using namespace std;

int main()
{
	regex r1("S");
	printf("S works.\n");
	regex r2(".");
	printf(". works.\n");
	regex r3(".+");
	printf(".+ works.\n");
	regex r4("[0-9]");
	printf("[0-9] works.\n");
	return 0;
}

Compiled successfully with this command, no error messages:

$ g++ -std=c++0x main.cpp

The last line of g++ -v, by the way, is:

gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3)

And the result when I try to run it:

$ ./a.out 
S works.
. works.
.+ works.
terminate called after throwing an instance of 'std::regex_error'
  what():  regex_error
Aborted

The same thing happens if I change r4 to \\s, \\w, or [a-z]. Is this a problem with the compiler? I might be able to believe that C++11's regex engine has different ways of saying "whitespace" or "word character," but square brackets not working is a stretch. Has this been fixed in 4.6.2?

EDIT:

Joachim Pileborg has supplied a partial solution: passing an extra regex_constants parameter selects a syntax that supports square brackets. However, none of basic, extended, awk, or ECMAScript seems to support backslash-escaped terms like \\s, \\w, or \\t.

EDIT 2:

Using raw strings (R"(\w)" instead of "\\w") doesn't seem to work either.
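A note on why the raw-string attempt could not change the outcome: a raw string literal only changes how the pattern is spelled in the source, not the characters passed to std::regex. The sketch below (the function name same_pattern is hypothetical, chosen for illustration) checks that both spellings produce the same two-character string, which suggests the failure is not an escaping problem.

```cpp
#include <string>

// Hypothetical helper: compares the escaped and raw spellings of \w.
bool same_pattern()
{
    const std::string escaped = "\\w";  // backslash escaped for the compiler
    const std::string raw = R"(\w)";    // raw string literal, no escaping needed
    // Both hold exactly two characters: a backslash followed by 'w'.
    return escaped == raw && raw.size() == 2;
}
```

Since the two strings are identical, any regex implementation will treat them the same way.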

C++ Solutions


Solution 1 - C++

Update: <regex> is now implemented and released in GCC 4.9.0


Old answer:

ECMAScript syntax accepts [0-9], \s, \w, etc.; see ECMA-262 (15.10). Here's an example with boost::regex, which also uses the ECMAScript syntax by default:

#include <boost/regex.hpp>

int main(int argc, char* argv[]) {
  using namespace boost;
  regex e("[0-9]");
  return argc > 1 ? !regex_match(argv[1], e) : 2;
}

It works:

$ g++ -std=c++0x *.cc -lboost_regex && ./a.out 1

According to the C++11 standard (28.8.2), basic_regex() uses the regex_constants::ECMAScript flag by default, so it must understand this syntax.
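To illustrate the default-grammar point with std::regex itself, here is a minimal sketch. It assumes a standard library with a working <regex> (for example GCC 4.9 or later), so it will not behave this way on GCC 4.6.1. The helper names matches_default and matches_ecmascript are made up for this example.

```cpp
#include <regex>
#include <string>

// Assumes a standard library with a working <regex> (e.g. GCC 4.9 or later).
// Returns true when the whole of `text` matches `pattern` under the default grammar.
bool matches_default(const std::string& text, const std::string& pattern)
{
    std::regex re(pattern);  // no flag given: ECMAScript grammar is used
    return std::regex_match(text, re);
}

// The same check with the grammar flag spelled out explicitly.
bool matches_ecmascript(const std::string& text, const std::string& pattern)
{
    std::regex re(pattern, std::regex_constants::ECMAScript);
    return std::regex_match(text, re);
}
```

On a conforming implementation, calls such as matches_default("7", "[0-9]"), matches_default(" ", "\\s"), and matches_default("x", "\\w") should all return true, and the explicit-flag variant should agree with them.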

> Is this C++11 regex error me or the compiler?

GCC 4.6.1 doesn't support C++11 regular expressions (28.13).

Solution 2 - C++

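The error occurs because a regex is constructed with ECMAScript syntax by default, and the libstdc++ shipped with this GCC version fails on bracket expressions in that mode (the ECMAScript grammar itself does allow them). As a workaround, declare the expression with the basic or extended flag: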

std::regex r4("[0-9]", std::regex_constants::basic);

Edit: It seems that libstdc++ (the C++ standard library that ships with GCC) doesn't fully implement regular expressions yet. Its status document says that the Modified ECMAScript regular expression grammar is not implemented yet.
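One way to find out which grammars a given library accepts for a pattern, instead of letting the program abort, is to catch std::regex_error around the constructor. The following is only a sketch: the helper name compiles is hypothetical, and the results depend on the standard library in use.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `pattern` can be compiled under
// `grammar`, false if the std::regex constructor throws std::regex_error.
bool compiles(const std::string& pattern,
              std::regex_constants::syntax_option_type grammar)
{
    try {
        std::regex re(pattern, grammar);
        return true;
    } catch (const std::regex_error&) {
        // The exception's code() member reports a
        // std::regex_constants::error_type describing the failure.
        return false;
    }
}
```

For example, compiles("[0-9]", std::regex_constants::basic) and compiles("[0-9]", std::regex_constants::ECMAScript) can be compared directly. With a complete <regex> implementation both should return true, while a malformed pattern such as "[0-9" should return false.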

Solution 3 - C++

Regex support improved between gcc 4.8.2 and 4.9.2. For example, the regex =[A-Z]{3} was failing for me with:

> Regex error

After upgrading to gcc 4.9.2, it works as expected.
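For reference, a small sketch of how that pattern can be exercised. It assumes a library with working <regex> support (such as the gcc 4.9.2 mentioned above), and the function name has_three_caps_after_equals is invented for this example.

```cpp
#include <regex>
#include <string>

// Returns true if `text` contains '=' followed by three uppercase ASCII letters.
// Assumes a standard library with a working <regex> implementation.
bool has_three_caps_after_equals(const std::string& text)
{
    static const std::regex re("=[A-Z]{3}");  // compiled once, reused on later calls
    return std::regex_search(text, re);
}
```

With a complete implementation, an input like "code=ABC" should be reported as a match, while "code=abc" should not.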

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Shay Guy | View Question on Stackoverflow
Solution 1 - C++ | jfs | View Answer on Stackoverflow
Solution 2 - C++ | Some programmer dude | View Answer on Stackoverflow
Solution 3 - C++ | Drew Noakes | View Answer on Stackoverflow