What is the rationale for parentheses in C++11's raw string literals R"(...)"?

C++, C++11, Standards, String Literals

C++ Problem Overview


C++11 introduced a very convenient feature called raw string literals: strings in which escape sequences are not processed. Instead of writing this:

  regex mask("\\t[0-9]+\\.[0-9]+\\t\\\\SUB");

You can simply write this:

  regex mask(R"(\t[0-9]+\.[0-9]+\t\\SUB)");

Much more readable. However, note the extra parentheses around the string that one has to write to define a raw string literal.

My question is: why do we even need these? To me they look quite ugly and illogical. Here are the cons that I see:
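As a minimal sketch of the equivalence, the two spellings above can be compared directly. The function names are made up for illustration, and `matches_mask` assumes the default ECMAScript grammar of `std::regex`.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Pattern spelled as an ordinary literal: every backslash has to be doubled.
inline std::string escaped_pattern() {
    return "\\t[0-9]+\\.[0-9]+\\t\\\\SUB";
}

// The same pattern as a raw literal: backslashes are taken literally.
inline std::string raw_pattern() {
    return R"(\t[0-9]+\.[0-9]+\t\\SUB)";
}

// Hypothetical helper: true if the whole line matches the pattern
// (tab, number with a decimal point, tab, backslash, "SUB").
inline bool matches_mask(const std::string& line) {
    static const std::regex mask(raw_pattern());
    return std::regex_match(line, mask);
}

// Both spellings denote exactly the same sequence of characters.
inline void check_same_pattern() {
    assert(escaped_pattern() == raw_pattern());
}
```

Calling `check_same_pattern()` from a test or from `main` should pass silently, since only the source spelling differs, not the resulting string.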

  • Extra verbosity, while the whole feature is used to make literals more compact
  • Hard to distinguish between the body of the literal and the defining symbols

Here is what I mean by hard to distinguish:

"good old usual string literal"
 ^-    body inside quotes   -^

R"(new strange raw string literal)"
   ^- body inside parentheses  -^

And here is the pro:

  • More flexibility, more characters available in raw strings, especially when used with a delimiter: R"delim( can use "()" here )delim"

But hey, if you need more flexibility, you have the good old escapable string literals. Why did the standards committee decide to pollute the content of every raw string literal with these absolutely unnecessary parentheses? What was the rationale behind that? What are the pros I didn't mention?

UPD: The answer by Kerrek is great, but unfortunately it does not answer the question, since I had already described that I understand how it works and what benefits it gives. Five years have passed since I asked this question, and still there is no answer. And I am still frustrated by this decision. One could say that this is a matter of taste, but I would disagree. How many spaces you use, how you name your variables, whether it is SomeFunction() or some_function(): that is a matter of taste. And I can really easily switch from one style to another.

But this? It still feels awkward and clumsy after so many years. No, this is not about taste. This is about wanting to cover all possible cases no matter what. We are doomed to write these ugly parens every time we need to write a Windows-specific path, a regular expression, or a multi-line string literal. And for what? For those rare cases when we actually need to put " in a string? I wish I had been at the committee meeting where they decided to do it this way; I would have been strongly against this really bad decision. I wish. Now we are doomed.

Thank you for reading this far. Now I feel a little better.

UPD2: Here are my alternative proposals, both of which I think would be MUCH better than the existing syntax.

Proposal 1. Inspired by Python. Cannot support string literals with triple quotes: R"""Here is a string literal with any content, except for triple quotes, which you don't actually use that often."""

Proposal 2. Inspired by common sense. Supports all possible string literals, just like the current one: R"delim"content of string"delim". With empty delimiter: R""Looks better, doesn't it?"". Empty raw string: R"""". Raw string with double quotes: R"#"Here are double quotes: "", thanks"#".

Any problems with these proposals?

C++ Solutions


Solution 1 - C++

The purpose of the parentheses is to allow you to specify a custom delimiter:

R"foo(Hello World)foo"   // the string "Hello World"

In your example, and in typical use, the delimiter is simply empty, so the raw string is enclosed by the sequences R"( and )".
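A minimal sketch of this point: the delimiter is part of the literal's spelling, not of its value, so both forms below yield the same string. The function names are illustrative only.

```cpp
#include <cassert>
#include <string>

// Raw literal with the custom delimiter "foo".
inline std::string with_delimiter() {
    return R"foo(Hello World)foo";
}

// Raw literal with the empty delimiter, the typical form.
inline std::string empty_delimiter() {
    return R"(Hello World)";
}

// Neither the delimiter nor the parentheses end up in the string.
inline void check_delimiters() {
    assert(with_delimiter() == "Hello World");
    assert(empty_delimiter() == "Hello World");
}
```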

Allowing for arbitrary delimiters is a design decision that reflects the desire to provide a complete solution without weird limitations or edge cases. You can pick as the delimiter any short sequence of characters (up to 16, excluding parentheses, backslashes, and whitespace) such that )delim" does not occur in your string.

Without this, you would be in trouble if the string itself contained something like " (if you had just wanted R"..." as your raw string syntax) or )" (if the delimiter is empty). Both of those are perfectly common and frequent character sequences, especially in regular expressions, so it would be incredibly annoying if the decision whether or not you use a raw string depended on the specific content of your string.

Remember that inside the raw string there's no other escape mechanism, so the best you could do otherwise would be to concatenate pieces of string literal, which would be very impractical. By allowing a custom delimiter, all you need to do is pick an unusual character sequence once, and maybe modify it in very rare cases when you make a future edit.
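To make the trade-off concrete, here is a sketch using a made-up piece of text that contains the sequence )" and therefore cannot be written with the empty delimiter. It compares the escaped form, a custom delimiter, and the concatenation workaround; the text and the function names are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <string>

// Ordinary literal: each embedded quote needs a backslash.
inline std::string with_escapes() {
    return "printf(\"%d)\", n);";
}

// Custom delimiter "src": the literal only ends at )src", so the
// embedded )" is harmless.
inline std::string with_custom_delimiter() {
    return R"src(printf("%d)", n);)src";
}

// Without a custom delimiter the literal has to be split around the
// offending )" and glued back together by literal concatenation.
inline std::string with_concatenation() {
    return R"(printf("%d)" ")\"" R"(, n);)";
}

inline void check_workarounds() {
    assert(with_custom_delimiter() == with_escapes());
    assert(with_concatenation() == with_escapes());
}
```

All three spellings produce the same string; the concatenated version is the one that is hardest to read and to edit later.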

But to stress once again, even the empty delimiter is already useful, since the R"(...)" syntax allows you to place naked quotation marks in your string. That by itself is quite a gain.

Solution 2 - C++

As the other answer explains, something beyond the quotation mark is needed to avoid parsing ambiguity when ", )", or indeed any closing sequence may appear in the string itself.

As for the syntax choice, well, I agree it is suboptimal, but it is OK in general (you could think of it as "things could be worse", lol). I think it is a good compromise between usage simplicity and parsing simplicity.

> Proposal 1. Inspired by python. Cannot support string literals with triple quotes:
> R"""any content, except for triple quotes, which you don't actually use that often."""

There is indeed a problem with this - "quotes, which you don't actually use that often". Firstly, the very idea of raw strings is to represent raw strings, i.e. exactly as they would appear in a text file, without any modifications to the string, regardless of the string contents. Secondly, the syntax should be general, i.e. without adding variations like "almost raw string", etc.

How would you write one quote with this syntax? Two quotes? Note that those are very common cases, especially when your code is dealing with strings and parsing.

> Proposal 2.
> R"delim"content of string"delim".
> R""Looks better, doesn't it?"".
> R"#"Here are double quotes: "", thanks"#".

Well, this one might be a better candidate. One thing, though: the double-quote character itself is very common in strings (I believe this was a motivating case for the accepted syntax), and raw strings should come in handy precisely in those cases.

So, let's see. Normal string syntax:

s1 = "\"";
s2 = "\"quoted string\"";

Your syntax e.g. with "x" as delim:

s1 = R"x"""x";
s2 = R"x""quoted string""x";

Accepted syntax:

s1 = R"(")";
s2 = R"("quoted string")";
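As a quick check of the accepted syntax against the ordinary spelling, a sketch along these lines could be used; the function names are hypothetical. The proposed alternative syntaxes are not valid C++, so they cannot be compiled for comparison.

```cpp
#include <cassert>
#include <string>

// A single double-quote character, written as a raw literal.
inline std::string one_quote_raw() {
    return R"(")";
}

// A quoted phrase with leading and trailing quotes, no escapes needed.
inline std::string quoted_raw() {
    return R"("quoted string")";
}

// The raw forms match the escaped forms s1 and s2 from the normal syntax.
inline void check_quotes() {
    assert(one_quote_raw() == "\"");
    assert(quoted_raw() == "\"quoted string\"");
}
```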

Yes, I agree that the parentheses introduce an annoying visual effect. I suspect the authors of the syntax reasoned that an additional delimiter would rarely be needed, since )" does not appear very often inside a string. On the other hand, leading, trailing, and isolated quotes are quite common, so your proposed syntax (#2) would need a delimiter more often, which in turn would mean changing R"".."" to R"delim"..."delim" more often. I hope you get the idea.

Could the syntax be better? I personally would prefer an even simpler variant of syntax:

Rdelim"string contents"delim;

With the above examples:

s1 = Rx"""x; 
s2 = Rx""quoted string""x;

However, to work correctly (if it's possible at all in the current grammar), this variant would require limiting the character set for the delim part, say to letters and digits only (because of existing operators), and maybe some further restrictions on the initial character to avoid clashes with possible future grammar.
So I believe a better choice could have been made, although nothing significantly better can be done in this case.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Mikhail | View Question on Stackoverflow
Solution 1 - C++ | Kerrek SB | View Answer on Stackoverflow
Solution 2 - C++ | Mikhail V | View Answer on Stackoverflow