Is it mandatory to escape tabulator characters in C and C++?


C++ Problem Overview


In C and C++ (and several other languages), horizontal tabulators (ASCII code 9) in character and string constants are written in escaped form as '\t' and "\t". However, I regularly type the unescaped tabulator character in string literals, for example "A B" (there is a TAB between A and B), and at least clang++ does not seem to mind: the string appears to be equivalent to "A\tB". I prefer the unescaped version because long, indented multi-line strings are more readable in the source code.

Now I am asking myself whether this is generally legal in C and C++ or just supported by my compiler. How portable are unescaped tabulators in character and string constants?

Surprisingly, I could not find an answer to this seemingly simple question, either with Google or on Stack Overflow (I just found this vaguely related question).

C++ Solutions


Solution 1 - C++

Yes, you can include a tab character in a string or character literal, at least according to C++11. The allowed characters include (with my emphasis):

> any member of the source character set except the double-quote ", backslash \, or new-line character

(from C++11 standard, annex A.2)

and the source character set includes:

> the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters

(from C++11 standard, paragraph 2.3.1)
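
The rule quoted above can be restated as a small predicate. This is only an illustrative sketch: the name `is_plain_s_char` is made up here, and it assumes its argument is already a member of the basic source character set (it does not model universal-character-names or extended characters).

```cpp
// Hypothetical helper (not part of any standard library): returns true if
// `c`, assumed to be a member of the source character set, may appear
// unescaped inside an ordinary "..." string literal under the C++11 rule
// quoted above, i.e. it is not the double-quote, backslash, or new-line.
constexpr bool is_plain_s_char(char c) {
    return c != '"' && c != '\\' && c != '\n';
}

// The horizontal tab is not among the three exceptions.
static_assert(is_plain_s_char('\t'), "a raw tab is allowed");
static_assert(!is_plain_s_char('\\'), "a backslash must be escaped");
static_assert(!is_plain_s_char('\n'), "a raw new-line is not allowed");
```

For character literals the excluded quote character would be the apostrophe rather than the double-quote.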

UPDATE: I've just noticed that you're asking about two different languages. For C99, the answer is also yes. The wording is different, but basically says the same thing:

> In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or [...]

where both the source and execution character sets include

> control characters representing horizontal tab, vertical tab, and form feed.

Solution 2 - C++

It's completely legal to put a tab character directly into a character string or character literal. The C and C++ standards require the source character set to include a tab character, and string and character literals may contain any character in the source character set except backslash, quote or apostrophe (as appropriate) and newline.

So it's portable. But it is not a good idea, since there is no way a reader can distinguish between different kinds of whitespace. It is also quite common for text editors, mail programs, and the like to reformat tabs, so bugs may be introduced into the program in the course of such operations.
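
Because raw tabs are invisible to a reader, one way to catch them is a small check over the source text. The following is a minimal sketch, not an existing tool: the function name `has_raw_tab_in_literal` is hypothetical, and it deliberately ignores comments, raw string literals, and literals that span several lines.

```cpp
#include <cstddef>
#include <string>

// Hypothetical lint helper: reports whether a single line of C or C++ source
// contains a raw (unescaped) tab character inside a "..." or '...' literal.
// Simplification: comments, raw string literals, and multi-line literals are
// not handled.
bool has_raw_tab_in_literal(const std::string& line) {
    char quote = 0;  // 0 means we are outside any literal
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (quote == 0) {
            if (c == '"' || c == '\'') quote = c;  // a literal opens
        } else if (c == '\\') {
            ++i;                                   // skip the escaped character
        } else if (c == quote) {
            quote = 0;                             // the literal closes
        } else if (c == '\t') {
            return true;                           // raw tab inside a literal
        }
    }
    return false;
}
```

A line such as `s = "A\tB";` written with the escape sequence is not flagged, while the same line with a real tab between A and B is. Tabs used for indentation outside literals are ignored.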

Solution 3 - C++

If you enter a tab into an input, then your string will contain a literal tab character, and it will stay a tab character - it won't be magically translated into \t internally.

Same goes for writing code - you can embed literal tab characters in your strings. However, consider this:

     T     T     T        <--tab stops
012345012345012345012345
foo1 = "a\tb";
foo2 = "a  b"; // pressed tab in the editor
foo3 = "a  b"; // hit space twice in the editor

Unless you put the cursor on the whitespace between a and b and check how many characters are there, there is essentially NO way to determine whether it is a tab or actual space characters. But with the \t version, it is immediately clear that it is a tab.
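
The difference only becomes visible once the whitespace is spelled out. As a rough sketch (the helper name `show_whitespace` is invented for this illustration), a debugging function could render tabs and spaces explicitly:

```cpp
#include <string>

// Hypothetical debugging helper: returns a copy of `s` in which every tab is
// spelled out as the two characters \t and every space as a visible dot.
std::string show_whitespace(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '\t') {
            out += "\\t";  // backslash followed by 't'
        } else if (c == ' ') {
            out += '.';
        } else {
            out += c;
        }
    }
    return out;
}
```

With this, a string holding a tab between a and b comes back as `a\tb`, while the two-space version comes back as `a..b`. The two strings also differ in length (3 versus 4 characters), even though they can look identical in an editor.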

Solution 4 - C++

When you press the TAB key you get whatever code point your system maps that key to. That code point may or may not be a tab on the system where the program runs. When you put \t in a literal the compiler replaces it with the appropriate code point for the target system. So if you want to be sure that you get a tab on the system where the program runs, use \t. That's its job.
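
To see which code point the compiler actually picks for \t, one can simply inspect its numeric value. This is a small sketch with a made-up function name, `tab_code`. The value 9 mentioned below assumes an ASCII-based execution character set; on an EBCDIC-based target the value would differ.

```cpp
// Hypothetical helper: returns the numeric value the compiler chose for
// '\t' in the execution character set of the target.
constexpr int tab_code() {
    return static_cast<int>('\t');
}

// The escape sequence inside a string literal yields the same value.
static_assert("A\tB"[1] == '\t', "escape sequence maps to the tab character");
```

On an ASCII-based system `tab_code()` evaluates to 9, matching the ASCII code mentioned in the question.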

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | tglas | View Question on Stackoverflow |
| Solution 1 - C++ | Mike Seymour | View Answer on Stackoverflow |
| Solution 2 - C++ | rici | View Answer on Stackoverflow |
| Solution 3 - C++ | Marc B | View Answer on Stackoverflow |
| Solution 4 - C++ | Pete Becker | View Answer on Stackoverflow |