Is string::c_str() no longer null terminated in C++11?

Tags: C++, String, C++11

C++ Problem Overview


In C++11, basic_string::c_str is defined to be exactly the same as basic_string::data, which is in turn defined in terms of *(begin() + n) and *(&*begin() + n) (when 0 <= n < size()).

I cannot find anything that requires the string to always have a null character at its end.

Does this mean that c_str() is no longer guaranteed to produce a null-terminated string?

C++ Solutions


Solution 1 - C++

Strings are now required to use null-terminated buffers internally. Look at the definition of operator[] (21.4.5):

> Requires: pos <= size().
>
> Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.

Looking back at c_str (21.4.7.1/1), we see that it is defined in terms of operator[]:

> Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].

And both c_str and data are required to be O(1), so the implementation is effectively forced to use null-terminated buffers.

Additionally, as David Rodríguez - dribeas points out in the comments, the return value requirement also means that you can use &operator[](0) as a synonym for c_str(), so the terminating null character must lie in the same buffer (since *(p + size()) must be equal to charT()). This also means that even if the terminator is initialised lazily, it is not possible to observe the buffer in the intermediate state.
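The guarantees quoted above can be checked directly. The following is a minimal sketch, assuming a C++11 (or later) standard library; `check_null_termination` is a hypothetical helper name introduced here for illustration and is not part of the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a library function): returns true if the
// C++11 guarantees discussed above hold for the given string.
bool check_null_termination(const std::string& s) {
    const char* p = s.c_str();

    // In C++11, c_str() and data() are specified identically.
    if (p != s.data()) return false;

    // p + i == &operator[](i) for each i in [0, size()],
    // including i == size(), the position of the terminator.
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) return false;
    }

    // The element at index size() has the value charT(), i.e. '\0' for char.
    return p[s.size()] == '\0' && s[s.size()] == '\0';
}
```

Calling `check_null_termination` on an empty string, a short string, or a long heap-allocated string should return true on a conforming implementation, since the check relies only on the wording quoted from 21.4.5 and 21.4.7.1.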

Solution 2 - C++

Well, in fact it is true that the new standard stipulates that .data() and .c_str() are now synonyms. However, it doesn't say that .c_str() is no longer zero-terminated :)

It just means that you can now rely on .data() being zero-terminated as well.

> Paper N2668 defines c_str() and data() members of std::basic_string as follows:

> const charT* c_str() const;
> const charT* data() const;

> Returns: A pointer to the initial element of an array of length size() + 1 whose first size() elements equal the corresponding elements of the string controlled by *this and whose last element is a null character specified by charT().

> Requires: The program shall not alter any of the values stored in the character array.

Note that this does NOT mean that any valid std::string can be treated as a C-string because std::string can contain embedded nulls, which will prematurely end the C-string when used directly as a const char*.
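The embedded-null caveat is easy to reproduce. The sketch below is only an illustration; `make_embedded_null_string` and `c_length` are hypothetical helper names, not library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: builds the 7-character string "abc\0def".
// The explicit length is needed, otherwise the constructor would stop
// at the first null character of the literal.
std::string make_embedded_null_string() {
    std::string s("abc\0def", 7);
    assert(s.size() == 7);
    return s;
}

// Hypothetical helper: the length seen by a C API, which stops reading
// at the first null character it finds.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

For the string built by `make_embedded_null_string`, `size()` is 7 while `c_length` reports 3: the buffer is still null-terminated at index `size()`, but code that treats `c_str()` as a plain C-string never sees the characters after the embedded null.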

Addendum:

I don't have access to the actual published final spec of C++11 but it appears that indeed the wording was dropped somewhere in the revision history of the spec: e.g. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf

> § 21.4.7 basic_string string operations [string.ops]
>
> § 21.4.7.1 basic_string accessors [string.accessors]

> const charT* c_str() const noexcept;
> const charT* data() const noexcept;
>
> 1. Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
> 2. Complexity: constant time.
> 3. Requires: The program shall not alter any of the values stored in the character array.

Solution 3 - C++

The "history" was that a long time ago when everyone worked in single threads, or at least the threads were workers with their own data, they designed a string class for C++ which made string handling easier than it had been before, and they overloaded operator+ to concatenate strings.

The issue was that users would do something like:

s = s1 + s2 + s3 + s4;

and each concatenation would create a temporary, which had to be a complete string in its own right.

Therefore someone had the brainwave of "lazy evaluation" such that internally you could store some kind of "rope" with all the strings until someone wanted to read it as a C-string at which point you would change the internal representation to a contiguous buffer.

This solved the problem above but caused a load of other headaches, in particular in the multi-threaded world, where one expected a .c_str() operation to be read-only, changing nothing and therefore needing no lock. Premature internal locking in the class implementation, just in case someone was using it from multiple threads (when there wasn't even a threading standard), was also not a good idea. In fact, doing any of this was more costly than simply copying the buffer each time. That is the same reason the "copy on write" approach was abandoned for string implementations.

Thus making .c_str() a truly immutable operation turned out to be the most sensible thing to do, but could one "rely" on it in a standard that is now thread-aware? The new standard therefore decided to state clearly that you can, and thus the internal representation needs to hold the null terminator.

Solution 4 - C++

Well spotted. This is certainly a defect in the recently adopted standard; I'm sure that there was no intent to break all of the code currently using c_str. I would suggest a defect report, or at least asking the question in comp.std.c++ (which will usually end up before the committee if it concerns a defect).

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | Mankarse | View Question on Stackoverflow |
| Solution 1 - C++ | Mikhail Glushenkov | View Answer on Stackoverflow |
| Solution 2 - C++ | sehe | View Answer on Stackoverflow |
| Solution 3 - C++ | CashCow | View Answer on Stackoverflow |
| Solution 4 - C++ | James Kanze | View Answer on Stackoverflow |