Will std::string always be null-terminated in C++11?

Tags: C++, String, C++11, Language Lawyer, Null Terminated

C++ Problem Overview


In a 2008 post on his site, Herb Sutter states the following:

>There is an active proposal to tighten this up further in C++0x and require null-termination and possibly ban copy-on-write implementations, for concurrency-related reasons. Here’s the paper: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2534.html . I think that one or both of the proposals in this paper is likely to be adopted, but we’ll see at the next meeting or two.

I know that C++11 now guarantees that the std::string contents get stored contiguously, but did they adopt the above in the final draft?

Will it now be safe to use something like &str[0]?

C++ Solutions


Solution 1 - C++

Yes. Per the C++0x FDIS 21.4.7.1/1, std::basic_string::c_str() must return

> a pointer p such that p + i == &operator[](i) for each i in [0,size()].

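This means that given a string s, the pointer returned by s.c_str() must be the same as the address of the initial character in the string (&s[0]). Because the range includes i == size(), and operator[](size()) is specified to return a reference to a null character (charT()), the character at s.c_str() + s.size() is the terminator.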
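
As a minimal sketch of what that requirement implies, the check below walks every index in [0, size()] and compares c_str() + i with &s[i]. The names c_str_aliases_elements and demo are made up for illustration; a conforming C++11 library should satisfy all three assertions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true when c_str() + i and &s[i] agree for every i in [0, size()].
// Hypothetical helper, not part of the standard library.
bool c_str_aliases_elements(const std::string& s)
{
    const char* p = s.c_str();
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) {   // const operator[] permits i == size()
            return false;
        }
    }
    return true;
}

// Illustrative driver: the assertions restate the guarantee quoted above.
void demo()
{
    const std::string s = "hello";
    assert(c_str_aliases_elements(s));
    assert(s.c_str() == &s[0]);           // same address as the first character
    assert(s.c_str()[s.size()] == '\0');  // terminator sits right after the data
}
```

Calling demo() from main() should complete silently if the library conforms.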

Solution 2 - C++

&str[0] is safe to use -- so long as you do not assume it points to a null-terminated string.

Since C++11 the requirements include (section [string.accessors]):

  • str.data() and str.c_str() point to a null-terminated string.
  • &str[i] == str.data() + i, for 0 <= i <= str.size()
    • note that this implies the storage is contiguous.

However, there is no requirement that &str[0] + str.size() points to a null terminator.

A conforming implementation must place the null terminator contiguously in storage when data(), c_str() or operator[](str.size()) are called; but there is no requirement to place it in any other situation, such as calls to operator[] with other arguments.
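
One way to act on this, sketched under assumptions: pass &str[0] together with an explicit length to code that writes into the buffer, and go through c_str() whenever a terminator is needed. fill_buffer below is a hypothetical stand-in for a C API, not a real library function, and make_filled is likewise illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C API that fills a caller-supplied buffer with
// exactly `len` bytes and never writes a terminator.
void fill_buffer(char* dst, std::size_t len, char value)
{
    std::memset(dst, value, len);
}

// Builds a string of `len` copies of `value` by writing through &out[0].
std::string make_filled(std::size_t len, char value)
{
    std::string out(len, '\0');                   // allocate len writable characters
    if (!out.empty()) {
        fill_buffer(&out[0], out.size(), value);  // touch only [0, size())
    }
    return out;
}

// When a C function needs the terminator, ask for it explicitly via c_str().
std::size_t c_length(const std::string& s)
{
    return std::strlen(s.c_str());
}
```

The explicit length means the code never relies on what lies at &str[0] + str.size(); only c_length, which goes through c_str(), depends on the terminator.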


To save you reading the long chat discussion below: the objection was raised that if c_str() were to write a null terminator, it would cause a data race under res.on.data.races#3; I disagreed that it would be a data race.

Solution 3 - C++

Although c_str() returns a null-terminated version of the std::string, surprises may await when mixing C++ std::string with C char* strings.

Null characters may end up within a C++ std::string, which can lead to subtle bugs as C functions will see a shorter string.

Buggy code may overwrite the null terminator. This results in undefined behaviour. C functions would then read beyond the string buffer, potentially causing a crash.

#include <cstdio>
#include <cstring>
#include <iostream>
#include <string>

int main()
{
    std::string embedded_null = "hello\n";
    embedded_null += '\0';
    embedded_null += "world\n";

    // C string functions stop early at the embedded \0
    std::cout << "C++ size: " << embedded_null.size()
              << " value: " << embedded_null;
    // strlen returns std::size_t, so the matching format specifier is %zu
    std::printf("C strlen: %zu value: %s\n",
                std::strlen(embedded_null.c_str()),
                embedded_null.c_str());

    std::string missing_terminator(3, 'n');
    missing_terminator[3] = 'a'; // BUG: overwrites the terminator; undefined behaviour

    // C string functions read beyond the buffer and may crash
    std::cout << "C++ size: " << missing_terminator.size()
              << " value: " << missing_terminator << '\n';
    std::printf("C strlen: %zu value: %s\n",
                std::strlen(missing_terminator.c_str()),
                missing_terminator.c_str());
}

Output:

$ c++ example.cpp
$ ./a.out
C++ size: 13 value: hello
world
C strlen: 6 value: hello

C++ size: 3 value: nnn
C strlen: 6 value: nnna�
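
A possible guard against the first surprise, offered as a sketch rather than a complete fix: check for embedded nulls before handing the string to a C function. The helper names has_embedded_null and c_visible_length are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// True when the string holds a '\0' somewhere inside [0, size()).
bool has_embedded_null(const std::string& s)
{
    return s.find('\0') != std::string::npos;
}

// Length a C function would see: everything up to the first '\0'.
std::size_t c_visible_length(const std::string& s)
{
    return std::strlen(s.c_str());
}
```

For a string without embedded nulls, c_visible_length(s) equals s.size(); when has_embedded_null(s) is true, the C view is shorter than the C++ view.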

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stack Overflow |
| --- | --- | --- |
| Question | links77 | View Question on Stack Overflow |
| Solution 1 - C++ | James McNellis | View Answer on Stack Overflow |
| Solution 2 - C++ | M.M | View Answer on Stack Overflow |
| Solution 3 - C++ | PFee | View Answer on Stack Overflow |