What makes this usage of pointers unpredictable?

C++Pointers

C++ Problem Overview


I'm currently learning pointers and my professor provided this piece of code as an example:

//We cannot predict the behavior of this program!

#include <iostream>
using namespace std;

int main()
{
    char * s = "My String";
    char s2[] = {'a', 'b', 'c', '\0'};

    cout << s2 << endl;

    return 0;
}

He wrote in the comments that we can't predict the behavior of the program. What exactly makes it unpredictable though? I see nothing wrong with it.

C++ Solutions


Solution 1 - C++

The behaviour of the program is non-existent, because it is ill-formed.

char* s = "My String";

This is illegal. Prior to 2011, it had been deprecated for 12 years.

The correct line is:

const char* s = "My String";

Other than that, the program is fine. Your professor should drink less whiskey!

Solution 2 - C++

The answer is: it depends on what C++ standard you're compiling against. All the code is perfectly well-formed across all standards‡ with the exception of this line:

char * s = "My String";

Now, the string literal has type const char[10] and we're trying to initialize a non-const pointer to it. For all other types other than the char family of string literals, such an initialization was always illegal. For example:

const int arr[] = {1};
int *p = arr; // nope!

However, in pre-C++11, for string literals, there was an exception in §4.2/2:

> A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; [...]. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ]

So in C++03, the code is perfectly fine (though deprecated), and has clear, predictable behavior.

In C++11, that block does not exist - there is no such exception for string literals converted to char*, and so the code is just as ill-formed as the int* example I just provided. The compiler is obligated to issue a diagnostic, and ideally in cases such as this that are clear violations of the C++ type system, we would expect a good compiler to not just be conforming in this regard (e.g. by issuing a warning) but to fail outright.

The code should ideally not compile - but does on both gcc and clang (I assume because there's probably lots of code out there that would be broken with little gain, despite this type system hole being deprecated for over a decade). The code is ill-formed, and thus it does not make sense to reason about what the behavior of the code might be. But considering this specific case and the history of it being previously allowed, I do not believe it to be an unreasonable stretch to interpret the resulting code as if it were an implicit const_cast, something like:

const int arr[] = {1};
int *p = const_cast<int*>(arr); // OK, technically

With that, the rest of the program is perfectly fine, as you never actually touch s again. Reading a created-const object via a non-const pointer is perfectly OK. Writing a created-const object via such a pointer is undefined behavior:

std::cout << *p; // fine, prints 1
*p = 5;          // will compile, but undefined behavior, which
                 // certainly qualifies as "unpredictable"

As there is no modification via s anywhere in your code, the program is fine in C++03, should fail to compile in C++11 but does anyway - and given that the compilers allow it, there's still no undefined behavior in it†. With allowances that the compilers are still [incorrectly] interpreting the C++03 rules, I see nothing that would lead to "unpredictable" behavior. Write to s though, and all bets are off. In both C++03 and C++11.


†Though, again, by definition ill-formed code yields no expectation of reasonable behavior
‡Except not, see Matt McNabb's answer

Solution 3 - C++

Other answers have covered that this program is ill-formed in C++11 due to the assignment of a const char array to a char *.

However the program was ill-formed prior to C++11 also.

The operator<< overloads are in <ostream>. The requirement for iostream to include ostream was added in C++11.

Historically, most implementations had iostream include ostream anyway, perhaps for ease of implementation or perhaps in order to provide a better QoI.

But it would be conforming for iostream to only define the ostream class without defining the operator<< overloads.

Solution 4 - C++

The only slightly wrong thing that I see with this program is that you're not supposed to assign a string literal to a mutable char pointer, though this is often accepted as a compiler extension.

Otherwise, this program appears well-defined to me:

  • The rules that dictate how character arrays become character pointers when passed as parameters (such as with cout << s2) are well-defined.
  • The array is null-terminated, which is a condition for operator<< with a char* (or a const char*).
  • #include <iostream> includes <ostream>, which in turn defines operator<<(ostream&, const char*), so everything appears to be in place.

Solution 5 - C++

You can't predict the behaviour of the compiler, for reasons noted above. (It should fail to compile, but may not.)

If compilation succeeds, then the behaviour is well-defined. You certainly can predict the behaviour of the program.

If it fails to compile, there is no program. In a compiled language, the program is the executable, not the source code. If you don't have an executable, you don't have a program, and you can't talk about behaviour of something that doesn't exist.

So I'd say your prof's statement is wrong. You can't predict the behaviour of the compiler when faced with this code, but that's distinct from the behaviour of the program. So if he's going to pick nits, he'd better make sure he's right. Or, of course, you might have misquoted him and the mistake is in your translation of what he said.

Solution 6 - C++

As others have noted, the code is illegitimate under C++11, although it was valid under earlier versions. Consequently, a compiler for C++11 is required to issue at least one diagnostic, but behavior of the compiler or the remainder of the build system is unspecified beyond that. Nothing in the Standard would forbid a compiler from exiting abruptly in response to an error, leaving a partially-written object file which a linker might think was valid, yielding a broken executable.

Although a good compiler should always ensure before it exits that any object file it is expected to have produced will be either valid, non-existent, or recognizable as invalid, such issues fall outside the jurisdiction of the Standard. While there have historically been (and may still be) some platforms where a failed compilation can result in legitimate-appearing executable files that crash in arbitrary fashion when loaded (and I've had to work with systems where link errors often had such behavior), I would not say that the consequences of syntax errors are generally unpredictable. On a good system, an attempted build will generally either produce an executable with a compiler's best effort at code generation, or won't produce an executable at all. Some systems will leave behind the old executable after a failed build, since in some cases being able to run the last successful build may be useful, but that can also lead to confusion.

My personal preference would be for disk-based systems to to rename the output file, to allow for the rare occasions when that executable would be useful while avoiding the confusion that can result from mistakenly believing one is running new code, and for embedded-programming systems to allow a programmer to specify for each project a program that should be loaded if a valid executable is not available under the normal name [ideally something which which safely indicates the lack of a useable program]. An embedded-systems tool-set would generally have no way of knowing what such a program should do, but in many cases someone writing "real" code for a system will have access to some hardware-test code that could easily be adapted to the purpose. I don't know that I've seen the renaming behavior, however, and I know that I haven't seen the indicated programming behavior.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestiontrungntView Question on Stackoverflow
Solution 1 - C++Lightness Races in OrbitView Answer on Stackoverflow
Solution 2 - C++BarryView Answer on Stackoverflow
Solution 3 - C++M.MView Answer on Stackoverflow
Solution 4 - C++zneakView Answer on Stackoverflow
Solution 5 - C++GrahamView Answer on Stackoverflow
Solution 6 - C++supercatView Answer on Stackoverflow