Does the C standard permit assigning an arbitrary value to a pointer and incrementing it?

C, Pointers, Language Lawyer, Pointer Arithmetic

C Problem Overview


Is the behaviour of this code well defined?

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    void *ptr = (char *)0x01;
    size_t val;

    ptr = (char *)ptr + 1;
    val = (size_t)(uintptr_t)ptr;

    printf("%zu\n", val);
    return 0;
}

I mean, can we assign some fixed number to a pointer and increment it even if it is pointing to some random address? (I know that you cannot dereference it.)

C Solutions


Solution 1 - C

The assignment:

void *ptr = (char *)0x01;

This is implementation-defined behavior because it converts an integer to a pointer. Section 6.3.2.3 of the C standard, on pointers, says:

> 5 An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

As for the subsequent pointer arithmetic:

ptr = (char *)ptr + 1;

This is dependent on a few things.

First, the current value of ptr may be a trap representation as per 6.3.2.3 above. If it is, the behavior is undefined.

Next is the question of whether 0x1 points to a valid object. Adding a pointer and an integer is only valid if both the pointer operand and the result point to elements of an array object (a single object counts as an array of size 1) or one element past the array object. This is detailed in section 6.5.6:

> 7 For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
>
> 8 When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

On a hosted implementation the value 0x1 almost certainly does not point to a valid object, in which case the addition is undefined. An embedded implementation could however support setting pointers to specific values, and if so it could be the case that 0x1 does in fact point to a valid object. If so, the behavior is well defined, otherwise it is undefined.

Solution 2 - C

No, the behaviour of this program is undefined. Once an undefined construct is reached in a program, any future behaviour is undefined. Paradoxically, any past behaviour is undefined too.

The result of void *ptr = (char*)0x01; is implementation-defined, due in part to the fact that the resulting pointer can be a trap representation.

But the behaviour of the ensuing pointer arithmetic in the statement ptr = (char *)ptr + 1; is undefined. This is because pointer arithmetic is only valid within arrays including one past the end of the array. For this purpose an object is an array of length one.

Solution 3 - C

Yes, the code is well-defined as implementation-defined. It is not undefined. See ISO/IEC 9899:2011 [6.3.2.3]/5 and note 67.

The C language was originally created as a systems programming language. Systems programming required manipulating memory-mapped hardware, which meant stuffing hard-coded addresses into pointers, sometimes incrementing those pointers, and reading and writing data from and to the resulting address. To that end, assigning an integer to a pointer and manipulating that pointer using arithmetic is well defined by the language. By making it implementation-defined, what the language allows is that all kinds of things can happen: from the classic halt-and-catch-fire to raising a bus error when trying to dereference an odd address.

The difference between undefined behaviour and implementation-defined behaviour is basically that undefined behaviour means "don't do that, we don't know what will happen" and implementation-defined behaviour means "it's OK to go ahead and do that, it's up to you to know what will happen."

Solution 4 - C

It is undefined behavior.

From N1570 (emphasis added):

> An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

If the value is a trap representation, reading it is undefined behavior:

> Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

And

> An identifier is a primary expression, provided it has been declared as designating an object (in which case it is an lvalue) or a function (in which case it is a function designator).

Therefore, the line void *ptr = (char *)0x01; is already potentially undefined behavior, on an implementation where (char*)0x01 or (void*)(char*)0x01 is a trap representation. The left-hand side is an lvalue expression that does not have character type and reads a trap representation.

On some hardware, loading an invalid pointer into a machine register could crash the program, so this was a forced move by the standards committee.

Solution 5 - C

The Standard does not require that implementations process integer-to-pointer conversions in a meaningful fashion for any particular integer values, or even for any possible integer values other than Null Pointer Constants. The only thing it guarantees about such conversions is that a program which stores the result of such a conversion directly into an object of suitable pointer type, and does nothing with it except examine the bytes of that object, will at worst see Unspecified values. While the behavior of converting an integer to a pointer is Implementation-Defined, nothing would forbid any implementation (no matter what it actually does with such conversions!) from specifying that some (or even all) of the bytes of the representation have Unspecified values, and specifying that some (or even all) integer values may behave as though they yield trap representations.

The only reasons the Standard says anything at all about integer-to-pointer conversions are that:

  1. In some implementations, the construct is meaningful, and some programs for those implementations require it.

  2. The authors of the Standard did not like the idea that a construct used on some implementations would represent a constraint violation on others.

  3. It would have been odd for the Standard to describe a construct but then specify that it has Undefined Behavior in all cases.

Personally, I think the Standard should have allowed implementations to treat integer-to-pointer conversions as constraint violations if they don't define any situations where they would be useful, rather than require that compilers accept the meaningless code, but that wasn't the philosophy at the time.

I think it would be simplest to say that any operation involving integer-to-pointer conversions with anything other than intptr_t or uintptr_t values received from pointer-to-integer conversions invokes Undefined Behavior, but then note that it is common for quality implementations intended for low-level programming to process Undefined Behavior "in a documented manner characteristic of the environment". The Standard doesn't specify when implementations should process programs that invoke UB in that fashion but instead treats it as a Quality of Implementation issue.

If an implementation specifies that integer-to-pointer conversions operate in a fashion that would define the behavior of

char *p = (char*)1;
p++;

as equivalent to "char *p = (char*)2;", then the implementation should be expected to work that way. On the other hand, an implementation could define the behavior of integer-to-pointer conversion in such a way that even:

char *p = (char*)1;
char *q = p;  // Not doing any arithmetic here--just a simple assignment

would release nasal demons. On most platforms, a compiler where arithmetic on pointers produced by integer-to-pointer conversions behaved oddly would not be viewed as a high-quality implementation suitable for low-level programming. A programmer who does not intend to target any other kind of implementation could thus expect such constructs to behave usefully on compilers for which the code is suitable, even though the Standard does not require it.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | David Ranieri | View Question on Stackoverflow
Solution 1 - C | dbush | View Answer on Stackoverflow
Solution 2 - C | Bathsheba | View Answer on Stackoverflow
Solution 3 - C | Stephen M. Webb | View Answer on Stackoverflow
Solution 4 - C | Davislor | View Answer on Stackoverflow
Solution 5 - C | supercat | View Answer on Stackoverflow