Is a pointer with the right address and type still always a valid pointer since C++17?

Tags: C++, Pointers, C++14, Language Lawyer, C++17

C++ Problem Overview


(In reference to this question and answer.)

Before the C++17 standard, the following sentence was included in [basic.compound]/3:

> If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.

But since C++17, this sentence has been removed.

For example, I believe that this sentence made the following code well-defined, and that since C++17 it has undefined behavior:

 alignas(int) unsigned char buffer[2*sizeof(int)];
 auto p1=new(buffer) int{};
 auto p2=new(p1+1) int{};
 *(p1+1)=10;

Before C++17, p1+1 holds the address of *p2 and has the right type, so p1+1 is a pointer to *p2. In C++17, p1+1 is a pointer past the end, so it is not a pointer to an object, and I believe it is not dereferenceable.

Is this interpretation of this modification of the standard right, or are there other rules that compensate for the deletion of the cited sentence?

C++ Solutions


Solution 1 - C++

> Is this interpretation of this modification of the standard right, or are there other rules that compensate for the deletion of the cited sentence?

Yes, this interpretation is correct. A pointer past the end isn't simply convertible to another pointer value that happens to point to that address.

The new [basic.compound]/3 says:

> Every value of pointer type is one of the following:
(3.1) a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2) a pointer past the end of an object ([expr.add]), or
(3.3) the null pointer value ([conv.ptr]) for that type, or
(3.4) an invalid pointer value.

Those are mutually exclusive. p1+1 is a pointer past the end, not a pointer to an object. p1+1 points to a hypothetical x[1] of a size-1 array at p1, not to p2. Those two objects are not pointer-interconvertible.

We also have the non-normative note:

> [ Note: A pointer past the end of an object ([expr.add]) is not considered to point to an unrelated object of the object's type that might be located at that address. [...]

which clarifies the intent.
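
As a minimal sketch of how the question's example can avoid the issue: the pointer returned by a placement new-expression points to the newly created object, so writing through it does not depend on the removed sentence. The function name `write_second_int` is hypothetical and not part of the original post.

```cpp
#include <new>  // placement new

// Hypothetical helper (an assumption for illustration): creates two ints in
// one buffer and writes to the second one through the pointer that placement
// new returned for it, instead of through p1 + 1.
int write_second_int() {
    alignas(int) unsigned char buffer[2 * sizeof(int)];
    int* p1 = new (buffer) int{};                // points to the first int
    int* p2 = new (buffer + sizeof(int)) int{};  // points to the second int
    (void)p1;  // p1 + 1 would only be a past-the-end pointer for *p1
    *p2 = 10;  // p2 is a pointer to the second int object
    return *p2;
}
```

Called from any function, `write_second_int()` returns 10. The sketch only shows that keeping the pointer returned by placement new sidesteps the question; it says nothing about whether `*(p1+1)` is valid.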


As T.C. points out in numerous comments (notably this one), this is really a special case of the problem that comes with trying to implement std::vector: [v.data(), v.data() + v.size()) needs to be a valid range, and yet vector doesn't create an array object, so the only defined pointer arithmetic would be going from any given object in the vector to past the end of its hypothetical size-1 array. For more resources, see CWG 2182, this std discussion, and two revisions of a paper on the subject: P0593R0 and P0593R1 (section 1.3 specifically).

Solution 2 - C++

In your example, *(p1 + 1) = 10; should be UB, because it is one past the end of the array of size 1. But we are in a very special case here, because the array was dynamically constructed in a larger char array.

Dynamic object creation is described in 4.5 The C++ object model [intro.object], §3 of the n4659 draft of the C++ standard:

> 3 If a complete object is created (8.3.4) in storage associated with another object e of type “array of N unsigned char” or of type “array of N std::byte” (21.2.1), that array provides storage for the created object if:
(3.1) — the lifetime of e has begun and not ended, and
(3.2) — the storage for the new object fits entirely within e, and
(3.3) — there is no smaller array object that satisfies these constraints.

Item 3.3 seems rather unclear, but the example below makes the intent clearer:

> struct A { unsigned char a[32]; };
> struct B { unsigned char b[16]; };
> A a;
> B *b = new (a.a + 8) B;      // a.a provides storage for *b
> int *p = new (b->b + 4) int; // b->b provides storage for *p
>                              // a.a does not provide storage for *p (directly),
>                              // but *p is nested within a (see below)

So in the example, the buffer array provides storage for both *p1 and *p2.

The following paragraphs prove that the complete object for both *p1 and *p2 is buffer:

> 4 An object a is nested within another object b if:
(4.1) — a is a subobject of b, or
(4.2) — b provides storage for a, or
(4.3) — there exists an object c where a is nested within c, and c is nested within b.

> 5 For every object x, there is some object called the complete object of x, determined as follows:
(5.1) — If x is a complete object, then the complete object of x is itself.
(5.2) — Otherwise, the complete object of x is the complete object of the (unique) object that contains x.

Once this is established, the other relevant part of draft n4659 for C++17 is [basic.compound] §3 (emphasis mine):

> 3 ... Every value of pointer type is one of the following:
(3.1) — a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2) — a pointer past the end of an object (8.7), or
(3.3) — the null pointer value (7.11) for that type, or
(3.4) — an invalid pointer value.

> A value of a pointer type that is a pointer to or past the end of an object represents the address of the first byte in memory (4.4) occupied by the object or the first byte in memory after the end of the storage occupied by the object, respectively. [ Note: A pointer past the end of an object (8.7) is not considered to point to an unrelated object of the object’s type that might be located at that address. A pointer value becomes invalid when the storage it denotes reaches the end of its storage duration; see 6.7. —end note ] For purposes of pointer arithmetic (8.7) and comparison (8.9, 8.10), a pointer past the end of the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical element x[n]. The value representation of pointer types is implementation-defined. Pointers to layout-compatible types shall have the same value representation and alignment requirements (6.11)...

The note "A pointer past the end..." does not apply here because the objects pointed to by p1 and p2 are not unrelated: they are nested within the same complete object. Pointer arithmetic therefore makes sense inside the object that provides storage: p2 - p1 is defined and equals (&buffer[sizeof(int)] - buffer) / sizeof(int), that is, 1.

So p1 + 1 is a pointer to *p2, and *(p1 + 1) = 10; has defined behaviour and sets the value of *p2.


I have also read Annex C.4 on the compatibility between C++14 and the current (C++17) standard. Removing the ability to use pointer arithmetic between objects dynamically created in a single character array would be an important change that IMHO should be listed there, because it is a commonly used feature. As nothing about it exists in the compatibility pages, I think this confirms that the standard did not intend to forbid it.

In particular, it would defeat this common idiom for dynamically constructing an array of objects of a class with no default constructor:

class T {
    ...
public:
    T(U initialization) {
        ...
    }
};
...
unsigned char *mem = new unsigned char[N * sizeof(T)];
T * arr = reinterpret_cast<T*>(mem); // See the array as an array of N T
for (std::size_t i = 0; i < N; i++) {
    U u(...);
    new(arr + i) T(u); // Construct the i-th element in place
}

arr can then be used as a pointer to the first element of an array...
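
As a hedged aside, one way to sidestep the question is to let a standard container own the storage, since std::vector guarantees that [data(), data() + size()) is a valid range. The sketch below assumes a hypothetical class `Item` with no default constructor and a hypothetical helper `make_items`; neither appears in the original answer, and it is an alternative to the idiom above rather than a rewrite of it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type (an assumption): it has no default constructor.
class Item {
public:
    explicit Item(int initial) : value_(initial) {}
    int value() const { return value_; }

private:
    int value_;
};

// Hypothetical helper: builds n Items in contiguous storage without ever
// default-constructing one.
std::vector<Item> make_items(std::size_t n) {
    std::vector<Item> items;
    items.reserve(n);  // allocates storage only; no Item is constructed yet
    for (std::size_t i = 0; i < n; ++i) {
        items.emplace_back(static_cast<int>(i) * 10);  // construct in place
    }
    return items;
}
```

With `auto v = make_items(3);`, the expression `(v.data() + 1)->value()` yields 10, and the pointer arithmetic is covered by the container's own guarantees.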

Solution 3 - C++

To expand on the answers given, here is an example of what I believe the revised wording excludes:

Warning: Undefined Behaviour

#include <iostream>

int main() {
    int A[1]{7};
    int B[1]{10};
    bool same{(B) == (A + 1)};

    std::cout << B << ' ' << A << ' ' << sizeof(*A) << '\n';
    std::cout << (same ? "same" : "not same") << '\n';
    std::cout << *(A + 1) << '\n';  // !!!!! undefined behaviour
    return 0;
}

For entirely implementation-dependent (and fragile) reasons, a possible output of this program is:

0x7fff1e4f2a64 0x7fff1e4f2a60 4
same
10

That output shows that the two arrays (in that case) happen to be stored in memory such that 'one past the end' of A happens to hold the value of the address of the first element of B.

The revised specification ensures that, regardless of the memory layout, A+1 is never a valid pointer into B. The old phrase 'regardless of how the value was obtained' says that if 'A+1' happens to point to 'B[0]' then it's a valid pointer to 'B[0]'. That can't be good, and it was surely never the intention.
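
For contrast, here is a minimal sketch of the case that stays well-defined: when both values are elements of one array object, a pointer to the first element plus 1 points to the second element. The function name `read_second_element` and the array name `AB` are hypothetical.

```cpp
// Hypothetical helper (an assumption for illustration): both values live in
// a single array object, so the arithmetic stays inside that array.
int read_second_element() {
    int AB[2]{7, 10};
    int* first = AB;      // points to AB[0]
    return *(first + 1);  // first + 1 points to AB[1], an element of AB
}
```

Here `read_second_element()` returns 10 by the rules of pointer arithmetic within an array, not because of how the compiler happened to lay out two separate objects.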

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Oliv | View Question on Stackoverflow
Solution 1 - C++ | Barry | View Answer on Stackoverflow
Solution 2 - C++ | Serge Ballesta | View Answer on Stackoverflow
Solution 3 - C++ | Persixty | View Answer on Stackoverflow