Why can't the C compiler optimize changing the value of a const pointer assuming that two pointers to the same variable would be illegal/UB?
CRustUndefined BehaviorStrict AliasingC Problem Overview
Recently I stumbled over a comparison between Rust and C and they use the following code:
bool f(int* a, const int* b) {
*a = 2;
int ret = *b;
*a = 3;
return ret != 0;
}
In Rust (same code, but with Rust syntax), it produces the following Assembler Code:
cmp dword ptr [rsi], 0
mov dword ptr [rdi], 3
setne al
ret
While with gcc it produces the following:
mov DWORD PTR [rdi], 2
mov eax, DWORD PTR [rsi]
mov DWORD PTR [rdi], 3
test eax, eax
setne al
ret
The text claims that the C function can't optimize the first line away, because a
and b
could point to the same number. In Rust this is not allowed so the compiler can optimize it away.
Now to my question:
The function takes a const int*
which is a pointer to a const int. I read this question and it states that modifying a const int with a pointer should result in a compiler warning and in the worst cast in UB.
Could this function result in a UB if I call it with two pointers to the same integer?
Why can't the C compiler optimize the first line away, under the assumption, that two pointers to the same variable would be illegal/UB?
C Solutions
Solution 1 - C
>Why can't the C Compiler optimize the first line away, under the assumption, that two pointers to the same variable would be illegal/UB?
Because you haven't instructed the C compiler to do so -- that it is allowed to make that assumption.
C has a type qualifier for exactly this called restrict
which roughly means: this pointer does not overlap with other pointers (not exactly, but play along).
The assembly output for
bool f(int* restrict a, const int* b) {
*a = 2;
int ret = *b;
*a = 3;
return ret != 0;
}
is
mov eax, DWORD PTR [rsi]
mov DWORD PTR [rdi], 3
test eax, eax
setne al
ret
... which removes/optimizes-away the assignment *a = 2
From https://en.wikipedia.org/wiki/Restrict
>In the C programming language, restrict is a keyword that can be used in pointer declarations. By adding this type qualifier, a programmer hints to the compiler that for the lifetime of the pointer, only the pointer itself or a value directly derived from it (such as pointer + 1) will be used to access the object to which it points.
Solution 2 - C
The function int f(int *a, const int *b);
promises to not change the contents of b
through that pointer... It makes no promises regarding access to variables through the a
pointer.
If a
and b
point to the same object, changing it through a
is legal (provided the underlying object is modifiable, of course).
Example:
int val = 0;
f(&val, &val);
Solution 3 - C
While the other answers mention the C side, it is still worth taking a look at the Rust side. With Rust the code you have is probably this:
fn f(a:&mut i32, b:&i32)->bool{
*a = 2;
let ret = *b;
*a = 3;
return ret != 0;
}
The function takes in two references, one mutable, one not. References are pointers that are guaranteed to be valid for reads, and mutable references are also guaranteed to be unique, so it gets optimized to
cmp dword ptr [rsi], 0
mov dword ptr [rdi], 3
setne al
ret
However, Rust also has raw pointers that are equivalent to C's pointers and make no such guarantees. The following function, which takes in raw pointers:
unsafe fn g(a:*mut i32, b:*const i32)->bool{
*a = 2;
let ret = *b;
*a = 3;
return ret != 0;
}
misses out on the optimization and compiles to this:
mov dword ptr [rdi], 2
cmp dword ptr [rsi], 0
mov dword ptr [rdi], 3
setne al
ret
Solution 4 - C
>The function takes a const int*
which is a pointer to a const int.
No, const int*
is not a pointer to a const int. Anyone who says that is deluded.
-
int*
is a pointer to an int that definitely isn't const. -
const int*
is a pointer to an int of unknown constness. -
There is no way to express the notion of a pointer to an int that definitely is const.
If C was a better designed language, then const int *
would be a pointer to a const int, mutable int *
(borrowing a keyword from C++) would be a pointer to a non-const int, and int *
would be a pointer to an int of unknown constness. Dropping the qualifiers (i.e., forgetting something about the pointed-to type) would be safe – the opposite of real C in which adding the const
qualifier is safe. I haven't used Rust, but it appears from examples in another answer that it uses a syntax like that.
Bjarne Stroustrup, who introduced const
, originally named it readonly
, which is much closer to its actual meaning. int readonly*
would have made it clearer that it's the pointer that's read-only, not the pointed-to object. The renaming to const
has confused generations of programmers.
When I have the choice, I always write foo const*
, not const foo*
, as the next best thing to readonly*
.
Solution 5 - C
It should be noted that this question is talking about optimisation on -Ofast
and how it's even the case there.
Essentially, the C compiler of the function does not know the full discrete set of addresses that might be passed to it as that isn't known until link time / runtime as the function can be called from multiple translation units, and therefore it makes considerations that handle any legal address that a
and b
might point to, and of course that includes the case where they overlap.
Therefore, you need to use restrict
to tell it that updating a
(which the function allows because it is not a pointer-to-const, but even then the function could cast off const) doesn't update the value b
is pointing to, which needs to be included in the comparison to 0, hence the store to a
that happens before the comparison needs to go ahead, whereas on rust the default assumption is restrict. The compiler of the function does however know that *a
is the same as *(a+1-1)
and therefore will not produce 2 separate stores, but it does not know whether a
or b
overlap.