Why don't C++ compilers optimize this conditional boolean assignment as an unconditional assignment?

C++, Optimization

C++ Problem Overview


Consider the following function:

void func(bool& flag)
{
    if(!flag) flag=true;
}

It seems to me that if flag holds a valid boolean value, this is equivalent to unconditionally setting it to true, like this:

void func(bool& flag)
{
    flag=true;
}

Yet neither gcc nor clang optimizes it this way. Both generate the following at the -O3 optimization level:

_Z4funcRb:
.LFB0:
	.cfi_startproc
	cmp	BYTE PTR [rdi], 0
	jne	.L1
	mov	BYTE PTR [rdi], 1
.L1:
	rep ret

My question is: is this code simply too much of a special case to be worth optimizing, or are there good reasons why such an optimization would be undesirable, given that flag is not a reference to volatile? The only reason I can think of is that flag could somehow hold a value other than true or false without undefined behavior at the point of reading it, but I'm not sure whether that is possible.

C++ Solutions


Solution 1 - C++

The unconditional write may hurt the performance of the program because of cache coherence. Writing to flag every time func() is called would dirty the cache line that contains it. This happens even when the value being written exactly matches the bits already stored at the destination address.


EDIT

hvd has provided another good reason that prevents such an optimization. It is a more compelling argument against the proposed optimization, since it may result in undefined behavior, whereas my (original) answer only addressed performance aspects.

After a little more reflection, I can propose one more example of why compilers must not introduce the unconditional write unless they can prove that the transformation is safe in the particular context. Consider this code:

const bool foo = true;

int main()
{
    func(const_cast<bool&>(foo));
}

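As written, func() never attempts the store here, because foo is already true. With an unconditional write in func(), this definitely triggers undefined behavior, because foo is a const object and any attempt to modify it is undefined. In practice, foo is likely to be placed in read-only memory, where the write would typically terminate the program, even though the value stored is identical to the one already there.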
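
To make the difference concrete, here is a minimal sketch contrasting the two variants. func_conditional mirrors the function from the question; func_unconditional and call_on_const_object are hypothetical names introduced only for this illustration. The unconditional call is left commented out on purpose, since it would modify a const object.

```cpp
#include <cassert>

// Same logic as func() in the question: stores only when the flag is false.
void func_conditional(bool& flag)
{
    if (!flag) flag = true;
}

// Hypothetical hand-written form of the proposed optimization: always stores.
void func_unconditional(bool& flag)
{
    flag = true;
}

const bool foo = true;

// Hypothetical helper: calls the conditional variant on the const object
// and returns the value of foo afterwards.
bool call_on_const_object()
{
    assert(foo); // precondition: foo is true, so no store will be attempted

    // Well-defined: the object is only read through the non-const reference.
    func_conditional(const_cast<bool&>(foo));

    // Undefined behavior if uncommented: it would write to a const object.
    // func_unconditional(const_cast<bool&>(foo));

    return foo;
}
```

On an ordinary, non-const bool both variants leave the same value behind; they differ only in whether a store is issued.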

Solution 2 - C++

Aside from Leon's answer on performance:

Suppose flag is true, and suppose two threads are constantly calling func(flag). The function as written does not store anything to flag in that case, so this should be thread-safe: the two threads access the same memory, but only to read it. Unconditionally setting flag to true means two different threads would be writing to the same memory. That is a data race, and it is unsafe even if the data being written is identical to the data that's already there.
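
The scenario can be sketched roughly as follows. The helper hammer_from_two_threads and the iteration count are assumptions made up for this illustration; only func() comes from the question. The sketch relies on the flag already being true before the threads start, so that both threads only ever read it.

```cpp
#include <cassert>
#include <thread>

// The function from the question: it stores only when the flag is false.
void func(bool& flag)
{
    if (!flag) flag = true;
}

// Hypothetical helper: two threads call func() on the same flag many times.
// Precondition: flag is already true, so neither thread ever writes to it.
bool hammer_from_two_threads(bool& flag)
{
    assert(flag); // with a false flag, both threads could write: a data race

    auto worker = [&flag] {
        for (int i = 0; i < 100000; ++i) func(flag);
    };

    std::thread first(worker);
    std::thread second(worker);
    first.join();
    second.join();
    return flag;
}
```

If the compiler turned func() into an unconditional store, the same calls would become concurrent writes to a non-atomic object. Code that really needs concurrent writes would have to use something like std::atomic<bool> instead.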

Solution 3 - C++

I am not sure about the behaviour of C++ here, but in C the optimization could change what ends up in memory: if the memory contains a non-zero value other than 1, it would remain unchanged with the check, but would be changed to 1 without the check.

But as I am not very fluent in C++, I don't know if this situation is even possible.
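
A small sketch of that effect, under the assumption that the flag is stored in a plain byte rather than in a bool: a C++ bool is only supposed to hold true or false, so unsigned char is used here as a stand-in for a C-style integer flag. All three function names are hypothetical.

```cpp
#include <cassert>

// C-style flag in a byte: any non-zero value counts as "set".
void set_if_clear(unsigned char& flag)
{
    if (!flag) flag = 1;
}

// The variant without the check: always stores 1.
void set_always(unsigned char& flag)
{
    flag = 1;
}

// Returns true when the two variants leave different bytes behind.
bool variants_differ(unsigned char initial)
{
    unsigned char checked = initial;
    unsigned char unchecked = initial;
    set_if_clear(checked);
    set_always(unchecked);
    assert(checked != 0 && unchecked == 1); // both end up "set"
    return checked != unchecked;
}
```

For an initial value of 0 or 1 the two variants agree; for a value such as 2 the checked variant keeps 2 while the unchecked one overwrites it with 1.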

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Ruslan | View Question on Stackoverflow
Solution 1 - C++ | Leon | View Answer on Stackoverflow
Solution 2 - C++ | user743382 | View Answer on Stackoverflow
Solution 3 - C++ | glglgl | View Answer on Stackoverflow