Is a C compiler allowed to coalesce sequential assignments to volatile variables?

Tags: C, Language Lawyer, Compiler Optimization, Volatile

C Problem Overview


I have a theoretical hardware issue (non-deterministic, hard to test, never observed in practice), reported by the hardware vendor, where a double-word write to certain memory ranges may corrupt any future bus transfers.

While I don't have any double-word writes explicitly in C code, I'm worried the compiler is allowed (in current or future implementations) to coalesce multiple adjacent word assignments into a single double-word assignment.

The compiler is not allowed to reorder assignments to volatiles, but it is unclear (to me) whether coalescing counts as reordering. My gut says it does, but I've been corrected by language lawyers before!

Example:

typedef struct
{
   volatile unsigned reg0;
   volatile unsigned reg1;
} Module;

volatile Module* module = (volatile Module*)0xFF000000u;

// two word stores, or one double-word store?
module->reg0 = 1;
module->reg1 = 2;

(I'll ask my compiler vendor about this separately, but I'm curious what the canonical/community interpretation of the standard is.)

C Solutions


Solution 1 - C

No, the compiler is absolutely not allowed to optimize those two writes into a single double word write. It's kind of hard to quote the standard since the part regarding optimizations and side effects is so fuzzily written. The relevant parts are found in C17 5.1.2.3:

> The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant.

> Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.

> In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

> Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.

When you access a volatile member of a struct, that access is in itself a side effect, which may have consequences that the compiler can't determine. Suppose, for example, that your struct is a hardware register map and those registers need to be written in a certain order. Some microcontroller documentation might say something along the lines of: "reg0 enables the hardware peripheral and must be written to before you can configure the details in reg1".

A compiler that would merge the volatile object writes into a single one would be non-conforming and plain broken.
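A minimal sketch of the ordering requirement described above. The real address 0xFF000000 cannot be dereferenced on a hosted system, so this example points the register struct at an ordinary object in RAM. The names `fake_hw`, `init_module` and `demo_ok` are hypothetical and not part of the question. Under the reading given in this answer, each assignment is a separate volatile access, performed in source order.

```c
#include <assert.h>

/* Register layout from the question. */
typedef struct
{
   volatile unsigned reg0;
   volatile unsigned reg1;
} Module;

/* Hypothetical stand-in for the hardware block, so the sketch runs on a
   hosted system. On the real target the pointer would instead be
   (volatile Module*)0xFF000000u. */
static Module fake_hw;

/* Writes reg0, then reg1, as two separate volatile accesses. */
static void init_module(volatile Module* module)
{
   module->reg0 = 1;   /* e.g. enable the peripheral first */
   module->reg1 = 2;   /* then configure it */
}

/* Runs the sketch and reports whether both registers hold the written values. */
static int demo_ok(void)
{
   init_module(&fake_hw);
   return fake_hw.reg0 == 1u && fake_hw.reg1 == 2u;
}
```

Checking final values only shows that both stores happened, not how wide they were. To confirm that two word-sized stores are emitted on your target, inspect the generated assembly for `init_module` (for example with the `-S` option on GCC or Clang).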

Solution 2 - C

The compiler is not allowed to make two such assignments into a single memory write. There must be two independent writes from the core. The answer from @Lundin gives relevant references to the C standard.

However, be aware that a cache, if present, may trick you. The keyword volatile doesn't imply "uncached" memory. So besides using volatile, you also need to make sure that the address 0xFF000000 is mapped as uncached. If the address is mapped as cached, the cache hardware may turn the two assignments into a single memory write. In other words, for cached memory, two memory write operations from the core may end up as a single write operation on the system's memory interface.

Solution 3 - C

The behavior of volatile seems to be up to the implementation, partly because of a curious sentence which says: "What constitutes an access to an object that has volatile-qualified type is implementation-defined".

In ISO C 99, section 5.1.2.3, there is also:

> 3 In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

So although requirements are given that a volatile object must be treated in accordance with the abstract semantics (i.e., not optimized), curiously, the abstract semantics itself allows for the elimination of dead code and data flows, which are examples of optimizations!

I'm afraid that to know what volatile will and will not do, you have to go by your compiler's documentation.
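To illustrate the tension described above, here is a hedged sketch in which the value of a volatile read is discarded. The quoted paragraph lets an implementation skip an expression whose value is unused, but only if no needed side effects are produced, and it names accessing a volatile object as such a side effect. Whether the statement below counts as an "access" is, per the sentence quoted earlier, implementation-defined, so check your compiler's documentation. The names `status_reg`, `dummy_read` and `read_status` are hypothetical, and the register is simulated with a plain variable.

```c
#include <assert.h>

/* Hypothetical status register, simulated in RAM for this sketch. */
static volatile unsigned status_reg = 0x80u;

/* Reads the register and discards the value. For a non-volatile object the
   compiler could drop this statement entirely; for a volatile one the read
   is a side effect in the abstract machine. */
static void dummy_read(void)
{
   (void)status_reg;
}

/* Returns the value read. Here the result is used, so the read is needed
   regardless of the volatile qualifier. */
static unsigned read_status(void)
{
   return status_reg;
}
```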

Solution 4 - C

The C Standard is agnostic to any relationship between operations on volatile objects and operations on the actual machine. While most implementations would specify that a construct like *(char volatile*)0x1234 = 0x56; would generate a byte store with value 0x56 to hardware address 0x1234, an implementation could, at its leisure, allocate space for e.g. an 8192-byte array and specify that *(char volatile*)0x1234 = 0x56; would immediately store 0x56 to element 0x1234 of that array, without ever doing anything with hardware address 0x1234. Alternatively, an implementation may include some process that periodically stores whatever happens to be in 0x1234 of that array to hardware address 0x56.

All that is required for conformance is that all operations on volatile objects within a single thread are, from the standpoint of the Abstract machine, regarded as absolutely sequenced. From the point of view of the Standard, implementations can convert such accesses into real machine operations in whatever fashion they see fit.

Solution 5 - C

Merging the two writes would change the observable behavior of the program, so the compiler is not allowed to do so.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Andreas | View Question on Stackoverflow
Solution 1 - C | Lundin | View Answer on Stackoverflow
Solution 2 - C | Support Ukraine | View Answer on Stackoverflow
Solution 3 - C | Kaz | View Answer on Stackoverflow
Solution 4 - C | supercat | View Answer on Stackoverflow
Solution 5 - C | 0___________ | View Answer on Stackoverflow