Is function call an effective memory barrier for modern platforms?

C Problem Overview

In a codebase I reviewed, I found the following idiom.

void notify(struct actor_t act) {
    write(act.pipe, "M", 1);
}
// thread A sending data to thread B
void send(byte *data) {
    global.data = data;
    notify(threadB);
}
// in thread B event loop
read(this.sock, &cmd, 1);
switch (cmd) {
    case 'M': use_data(global.data);break;
    ...
}

"Hold it", I said to the author, a senior member of my team, "there's no memory barrier here! You don't guarantee that global.data will be flushed from the cache to main memory. If thread A and thread B will run in two different processors - this scheme might fail".

The senior programmer grinned, and explained slowly, as if explaining his five years old boy how to tie his shoelaces: "Listen young boy, we've seen here many thread related bugs, in high load testing, and in real clients", he paused to scratch his longish beard, "but we've never had a bug with this idiom".

"But, it says in the book..."

"Quiet!", he hushed me promptly, "Maybe theoretically, it's not guaranteed, but in practice, the fact you used a function call is effectively a memory barrier. The compiler will not reorder the instruction global.data = data, since it can't know if anyone using it in the function call, and the x86 architecture will ensure that the other CPUs will see this piece of global data by the time thread B reads the command from the pipe. Rest assured, we have ample real world problems to worry about. We don't need to invest extra effort in bogus theoretical problems.

"Rest assured my boy, in time you'll understand to separate the real problem from the I-need-to-get-a-PhD non-problems."

Is he correct? Is that really a non-issue in practice (say x86, x64 and ARM)?

It's against everything I learned, but he does have a long beard and a really smart looks!

Extra points if you can show me a piece of code proving him wrong!

C Solutions

Solution 1 - C

Memory barriers aren't just to prevent instruction reordering. Even if instructions aren't reordered it can still cause problems with cache coherence. As for the reordering - it depends on your compiler and settings. ICC is particularly agressive with reordering. MSVC w/ whole program optimization can be, too.

If your shared data variable is declared as volatile, even though it's not in the spec most compilers will generate a memory variable around reads and writes from the variable and prevent reordering. This is not the correct way of using volatile, nor what it was meant for.

(If I had any votes left, I'd +1 your question for the narration.)

Solution 2 - C

In practice, a function call is a compiler barrier, meaning that the compiler will not move global memory accesses past the call. A caveat to this is functions which the compiler knows something about, e.g. builtins, inlined functions (keep in mind IPO!) etc.

So a processor memory barrier (in addition to a compiler barrier) is in theory needed to make this work. However, since you're calling read and write which are syscalls that change the global state, I'm quite sure that the kernel issues memory barriers somewhere in the implementation of those. There is no such guarantee though, so in theory you need the barriers.

Solution 3 - C

The basic rule is: the compiler must make the global state appear to be exactly as you coded it, but if it can prove that a given function doesn't use global variables then it can implement the algorithm any way it chooses.

The upshot is that traditional compilers always treated functions in another compilation unit as a memory barrier because they couldn't see inside those functions. Increasingly, modern compilers are growing "whole program" or "link time" optimization strategies which break down these barriers and will cause poorly written code to fail, even though it's been working fine for years.

If the function in question is in a shared library then it won't be able to see inside it, but if the function is one defined by the C standard then it doesn't need to -- it already knows what the function does -- so you have to be careful of those also. Note that a compiler will not recognise a kernel call for what it is, but the very act of inserting something that the compiler can't recognise (inline assembler, or a function call to an assembler file) will create a memory barrier in itself.

In your case, notify will either be a black box the compiler can't see inside (a library function) or else it will contain a recognisable memory barrier, so you are most likely safe.

In practice, you have to write very bad code to fall over this.

Solution 4 - C

In practice, he's correct and a memory barrier is implied in this specific case.

But the point is that if its presence is "debatable", the code is already too complex and unclear.

Really guys, use a mutex or other proper constructs. It's the only safe way to deal with threads and to write maintainable code.

And maybe you'll see other errors, like that the code is unpredictable if send() is called more than one time.

Content Type	Original Author	Original Content on Stackoverflow
Question	mikebloch	View Question on Stackoverflow
Solution 1 - C	Mahmoud Al-Qudsi	View Answer on Stackoverflow
Solution 2 - C	janneb	View Answer on Stackoverflow
Solution 3 - C	ams	View Answer on Stackoverflow
Solution 4 - C	amadvance	View Answer on Stackoverflow