What does `std::kill_dependency` do, and why would I want to use it?

C++, Multithreading, Memory Model, C++11

C++ Problem Overview


I've been reading about the new C++11 memory model and I've come upon the std::kill_dependency function (§29.3/14-15). I'm struggling to understand why I would ever want to use it.

I found an example in the N2664 proposal but it didn't help much.

It starts by showing code without std::kill_dependency. Here, the load in the first line carries a dependency into the second line, which in turn carries a dependency into the indexing operation, which carries a dependency into the call to do_something_with.

r1 = x.load(memory_order_consume);
r2 = r1->index;
do_something_with(a[r2]);

There is a further example that uses std::kill_dependency to break the dependency between the second line and the indexing operation.

r1 = x.load(memory_order_consume);
r2 = r1->index;
do_something_with(a[std::kill_dependency(r2)]);

As far as I can tell, this means that the indexing and the call to do_something_with are not dependency-ordered before the second line. According to N2664:

> This allows the compiler to reorder the call to do_something_with, for example, by performing speculative optimizations that predict the value of a[r2].

In order to make the call to do_something_with the value a[r2] is needed. If, hypothetically, the compiler "knows" that the array is filled with zeros, it can optimize that call to do_something_with(0); and reorder this call relative to the other two instructions as it pleases. It could produce any of:

// 1
r1 = x.load(memory_order_consume);
r2 = r1->index;
do_something_with(0);
// 2
r1 = x.load(memory_order_consume);
do_something_with(0);
r2 = r1->index;
// 3
do_something_with(0);
r1 = x.load(memory_order_consume);
r2 = r1->index;

Is my understanding correct?

If do_something_with synchronizes with another thread by some other means, what does this mean with respect to the ordering of the x.load call and this other thread?

Assuming my understanding is correct, there's still one thing that bugs me: when I'm writing code, what reasons would lead me to choose to kill a dependency?

C++ Solutions


Solution 1 - C++

The purpose of memory_order_consume is to ensure the compiler does not do certain unfortunate optimizations that may break lockless algorithms. For example, consider this code:

int *x;           // points to data shared with another thread
int t;
volatile int a, b;

t = *x;
a = t;
b = t;

A conforming compiler may transform this into:

a = *x;
b = *x;

Thus, a may not equal b. It may also do:

t2 = *x;
// use t2 somewhere
// later
t = *x;
a = t2;
b = t;

By using load(memory_order_consume), we require that uses of the loaded value not be moved before the load itself. In other words,

t = x.load(memory_order_consume);
a = t;
b = t;
assert(a == b); // always true

The standard document considers a case where you may only be interested in ordering certain fields of a structure. The example is:

r1 = x.load(memory_order_consume);
r2 = r1->index;
do_something_with(a[std::kill_dependency(r2)]);

This instructs the compiler that it is allowed to, effectively, do this:

predicted_r2 = x->index; // unordered load
r1 = x; // ordered load
r2 = r1->index;
do_something_with(a[predicted_r2]); // may be faster than waiting for r2's value to be available

Or even this:

predicted_r2 = x->index; // unordered load
predicted_a  = a[predicted_r2]; // get the CPU loading it early on
r1 = x; // ordered load
r2 = r1->index; // ordered load
do_something_with(predicted_a);

If the compiler knows that do_something_with won't change the result of the loads for r1 or r2, then it can even hoist it all the way up:

do_something_with(a[x->index]); // completely unordered
r1 = x; // ordered
r2 = r1->index; // ordered

This allows the compiler a little more freedom in its optimization.

Solution 2 - C++

In addition to the other answer, I will point out that Scott Meyers, one of the definitive leaders in the C++ community, bashed memory_order_consume pretty strongly. He basically said that he believed it had no place in the standard. He said there are two cases where memory_order_consume has any effect:

  • Exotic architectures designed to support 1024+ core shared memory machines.
  • The DEC Alpha

Yes, once again, the DEC Alpha finds its way into infamy by using an optimization not seen in any other chip until many years later on absurdly specialized machines.

The particular optimization is that those processors allow one to dereference a field before actually getting the address of that field (i.e. it can look up x->y BEFORE it even looks up x, using a predicted value of x). It then goes back and determines whether x was the value it expected it to be. On success, it saved time. On failure, it has to go back and get x->y again.

Using memory_order_consume tells the compiler/architecture that these operations have to happen in order. However, in the most useful case one ends up wanting to read x->y.z, where z doesn't change. memory_order_consume would force the compiler to keep the loads of x, y, and z in order. kill_dependency(x->y).z tells the compiler/architecture that it may resume such nefarious reorderings.

99.999% of developers will probably never work on a platform where this feature is required (or has any effect at all).

Solution 3 - C++

The usual use case of kill_dependency arises from the following. Suppose you want to do atomic updates to a nontrivial shared data structure. A typical way to do this is to nonatomically create some new data and to atomically swing a pointer from the data structure to the new data. Once you do this, you are not going to change the new data until you have swung the pointer away from it to something else (and waited for all readers to vacate). This paradigm is widely used, e.g. read-copy-update in the Linux kernel.

Now, suppose the reader reads the pointer, reads the new data, and comes back later and reads the pointer again, finding that the pointer hasn't changed. The hardware can't tell that the pointer hasn't been updated again, so by consume semantics it can't use a cached copy of the data but has to read it again from memory. (Or to think of it another way, the hardware and compiler can't speculatively move the read of the data up before the read of the pointer.)

This is where kill_dependency comes to the rescue. By wrapping the pointer in kill_dependency, you create a value that no longer carries a dependency, allowing accesses through the pointer to use the cached copy of the new data.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | R. Martinho Fernandes | View Question on Stackoverflow
Solution 1 - C++ | bdonlan | View Answer on Stackoverflow
Solution 2 - C++ | Cort Ammon | View Answer on Stackoverflow
Solution 3 - C++ | user2949652 | View Answer on Stackoverflow