Is if(A | B) always faster than if(A || B)?

C++OptimizationBenchmarkingBranch Prediction

C++ Problem Overview


I am reading this book by Fedor Pikus and he has some very very interesting examples which for me were a surprise.
Particularly this benchmark caught me, where the only difference is that in one of them we use || in if and in another we use |.

void BM_misspredict(benchmark::State& state)
{

	std::srand(1);
	const unsigned int N = 10000;;
	std::vector<unsigned long> v1(N), v2(N);
	std::vector<int> c1(N), c2(N);

	for (int i = 0; i < N; ++i) 
	{
		v1[i] = rand();
		v2[i] = rand();
		c1[i] = rand() & 0x1;
		c2[i] = !c1[i];
	}

	unsigned long* p1 = v1.data();
	unsigned long* p2 = v2.data();
	int* b1 = c1.data();
	int* b2 = c2.data();

	for (auto _ : state)
	{
		unsigned long a1 = 0, a2 = 0;
		for (size_t i = 0; i < N; ++i) 
		{
			if (b1[i] || b2[i])  // Only difference
			{
				a1 += p1[i];
			}
			else 
			{
				a2 *= p2[i];
			}
		}
		benchmark::DoNotOptimize(a1);
		benchmark::DoNotOptimize(a2);
		benchmark::ClobberMemory();

	}
	state.SetItemsProcessed(state.iterations());
}

void BM_predict(benchmark::State& state)
{

	std::srand(1);
	const unsigned int N = 10000;;
	std::vector<unsigned long> v1(N), v2(N);
	std::vector<int> c1(N), c2(N);

	for (int i = 0; i < N; ++i)
	{
		v1[i] = rand();
		v2[i] = rand();
		c1[i] = rand() & 0x1;
		c2[i] = !c1[i];
	}

	unsigned long* p1 = v1.data();
	unsigned long* p2 = v2.data();
	int* b1 = c1.data();
	int* b2 = c2.data();

	for (auto _ : state)
	{
		unsigned long a1 = 0, a2 = 0;
		for (size_t i = 0; i < N; ++i)
		{
			if (b1[i] | b2[i]) // Only difference
			{
				a1 += p1[i];
			}
			else
			{
				a2 *= p2[i];
			}
		}
		benchmark::DoNotOptimize(a1);
		benchmark::DoNotOptimize(a2);
		benchmark::ClobberMemory();

	}
	state.SetItemsProcessed(state.iterations());
}

I will not go in all the details explained in the book why the latter is faster, but the idea is that hardware branch predictor is given 2 chances to misspredict in slower version and in the | (bitwise or) version. See the benchmark results below.

enter image description here

So the question is why don't we always use | instead of || in branches?

C++ Solutions


Solution 1 - C++

> Is if(A | B) always faster than if(A || B)?

No, if(A | B) is not always faster than if(A || B).

Consider a case where A is true and the B expression is a very expensive operation. Not doing the operation can save that expense.

> So the question is why don't we always use | instead of || in branches?

Besides the cases where the logical or is more efficient, the efficiency is not the only concern. There are often operations that have pre-conditions, and there are cases where the result of the left hand operation signals whether the pre-condition is satisfied for the right hand operation. In such case, we must use the logical operator.

if (b1[i])  // maybe this exists somewhere in the program
    b2 = nullptr;

if(b1[i] || b2[i]) // OK
if(b1[i]  | b2[i]) // NOT OK; indirection through null pointer

It is this possibility that is typically the problem when the optimiser is unable to replace logical with bitwise. In the example of if(b1[i] || b2[i]), the optimiser can do such replacement only if it can prove that b2 is valid at least when b1[i] != 0. That condition might not exist in your example, but that doesn't mean that it would necessarily be easy or - sometimes even possible - for the optimiser to prove that it doesn't exist.


Furthermore, there can be a dependency on the order of the operations, for example if one operand modifies a value read by the other operation:

if(a || (a = b)) // OK
if(a  | (a = b)) // NOT OK; undefined behaviour

Also, there are types that are convertible to bool and thus are valid operands for ||, but are not valid operators for |:

if(ptr1 || ptr2) // OK
if(ptr1  | ptr2) // NOT OK; no bitwise or for pointers

TL;DR If we could always use bitwise or instead of logical operators, then there would be no need for logical operators and they probably wouldn't be in the language. But such replacement is not always a possibility, which is the reason why we use logical operators, and also the reason why optimiser sometimes cannot use the faster option.

Solution 2 - C++

If evaluating A is fast, B is slow, and when the short circuit happens (A returns true), then if (A || B) will avoid the slow path where if (A | B) will not.

If evaluating A almost always gives the same result, the processor's branch prediction may give if (A || B) performance better than if (A | B) even if B is fast.

As others have mentioned, there are cases where the short circuit is mandatory: you only want to execute B if A is known to evaluate false:

if (p == NULL || test(*p)) { ... }  // null pointer would crash on *p

if (should_skip() || try_update()) { ... }  // active use of side effects

Solution 3 - C++

Bitwise-or is a branchless arithmetic operator corresponding to a single ALU instruction. Logical-or is defined as implying shortcut evaluation, which involves a (costly) conditional branch. The effect of the two can differ when the evaluations of the operands have side effects.

In the case of two boolean variables, a smart compiler might evaluate logical-or as a bitwise-or, or using a conditional move, but who knows...

Solution 4 - C++

> So the question is why don't we always use | instead of || in branches?

  • Branch prediction is relevant only to fairly hot pieces of code, and it depends on the branch being predictable enough to matter. In most places, | has little or no performance benefit over ||.

Also, taking A and B as expressions of suitable type (not necessarily single variables), key relevant differences include:

  • In A || B, B is evaluated only if A evaluates to 0, but in A | B, B is always evaluated. Conditionally avoiding evaluation of B is sometimes exactly the point of using the former.

  • In A || B there is a sequence point between evaluation of A and evaluation of B, but there isn't one in A | B. Even if you don't care about short-circuiting, you may care about the sequencing, even in relatively simple examples. For example, given an integer x, x-- || x-- has well-defined behavior, but x-- | x-- has undefined behavior.

  • When used in conditional context, the intent of A || B is clear to other humans, but the reason to substitute A | B less so. Code clarity is extremely important. And after all, if the compiler can see that it is safe to do so (and in most cases it is more reliable than a human at making the determination) then it is at liberty to compile one of those expressions as if it were the other.

  • If you cannot be sure that A and B both have built-in types -- in a template, for example -- you have to account for the possibility that one or both of | and || have been overloaded. In that case, it is reasonable to suppose that || will still do something that makes sense for branch control, but it is much less safe to assume that | will do something equivalent or even suitable.

As an additional minor matter, the precedence of | is different from that of ||. This can bite you if you rely on precedence instead of parentheses for grouping, and you need to watch out for that if you are considering modifying existing code to change || expressions to | expressions. For example, A && B || C && D groups as (A && B) || (C && D), but A && B | C && D groups as (A && (B | C)) && D.

Solution 5 - C++

Even if a and b are automatic-duration Boolean flags, that doesn't mean that an expression like a||b will be evaluated by checking the state of one flag, and then if necessary checking the state of the other. If a section of code performs:

x = y+z;
flag1 = (x==0);
... code that uses flag1

a compiler could replace that with:

x = y+z;
if (processor's Z flag was set)
{
... copy of that uses flag1, but where flag is replaced with constant 1
}
else
{
... copy of that uses flag1, but where flag is replaced with constant 0
}

Although hardly required to do so, a compiler may base some of its decisions about whether to perform such substitution upon a programmer's choice of whether ot write (flag1 || flag2) or (flag1 | flag2), and many factors may cause the aforementioned substitution to run faster or slower than the original code.

Solution 6 - C++

Code readability, short-circuiting and it is not guaranteed that Ord will always outperform a || operand. Computer systems are more complicated than expected, even though they are man-made.

There was a case where a for loop with a much more complicated condition ran faster on an IBM. The CPU didn't cool and thus instructions were executed faster, that was a possible reason. What I am trying to say, focus on other areas to improve code than fighting small-cases which will differ depending on the CPU and the boolean evaluation (compiler optimizations).

Solution 7 - C++

The expression A | B might be faster in a loop that the compiler can optimize to a bitwise or of two vectors. Another case where | might be slightly more optimized is when the compiler would want to optimize away a branch by combining the two operands with bitmasks. If the operand on the right is something that might have visible side-effects, the compiler must instead insert a branch to guarantee the correct short-circuiting.

In other situations I can think of, A || B will be as fast or faster, including just about all the ones I can think of where you’re comparing a single pair of operands rather than a vector. However, those are almost never critical to performance.

Solution 8 - C++

Adding to the list:

Given the case that A and B are totally unpredictable, but usually A||B is true (i.e. when A is wrong, then usually B is true and vice versa). In this case A||B may lead to a lot of mispredictions, but A|B is predictable and most likely faster.

Solution 9 - C++

> So the question is why don't we always use | instead of || in branches?

In my little mind three reasons, but that might be only me.

First and most: all code tends to be read multiple times more often than it is written. So, in the example you have an array of ints with value either zero or 1. What the code really is meant to do is hidden to the reader. It might be clear to the original writer exactly what is supposed to be done, but years later, and after adding lots and lots of code lines it is probably anyones guess. In my world use what shows the intended comparison, it either is a bit comparison or a logical one. It should not be both.

Secondly: does the performance gain really matter? The example basically does nothing and could be coded a lot more efficient. Not? Well don't create the array in the first place, what the code does now is simply to check the quality of the rand function. The optimization is an example where a better analysis of the problem probably would give a much larger gain.

Third: a different compiler or a different processor will probably change the relative speeds. Now, what you are doing here is applying your knowledge of the internals of the current compiler and current processor. Next version of compiler and processor might turn the question around totally. You really cannot know. And we all know that the only way to know if an optimization actually will be quicker is by testing, which this very specific example has done.

So, if, for some reason, getting the very last bit of efficiency out of the code is worth it I would mark the choice as a "hack". I would probably include a macro with something like "DO_PERFORMANCE_HACKS" and have two versions of the code, one with | and one with || and commenting what the intention is. By changing the macro it will be possible for next readers to see what and where the hack is, and in the future may elect to not compile them.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionEduard RostomyanView Question on Stackoverflow
Solution 1 - C++eerorikaView Answer on Stackoverflow
Solution 2 - C++comingstormView Answer on Stackoverflow
Solution 3 - C++Yves DaoustView Answer on Stackoverflow
Solution 4 - C++John BollingerView Answer on Stackoverflow
Solution 5 - C++supercatView Answer on Stackoverflow
Solution 6 - C++Tony TannousView Answer on Stackoverflow
Solution 7 - C++DavislorView Answer on Stackoverflow
Solution 8 - C++tommschView Answer on Stackoverflow
Solution 9 - C++ghellquistView Answer on Stackoverflow