Which is faster: if(bool) or if(int)?

Tags: C++, Assembly, Int, Boolean

C++ Problem Overview


>https://stackoverflow.com/questions/5554725/which-value-is-better-to-use-boolean-true-or-integer-1

The above topic made me do some experiments with bool and int in an if condition. So, just out of curiosity, I wrote this program:

int f(int i) 
{
	if ( i ) return 99;   //if(int)
	else  return -99;
}
int g(bool b)
{
	if ( b ) return 99;   //if(bool)
	else  return -99;
}
int main(){}

g++ intbool.cpp -S generates asm code for each function as follows:

  • asm code for f(int)

     __Z1fi:
        LFB0:
              pushl	%ebp
        LCFI0:
               movl	%esp, %ebp
        LCFI1:
               cmpl	$0, 8(%ebp)
               je	L2
               movl	$99, %eax
               jmp	L3
        L2:
               movl	$-99, %eax
        L3:
               leave
        LCFI2:
               ret
    
  • asm code for g(bool)

     __Z1gb:
        LFB1:
               pushl	%ebp
        LCFI3:
               movl	%esp, %ebp
        LCFI4:
               subl	$4, %esp
        LCFI5:
               movl	8(%ebp), %eax
               movb	%al, -4(%ebp)
               cmpb	$0, -4(%ebp)
               je	L5
               movl	$99, %eax
               jmp	L6
        L5:
               movl	$-99, %eax
        L6:
               leave
        LCFI6:
               ret
    

Surprisingly, g(bool) generates more asm instructions! Does that mean if(bool) is a little slower than if(int)? I used to think bool was especially designed to be used in conditional statements such as if, so I was expecting g(bool) to generate fewer asm instructions, thereby making g(bool) more efficient and faster.

EDIT:

I'm not using any optimization flag as of now. But even in its absence, why it generates more asm for g(bool) is a question for which I'm looking for a reasonable answer. I should also tell you that the -O2 optimization flag generates exactly the same asm for both functions. But that isn't the question. The question is what I've asked.


C++ Solutions


Solution 1 - C++

Makes sense to me. Your compiler apparently defines a bool as an 8-bit value, and your system ABI requires it to "promote" small (< 32-bit) integer arguments to 32-bit when pushing them onto the call stack. So to compare a bool, the compiler generates code to isolate the least significant byte of the 32-bit argument that g receives, and compares it with cmpb. In the first example, the int argument uses the full 32 bits that were pushed onto the stack, so it simply compares against the whole thing with cmpl.
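
As a quick sanity check of the size difference this answer relies on, a small test program along the following lines can be used; the exact values are implementation-defined, but on typical g++/x86 targets it prints 1 for bool and 4 for int:

#include <iostream>

int main()
{
	// On typical g++/x86 ABIs bool occupies a single byte while int fills a
	// full 32-bit slot, which is why the compiler emits cmpb for the bool
	// argument and cmpl for the int argument.
	std::cout << "sizeof(bool) = " << sizeof(bool) << '\n';   // typically 1
	std::cout << "sizeof(int)  = " << sizeof(int)  << '\n';   // typically 4
}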

Solution 2 - C++

Compiling with -O3 gives the following for me:

f:

	pushl	%ebp
	movl	%esp, %ebp
	cmpl	$1, 8(%ebp)
	popl	%ebp
	sbbl	%eax, %eax
	andb	$58, %al
	addl	$99, %eax
	ret

g:

	pushl	%ebp
	movl	%esp, %ebp
	cmpb	$1, 8(%ebp)
	popl	%ebp
	sbbl	%eax, %eax
	andb	$58, %al
	addl	$99, %eax
	ret

…so it compiles to essentially the same code, except for cmpl vs. cmpb. This means that the difference, if there is any, doesn't matter. Judging by unoptimized code is not fair.


Edit to clarify my point. Unoptimized code is for simple debugging, not for speed. Comparing the speed of unoptimized code is senseless.
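
For anyone who wants to regenerate a 32-bit listing like the one above on a modern 64-bit machine, an invocation along these lines should work (the -m32 flag is an assumption that a multilib-capable GCC is installed; adjust the file name to match yours):

g++ -O3 -m32 -S intbool.cpp -o intbool.s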

Solution 3 - C++

When I compile this with a sane set of options (specifically -O3), here's what I get:

For f():

        .type   _Z1fi, @function
_Z1fi:
.LFB0:
        .cfi_startproc
        .cfi_personality 0x3,__gxx_personality_v0
        cmpl    $1, %edi
        sbbl    %eax, %eax
        andb    $58, %al
        addl    $99, %eax
        ret
        .cfi_endproc

For g():

        .type   _Z1gb, @function
_Z1gb:
.LFB1:
        .cfi_startproc
        .cfi_personality 0x3,__gxx_personality_v0
        cmpb    $1, %dil
        sbbl    %eax, %eax
        andb    $58, %al
        addl    $99, %eax
        ret
        .cfi_endproc

They still use different instructions for the comparison (cmpb for boolean vs. cmpl for int), but otherwise the bodies are identical. A quick look at the Intel manuals tells me: ... not much of anything. There's no such thing as cmpb or cmpl in the Intel manuals; those are AT&T-syntax operand-size suffixes, and the instruction is just cmp. I can't find the timing tables at the moment, but I'm guessing there's no clock difference between comparing a byte immediate vs. comparing a long immediate, so for all practical purposes the code is identical.
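
For readers curious what the cmp/sbb/and/add sequence actually computes, here is a rough C++ rendering of it; this is only an illustration (it assumes a 32-bit int and two's-complement wrap-around on the final conversion), not something anyone would write by hand:

int f_branchless(int i)
{
	// cmpl $1, i ; sbbl %eax, %eax  ->  all-ones mask when i == 0, zero otherwise
	unsigned mask = (static_cast<unsigned>(i) < 1u) ? 0xFFFFFFFFu : 0u;
	// andb $58, %al  ->  only the low byte of the mask is ANDed with 58
	unsigned eax = (mask & 0xFFFFFF00u) | (mask & 0x3Au);
	// addl $99, %eax  ->  0 + 99 = 99 when i != 0, 0xFFFFFF3A + 99 = -99 when i == 0
	return static_cast<int>(eax + 99u);
}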


edited to add the following based on your addition

The reason the code is different in the unoptimized case is that it is unoptimized. (Yes, it's circular, I know.) When the compiler walks the AST and generates code directly, it doesn't "know" anything except what's at the immediate point of the AST it's in. At that point it lacks all the contextual information needed to know that at this specific point it can treat the declared type bool as an int. A boolean is by default treated as a byte, and when manipulating bytes in the Intel world you have to do things like zero- or sign-extend them to certain widths to put them on the stack, etc. (You can't push a byte.)

When the optimizer views the AST and does its magic, however, it looks at surrounding context and "knows" when it can replace code with something more efficient without changing semantics. So it "knows" it can use an integer in the parameter and thereby lose the unnecessary conversions and widening.

Solution 4 - C++

With GCC 4.5 on Linux and Windows at least, sizeof(bool) == 1. On x86 and x86_64, you can't pass less than a general-purpose register's worth to a function (whether via the stack or a register, depending on the calling convention, etc.).

So the code for bool, when unoptimized, actually goes to some lengths to extract that bool value from the argument stack (using another stack slot to save that byte). It's more complicated than just pulling a native register-sized variable.

Solution 5 - C++

At the machine level there is no such thing as bool

Very few instruction set architectures define any sort of boolean operand type, although there are often instructions that trigger an action on non-zero values. To the CPU, usually, everything is one of the scalar types or a string of them.

A given compiler and a given ABI will need to choose specific sizes for int and bool; when, as in your case, those sizes differ, they may generate slightly different code, and at some levels of optimization one may be slightly faster.

Why is bool one byte on many systems?

It's safer to choose a char type for bool because someone might make a really large array of them.

Update: by "safer", I mean: for the compiler and library implementors. I'm not saying people need to reimplement the system type.
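
To make the array point concrete, here is a tiny illustration (the exact numbers are implementation-defined; on a typical platform with a 1-byte bool and a 4-byte int, the bool table is a quarter of the size):

#include <iostream>

int main()
{
	// A table of one million flags: roughly 1 MB with a 1-byte bool,
	// but roughly 4 MB if each flag were stored as an int.
	std::cout << sizeof(bool[1000000]) << '\n';
	std::cout << sizeof(int[1000000])  << '\n';
}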

Solution 6 - C++

Yeah, the discussion's fun. But just test it:

Test code:

#include <stdio.h>
#include <string.h>

int testi(int);
int testb(bool);
int main (int argc, char* argv[]){
  bool valb;
  int  vali;
  int loops;
  if( argc < 2 ){
    return 2;
  }
  valb = (0 != (strcmp(argv[1], "0")));
  vali = strcmp(argv[1], "0");
  printf("Arg1: %s\n", argv[1]);
  printf("BArg1: %i\n", valb ? 1 : 0);
  printf("IArg1: %i\n", vali);
  for(loops=30000000; loops>0; loops--){
    //printf("%i: %i\n", loops, testb(valb=!valb));
    printf("%i: %i\n", loops, testi(vali=!vali));
  }
  return valb;
}

int testi(int val){
  if( val ){
    return 1;
  }
  return 0;
}
int testb(bool val){
  if( val ){
    return 1;
  }
  return 0;
}

Compiled on a 64-bit Ubuntu 10.10 laptop with: g++ -O3 -o /tmp/test_i /tmp/test_i.cpp

Integer-based comparison:

sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null

real	0m8.203s
user	0m8.170s
sys	0m0.010s
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null

real	0m8.056s
user	0m8.020s
sys	0m0.000s
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null

real	0m8.116s
user	0m8.100s
sys	0m0.000s

Boolean test / print uncommented (and integer commented):

sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null

real	0m8.254s
user	0m8.240s
sys	0m0.000s
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null

real	0m8.028s
user	0m8.000s
sys	0m0.010s
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null

real	0m7.981s
user	0m7.900s
sys	0m0.050s

They're the same, with one assignment and two comparisons per iteration over 30 million loops. Find something else to optimize. For example, don't use strcmp unnecessarily. ;)

Solution 7 - C++

It will mostly depend on the compiler and the optimization. There's an interesting (language-agnostic) discussion here:

https://stackoverflow.com/questions/1070063/does-if-bool-true-require-one-more-step-than-if-bool

Also, take a look at this post: http://www.linuxquestions.org/questions/programming-9/c-compiler-handling-of-boolean-variables-290996/

Solution 8 - C++

Approaching your question in two different ways:

If you are specifically talking about C++, or any programming language that produces assembly code for that matter, we are bound to whatever code the compiler generates in ASM. We are also bound to the representation of true and false in C++. An integer has to be stored in 32 bits, whereas a single byte is enough to store a boolean value. Asm snippets for the conditional statements:

For the integer:

  mov eax,dword ptr[esp]    ;Store integer
  cmp eax,0                 ;Compare to 0
  je  false                 ;If int is 0, it's false
  ;Do what has to be done when true
false:
  ;Do what has to be done when false

For the bool:

  mov  al,1     ;Anything that is not 0 is true
  test al,1     ;See if the first bit is flipped
  jz   false    ;Not flipped, so it's false
  ;Do what has to be done when true
false:
  ;Do what has to be done when false

So, that's why the speed comparison is so compiler-dependent. In the case above, the bool would be slightly faster, since cmp implies a subtraction to set the flags. It also contradicts what your compiler generated.

Another approach, a much simpler one, is to look at the logic of the expression on its own and try not to worry about how the compiler will translate your code; I think this is a much healthier way of thinking. I still believe, ultimately, that the code being generated by the compiler is actually trying to give a truthful resolution. What I mean is that, perhaps if you increase the number of test cases in the if statement and stick with boolean on one side and integer on the other, the compiler will make it so the generated code executes faster with boolean expressions at the machine level.

I'm considering this a conceptual question, so I'll give a conceptual answer. This discussion reminds me of discussions I commonly have about whether or not code efficiency translates to fewer lines of code in assembly. It seems that this concept is generally accepted as being true. Considering that keeping track of how fast the ALU will handle each statement is not viable, the second option would be to focus on jumps and compares in assembly. When that is the case, the distinction between boolean statements and integers in the code you presented becomes rather representative. The result of an expression in C++ will return a value that will then be given a representation. In assembly, on the other hand, the jumps and comparisons will be based on numeric values regardless of what type of expression was being evaluated back in your C++ if statement. It is important in these questions to remember that purely logical statements such as these end up with considerable computational overhead, even though a single bit would be capable of the same thing.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        Original Author            Original Content on Stackoverflow
Question            Nawaz                      View Question on Stackoverflow
Solution 1 - C++    Sherm Pendley              View Answer on Stackoverflow
Solution 2 - C++    Alexander Gessler          View Answer on Stackoverflow
Solution 3 - C++    JUST MY correct OPINION    View Answer on Stackoverflow
Solution 4 - C++    Mat                        View Answer on Stackoverflow
Solution 5 - C++    DigitalRoss                View Answer on Stackoverflow
Solution 6 - C++    dannysauer                 View Answer on Stackoverflow
Solution 7 - C++    Aleadam                    View Answer on Stackoverflow
Solution 8 - C++    Artie                      View Answer on Stackoverflow