How does GCC optimize out an unused variable incremented inside a loop?

CGccCompiler OptimizationDisassembly

C Problem Overview


I wrote this simple C program:

int main() {
    int i;
    int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }
}

I wanted to see how the gcc compiler optimizes this loop (clearly add 1 2000000000 times should be "add 2000000000 one time"). So:

gcc test.c and then time on a.out gives:

real 0m7.717s  
user 0m7.710s  
sys 0m0.000s  

$ gcc -O2 test.c and then time on a.out` gives:

real 0m0.003s  
user 0m0.000s  
sys	0m0.000s  

Then I disassembled both with gcc -S. First one seems quite clear:

    .file "test.c"  
    .text  
.globl main
    .type	main, @function  
main:
.LFB0:
    .cfi_startproc
    pushq	%rbp
    .cfi_def_cfa_offset 16
    movq	%rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    movl	$0, -8(%rbp)
    movl	$0, -4(%rbp)
    jmp	.L2
.L3:
    addl	$1, -8(%rbp)
    addl	$1, -4(%rbp)
.L2:
    cmpl	$1999999999, -4(%rbp)
    jle	.L3
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size	main, .-main
    .ident	"GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section	.note.GNU-stack,"",@progbits

L3 adds, L2 compare -4(%rbp) with 1999999999 and loops to L3 if i < 2000000000.

Now the optimized one:

    .file "test.c"  
    .text
    .p2align 4,,15
.globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    rep
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section .note.GNU-stack,"",@progbits

I can't understand at all what's going on there! I've got little knowledge of assembly, but I expected something like

addl $2000000000, -8(%rbp)

I even tried with gcc -c -g -Wa,-a,-ad -O2 test.c to see the C code together with the assembly it was converted to, but the result was no more clear that the previous one.

Can someone briefly explain:

  1. The gcc -S -O2 output.
  2. If the loop is optimized as I expected (one sum instead of many sums)?

C Solutions


Solution 1 - C

The compiler is even smarter than that. :)

In fact, it realizes that you aren't using the result of the loop. So it took out the entire loop completely!

This is called Dead Code Elimination.

A better test is to print the result:

#include <stdio.h>
int main(void) {
    int i; int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }

    //  Print result to prevent Dead Code Elimination
    printf("%d\n", count);
}

EDIT : I've added the required #include <stdio.h>; the MSVC assembly listing corresponds to a version without the #include, but it should be the same.


I don't have GCC in front of me at the moment, since I'm booted into Windows. But here's the disassembly of the version with the printf() on MSVC:

EDIT : I had the wrong assembly output. Here's the correct one.

; 57   : int main(){

$LN8:
	sub	rsp, 40					; 00000028H

; 58   : 
; 59   : 
; 60   :     int i; int count = 0;
; 61   :     for(i = 0; i < 2000000000; i++){
; 62   :         count = count + 1;
; 63   :     }
; 64   : 
; 65   :     //  Print result to prevent Dead Code Elimination
; 66   :     printf("%d\n",count);

	lea	rcx, OFFSET FLAT:??_C@_03PMGGPEJJ@?$CFd?6?$AA@
	mov	edx, 2000000000				; 77359400H
	call	QWORD PTR __imp_printf

; 67   : 
; 68   : 
; 69   : 
; 70   :
; 71   :     return 0;

	xor	eax, eax

; 72   : }

	add	rsp, 40					; 00000028H
	ret	0

So yes, Visual Studio does this optimization. I'd assume GCC probably does too.

And yes, GCC performs a similar optimization. Here's an assembly listing for the same program with gcc -S -O2 test.c (gcc 4.5.2, Ubuntu 11.10, x86):

        .file   "test.c"
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "%d\n"
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-16, %esp
        subl    $16, %esp
        movl    $2000000000, 8(%esp)
        movl    $.LC0, 4(%esp)
        movl    $1, (%esp)
        call    __printf_chk
        leave
        ret
        .size   main, .-main
        .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
        .section        .note.GNU-stack,"",@progbits

Solution 2 - C

Compilers have a few tools at their disposal to make code more efficient or more "efficient":

  1. If the result of a computation is never used, the code that performs the computation can be omitted (if the computation acted upon volatile values, those values must still be read but the results of the read may be ignored). If the results of the computations that fed it weren't used, the code that performs those can be omitted as well. If such omission makes the code for both paths on a conditional branch identical, the condition may be regarded as unused and omitted. This will have no effect on the behaviors (other than execution time) of any program that doesn't make out-of-bounds memory accesses or invoke what Annex L would call "Critical Undefined Behaviors".

  2. If the compiler determines that the machine code that computes a value can only produce results in a certain range, it may omit any conditional tests whose outcome could be predicted on that basis. As above, this will not affect behaviors other than execution time unless code invokes "Critical Undefined Behaviors".

  3. If the compiler determines that certain inputs would invoke any form of Undefined Behavior with the code as written, the Standard would allow the compiler to omit any code which would only be relevant when such inputs are received, even if the natural behavior of the execution platform given such inputs would have been benign and the compiler's rewrite would make it dangerous.

Good compilers do #1 and #2. For some reason, however, #3 has become fashionable.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionHaileView Question on Stackoverflow
Solution 1 - CMysticialView Answer on Stackoverflow
Solution 2 - CsupercatView Answer on Stackoverflow