Why does this loop produce "warning: iteration 3u invokes undefined behavior" and output more than 4 lines?

C++, GCC, Undefined Behavior

C++ Problem Overview


Compiling this:

#include <iostream>
 
int main()
{
	for (int i = 0; i < 4; ++i)
		std::cout << i*1000000000 << std::endl;
}

makes gcc produce the following warning:

warning: iteration 3u invokes undefined behavior [-Waggressive-loop-optimizations]
   std::cout << i*1000000000 << std::endl;
                  ^

I understand there is a signed integer overflow.

What I cannot understand is why the value of i is broken by that overflow operation.

I've read the answers to https://stackoverflow.com/questions/7682477/why-does-integer-overflow-on-x86-with-gcc-cause-an-infinite-loop, but I'm still not clear on why this happens - I get that "undefined" means "anything can happen", but what's the underlying cause of this specific behavior?

Online: http://ideone.com/dMrRKR

Compiler: gcc (4.8)

C++ Solutions


Solution 1 - C++

Signed integer overflow (strictly speaking, there is no such thing as "unsigned integer overflow") is undefined behaviour. This means anything can happen, and discussing why it happens under the rules of C++ doesn't make sense.

C++11 draft N3337, §5/4 (see note 1 below):

> If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined. [ Note: most existing implementations of C++ ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function. —end note ]
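To see which iteration of the loop falls under this rule, the product can be computed in a wider type and compared with the range of int. The following is only a sketch: product_overflows_int is a hypothetical helper, not something from the question, and the static_assert lines assume a 32-bit int.

```cpp
#include <limits>

// Hypothetical helper: true if i * factor is not representable in int.
// The product is formed in long long, so the check itself does not
// overflow on platforms where long long is wider than int.
constexpr bool product_overflows_int(int i, int factor)
{
    return static_cast<long long>(i) * factor > std::numeric_limits<int>::max()
        || static_cast<long long>(i) * factor < std::numeric_limits<int>::min();
}

// Assumption: int is 32 bits wide (INT_MAX == 2147483647).
static_assert(!product_overflows_int(2, 1000000000), "2 * 10^9 fits in int");
static_assert(product_overflows_int(3, 1000000000), "3 * 10^9 does not fit in int");
```

Under that assumption, iterations 0 to 2 are well defined and iteration 3 is the first one with undefined behaviour, which matches the "iteration 3u" in the warning.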

Your code compiled with g++ -O3 emits a warning (even without -Wall):

a.cpp: In function 'int main()':
a.cpp:11:18: warning: iteration 3u invokes undefined behavior [-Waggressive-loop-optimizations]
   std::cout << i*1000000000 << std::endl;
                  ^
a.cpp:9:2: note: containing loop
  for (int i = 0; i < 4; ++i)
  ^

The only way we can analyze what the program is doing is by reading the generated assembly code.

Here is the full assembly listing:

	.file	"a.cpp"
	.section	.text$_ZNKSt5ctypeIcE8do_widenEc,"x"
	.linkonce discard
	.align 2
LCOLDB0:
LHOTB0:
	.align 2
	.p2align 4,,15
	.globl	__ZNKSt5ctypeIcE8do_widenEc
	.def	__ZNKSt5ctypeIcE8do_widenEc;	.scl	2;	.type	32;	.endef
__ZNKSt5ctypeIcE8do_widenEc:
LFB860:
	.cfi_startproc
	movzbl	4(%esp), %eax
	ret	$4
	.cfi_endproc
LFE860:
LCOLDE0:
LHOTE0:
	.section	.text.unlikely,"x"
LCOLDB1:
	.text
LHOTB1:
	.p2align 4,,15
	.def	___tcf_0;	.scl	3;	.type	32;	.endef
___tcf_0:
LFB1091:
	.cfi_startproc
	movl	$__ZStL8__ioinit, %ecx
	jmp	__ZNSt8ios_base4InitD1Ev
	.cfi_endproc
LFE1091:
	.section	.text.unlikely,"x"
LCOLDE1:
	.text
LHOTE1:
	.def	___main;	.scl	2;	.type	32;	.endef
	.section	.text.unlikely,"x"
LCOLDB2:
	.section	.text.startup,"x"
LHOTB2:
	.p2align 4,,15
	.globl	_main
	.def	_main;	.scl	2;	.type	32;	.endef
_main:
LFB1084:
	.cfi_startproc
	leal	4(%esp), %ecx
	.cfi_def_cfa 1, 0
	andl	$-16, %esp
	pushl	-4(%ecx)
	pushl	%ebp
	.cfi_escape 0x10,0x5,0x2,0x75,0
	movl	%esp, %ebp
	pushl	%edi
	pushl	%esi
	pushl	%ebx
	pushl	%ecx
	.cfi_escape 0xf,0x3,0x75,0x70,0x6
	.cfi_escape 0x10,0x7,0x2,0x75,0x7c
	.cfi_escape 0x10,0x6,0x2,0x75,0x78
	.cfi_escape 0x10,0x3,0x2,0x75,0x74
	xorl	%edi, %edi
	subl	$24, %esp
	call	___main
L4:
	movl	%edi, (%esp)
	movl	$__ZSt4cout, %ecx
	call	__ZNSolsEi
	movl	%eax, %esi
	movl	(%eax), %eax
	subl	$4, %esp
	movl	-12(%eax), %eax
	movl	124(%esi,%eax), %ebx
	testl	%ebx, %ebx
	je	L15
	cmpb	$0, 28(%ebx)
	je	L5
	movsbl	39(%ebx), %eax
L6:
	movl	%esi, %ecx
	movl	%eax, (%esp)
	addl	$1000000000, %edi
	call	__ZNSo3putEc
	subl	$4, %esp
	movl	%eax, %ecx
	call	__ZNSo5flushEv
	jmp	L4
	.p2align 4,,10
L5:
	movl	%ebx, %ecx
	call	__ZNKSt5ctypeIcE13_M_widen_initEv
	movl	(%ebx), %eax
	movl	24(%eax), %edx
	movl	$10, %eax
	cmpl	$__ZNKSt5ctypeIcE8do_widenEc, %edx
	je	L6
	movl	$10, (%esp)
	movl	%ebx, %ecx
	call	*%edx
	movsbl	%al, %eax
	pushl	%edx
	jmp	L6
L15:
	call	__ZSt16__throw_bad_castv
	.cfi_endproc
LFE1084:
	.section	.text.unlikely,"x"
LCOLDE2:
	.section	.text.startup,"x"
LHOTE2:
	.section	.text.unlikely,"x"
LCOLDB3:
	.section	.text.startup,"x"
LHOTB3:
	.p2align 4,,15
	.def	__GLOBAL__sub_I_main;	.scl	3;	.type	32;	.endef
__GLOBAL__sub_I_main:
LFB1092:
	.cfi_startproc
	subl	$28, %esp
	.cfi_def_cfa_offset 32
	movl	$__ZStL8__ioinit, %ecx
	call	__ZNSt8ios_base4InitC1Ev
	movl	$___tcf_0, (%esp)
	call	_atexit
	addl	$28, %esp
	.cfi_def_cfa_offset 4
	ret
	.cfi_endproc
LFE1092:
	.section	.text.unlikely,"x"
LCOLDE3:
	.section	.text.startup,"x"
LHOTE3:
	.section	.ctors,"w"
	.align 4
	.long	__GLOBAL__sub_I_main
.lcomm __ZStL8__ioinit,1,1
	.ident	"GCC: (i686-posix-dwarf-rev1, Built by MinGW-W64 project) 4.9.0"
	.def	__ZNSt8ios_base4InitD1Ev;	.scl	2;	.type	32;	.endef
	.def	__ZNSolsEi;	.scl	2;	.type	32;	.endef
	.def	__ZNSo3putEc;	.scl	2;	.type	32;	.endef
	.def	__ZNSo5flushEv;	.scl	2;	.type	32;	.endef
	.def	__ZNKSt5ctypeIcE13_M_widen_initEv;	.scl	2;	.type	32;	.endef
	.def	__ZSt16__throw_bad_castv;	.scl	2;	.type	32;	.endef
	.def	__ZNSt8ios_base4InitC1Ev;	.scl	2;	.type	32;	.endef
	.def	_atexit;	.scl	2;	.type	32;	.endef

I can barely even read assembly, but even I can see the addl $1000000000, %edi line. The resulting code looks more like

for(int i = 0; /* nothing, that is - infinite loop */; i += 1000000000)
    std::cout << i << std::endl;

This comment of @T.C.:

> I suspect that it's something like: (1) because every iteration with i of any value larger than 2 has undefined behavior -> (2) we can assume that i <= 2 for optimization purposes -> (3) the loop condition is always true -> (4) it's optimized away into an infinite loop.

gave me the idea to compare the assembly code of the OP's code to the assembly code of the following code, which has no undefined behaviour.

#include <iostream>

int main()
{
    // changed the termination condition
    for (int i = 0; i < 3; ++i)
        std::cout << i*1000000000 << std::endl;
}

And, in fact, the correct code has a termination condition.

    ; ...snip...
L6:
    mov	ecx, edi
    mov	DWORD PTR [esp], eax
    add	esi, 1000000000
    call	__ZNSo3putEc
    sub	esp, 4
    mov	ecx, eax
    call	__ZNSo5flushEv
    cmp	esi, -1294967296 ; here it is
    jne	L7
    lea	esp, [ebp-16]
    xor	eax, eax
    pop	ecx
    ; ...snip...
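The shape of that compiled loop can be modelled in portable C++. This is a rough sketch rather than a transcription of the compiler's output: count_iterations is a hypothetical name, unsigned arithmetic is used so that wraparound is well defined, and it assumes C++14 (for the loop in a constexpr function) and a 32-bit unsigned int.

```cpp
#include <cstdint>

// Models the strength-reduced loop: the multiplication becomes a running
// sum, and the exit test compares that sum with step * trips.
constexpr int count_iterations(std::uint32_t step, std::uint32_t trips)
{
    const std::uint32_t stop = step * trips; // wraps modulo 2^32 if needed
    std::uint32_t acc = 0;
    int printed = 0;
    do {
        ++printed;          // stands in for "std::cout << acc << std::endl"
        acc += step;        // add esi, 1000000000
    } while (acc != stop);  // cmp esi, <stop> / jne
    return printed;
}

// The immediate in the cmp instruction is 3 * 10^9 viewed as a signed value.
static_assert(static_cast<std::uint32_t>(-1294967296) == 3000000000u,
              "-1294967296 and 3000000000 share one 32-bit pattern");
static_assert(count_iterations(1000000000u, 3u) == 3, "the i < 3 loop stops");
```

With wrapping arithmetic the same model also stops after four trips for a bound of 4; the infinite loop only appears because the compiler is allowed to assume the signed overflow never happens.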

Unfortunately, these are the consequences of writing buggy code.

Fortunately you can make use of better diagnostics and better debugging tools - that's what they are for:

  • enable all warnings

    • -Wall is the gcc option that enables all useful warnings with no false positives. This is the bare minimum that you should always use.
    • gcc has many other warning options; they are not enabled by -Wall because they may produce false positives
    • Visual C++ unfortunately lags behind in its ability to give useful warnings. At least the IDE enables some by default.

  • use debug flags for debugging

    • for integer overflow, -ftrapv traps the program on overflow
    • the Clang compiler is excellent for this: -fcatch-undefined-behavior catches a lot of instances of undefined behaviour (note: "a lot of" != "all of them"); newer Clang and GCC releases spell this option -fsanitize=undefined
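Where a compiler flag is not an option, a single multiplication can also be checked explicitly. The sketch below uses __builtin_mul_overflow, which is a GCC (version 5 and later) and Clang extension rather than standard C++; checked_mul is a hypothetical wrapper name.

```cpp
#include <stdexcept>

// Multiplies two ints and throws instead of overflowing.
// __builtin_mul_overflow stores the wrapped result and returns true
// when the mathematical product does not fit in the result type.
int checked_mul(int a, int b)
{
    int result = 0;
    if (__builtin_mul_overflow(a, b, &result))
        throw std::overflow_error("signed multiplication overflowed");
    return result;
}
```

Assuming a 32-bit int, checked_mul(2, 1000000000) returns 2000000000, while checked_mul(3, 1000000000) throws std::overflow_error.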

> I have a spaghetti mess of a program not written by me that needs to be shipped tomorrow! HELP!!!!!!111oneone

Use gcc's -fwrapv

> This option instructs the compiler to assume that signed arithmetic overflow of addition, subtraction and multiplication wraps around using twos-complement representation.

1 - this rule does not apply to "unsigned integer overflow", as §3.9.1/4 says that

> Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.

and e.g. the result of UINT_MAX + 1 is mathematically defined - by the rules of arithmetic modulo 2^n
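A small illustration of that difference, assuming a 32-bit unsigned int (wrap_scaled is a hypothetical helper name):

```cpp
#include <climits>

// Unsigned arithmetic wraps modulo 2^n, so nothing here is undefined.
static_assert(UINT_MAX + 1u == 0u, "UINT_MAX + 1 wraps around to 0");

// Assumption for the concrete value below: unsigned int has 32 bits.
static_assert(UINT_MAX == 4294967295u, "this sketch assumes 32-bit unsigned int");

constexpr unsigned wrap_scaled(unsigned i) { return i * 1000000000u; }

// 5 * 10^9 = 5000000000, and 5000000000 mod 2^32 = 705032704.
static_assert(wrap_scaled(5u) == 705032704u, "result is reduced modulo 2^32");
```

Because the wrapped result is what the standard requires, the compiler cannot use such an expression to derive bounds on the loop counter the way it does for the signed version.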

Solution 2 - C++

Short answer: gcc has specifically documented this problem. We can see that in the gcc 4.8 release notes, which say (emphasis mine going forward):

> GCC now uses a more aggressive analysis to derive an upper bound for the number of iterations of loops using constraints imposed by language standards. This may cause non-conforming programs to no longer work as expected, such as SPEC CPU 2006 464.h264ref and 416.gamess. A new option, -fno-aggressive-loop-optimizations, was added to disable this aggressive analysis. In some loops that have known constant number of iterations, but undefined behavior is known to occur in the loop before reaching or during the last iteration, GCC will warn about the undefined behavior in the loop instead of deriving lower upper bound of the number of iterations for the loop. The warning can be disabled with -Wno-aggressive-loop-optimizations.

Indeed, if we use -fno-aggressive-loop-optimizations, the infinite loop behavior should cease, and it does in all the cases I have tested.

The long answer starts with knowing that signed integer overflow is undefined behavior by looking at the draft C++ standard section 5 Expressions paragraph 4 which says:

> If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined. [ Note: most existing implementations of C++ ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function. —end note ]

We know that the standard says undefined behavior is unpredictable from the note that comes with the definition, which says:

> [ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. —end note ]

But what in the world can the gcc optimizer be doing to turn this into an infinite loop? It sounds completely wacky. But thankfully gcc gives us a clue to figuring it out in the warning:

warning: iteration 3u invokes undefined behavior [-Waggressive-loop-optimizations]
   std::cout << i*1000000000 << std::endl;
                  ^

The clue is -Waggressive-loop-optimizations. What does that mean? Fortunately for us, this is not the first time this optimization has broken code in this way, and we are lucky because John Regehr has documented a case in the article "GCC pre-4.8 Breaks Broken SPEC 2006 Benchmarks", which shows the following code:

int d[16];
 
int SATD (void)
{
  int satd = 0, dd, k;
  for (dd=d[k=0]; k<16; dd=d[++k]) {
    satd += (dd < 0 ? -dd : dd);
  }
  return satd;
}

the article says:

> The undefined behavior is accessing d[16] just before exiting the loop. In C99 it is legal to create a pointer to an element one position past the end of the array, but that pointer must not be dereferenced.

and later on says:

> In detail, here is what’s going on. A C compiler, upon seeing d[++k], is permitted to assume that the incremented value of k is within the array bounds, since otherwise undefined behavior occurs. For the code here, GCC can infer that k is in the range 0..15. A bit later, when GCC sees k<16, it says to itself: “Aha– that expression is always true, so we have an infinite loop.” The situation here, where the compiler uses the assumption of well-definedness to infer a useful dataflow fact,
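For comparison, here is one possible way to write the SATD loop so that d[k] is only read after k < 16 has been checked. It is a sketch, not the benchmark's actual fix: satd_fixed is a hypothetical name, the array is passed as a parameter instead of being a global, and a C++14 compiler is assumed.

```cpp
// Sums the absolute values of all 16 elements without ever reading d[16].
constexpr int satd_fixed(const int (&d)[16])
{
    int satd = 0;
    for (int k = 0; k < 16; ++k) {
        const int dd = d[k];            // read happens after the bounds test
        satd += (dd < 0 ? -dd : dd);
    }
    return satd;
}
```

Here the compiler can still infer that k stays in 0..15 inside the body, but that fact no longer contradicts the exit test, so the loop runs exactly 16 times.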

So what the compiler must be doing in some cases is assuming that, since signed integer overflow is undefined behavior, i * 1000000000 never overflows and therefore i never exceeds 2. The condition i < 4 is then always true, and thus we have an infinite loop.
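The bound the compiler would derive under that assumption can be worked out directly. This is only an illustration of the arithmetic, with hypothetical constant names and a 32-bit int assumed; it does not show how gcc implements the analysis.

```cpp
#include <climits>

static_assert(INT_MAX == 2147483647, "this sketch assumes a 32-bit int");

constexpr int kFactor = 1000000000;

// Largest non-negative i for which i * kFactor is representable in int.
constexpr int kMaxDefinedI = INT_MAX / kFactor;

static_assert(kMaxDefinedI == 2, "i * 1000000000 is only defined for i <= 2");

// Every value of i that a well-defined execution can reach already
// satisfies the loop condition, so "i < 4" carries no information.
static_assert(kMaxDefinedI < 4, "the exit test is always true for defined i");
```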

He explains this is very similar to the infamous Linux kernel null pointer check removal where in seeing this code:

struct foo *s = ...;
int x = s->f;
if (!s) return ERROR;

gcc inferred that since s was dereferenced in s->f, and since dereferencing a null pointer is undefined behavior, s must not be null, and it therefore optimized away the if (!s) check on the next line.
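The usual repair is to test the pointer before the first dereference, so the compiler has nothing to infer from. A minimal sketch, in which struct foo, kError and read_f are stand-ins for the kernel's real types and error code:

```cpp
struct foo { int f; };

constexpr int kError = -1; // hypothetical error code standing in for ERROR

// Check first, dereference second: the null test now guards s->f
// instead of following it.
constexpr int read_f(const foo* s)
{
    return s == nullptr ? kError : s->f;
}
```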

The lesson here is that modern optimizers are very aggressive about exploiting undefined behavior and will most likely only get more aggressive. Even with just a few examples, we can see the optimizer does things that seem completely unreasonable to a programmer but that, in retrospect, make sense from the optimizer's perspective.

Solution 3 - C++

tl;dr The code generates a test that integer + positive integer == negative integer. Usually the optimizer does not optimize this out, but in the specific case of std::endl being used next, the compiler does optimize this test out. I haven't figured out what's special about endl yet.


From the assembly code at -O1 and higher levels, it is clear that gcc refactors the loop to:

i = 0;
do {
    cout << i << endl;
    i += NUMBER;
} while (i != NUMBER * 4);

The biggest value that works correctly is 715827882, i.e. floor(INT_MAX/3). The assembly snippet at -O1 is:

L4:
movsbl	%al, %eax
movl	%eax, 4(%esp)
movl	$__ZSt4cout, (%esp)
call	__ZNSo3putEc
movl	%eax, (%esp)
call	__ZNSo5flushEv
addl	$715827882, %esi
cmpl	$-1431655768, %esi
jne	L6
    // fallthrough to "return" code

Note, the -1431655768 is 4 * 715827882 in 2's complement.
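These constants can be checked at compile time. The sketch assumes a 32-bit int; as_u32 is a hypothetical helper that reinterprets a signed 32-bit value as its unsigned bit pattern.

```cpp
#include <cstdint>

// Conversion to an unsigned type is defined to be modulo 2^32.
constexpr std::uint32_t as_u32(std::int32_t v)
{
    return static_cast<std::uint32_t>(v);
}

// The cmpl immediate is 4 * 715827882 seen as a signed 32-bit number.
static_assert(as_u32(-1431655768) == 4u * 715827882u, "same 32-bit pattern");

// 715827882 is floor(INT_MAX / 3): three times it still fits in int,
// three times the next value does not.
static_assert(2147483647 / 3 == 715827882, "floor(INT_MAX / 3)");
static_assert(3LL * 715827882 <= 2147483647LL, "3 * 715827882 fits");
static_assert(3LL * 715827883 > 2147483647LL, "3 * 715827883 overflows");
```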

Hitting -O2 optimizes that to the following:

L4:
movsbl	%al, %eax
addl	$715827882, %esi
movl	%eax, 4(%esp)
movl	$__ZSt4cout, (%esp)
call	__ZNSo3putEc
movl	%eax, (%esp)
call	__ZNSo5flushEv
cmpl	$-1431655768, %esi
jne	L6
leal	-8(%ebp), %esp
jne	L6 
   // fallthrough to "return" code

So the optimization that has been made is merely that the addl was moved higher up.

If we recompile with 715827883 instead then the -O1 version is identical apart from the changed number and test value. However, -O2 then makes a change:

L4:
movsbl	%al, %eax
addl	$715827883, %esi
movl	%eax, 4(%esp)
movl	$__ZSt4cout, (%esp)
call	__ZNSo3putEc
movl	%eax, (%esp)
call	__ZNSo5flushEv
jmp	L2

Where there was cmpl $-1431655764, %esi at -O1, that line has been removed for -O2. The optimizer must have decided that adding 715827883 to %esi can never equal -1431655764.

This is pretty puzzling. Adding that to INT_MIN+1 does generate the expected result, so the optimizer must have decided that %esi can never be INT_MIN+1 and I'm not sure why it would decide that.

In the working example it seems it'd be equally valid to conclude that adding 715827882 to a number cannot equal INT_MIN + 715827882 - 2 ! (this is only possible if wraparound does actually occur), yet it does not optimize the line out in that example.


The code I was using is:

#include <iostream>
#include <cstdio>
 
int main()
{
    for (int i = 0; i < 4; ++i)
    {
        //volatile int j = i*715827883;
        volatile int j = i*715827882;
        printf("%d\n", j);

        std::endl(std::cout);
    }
}

If the std::endl(std::cout) is removed then the optimization no longer occurs. In fact replacing it with std::cout.put('\n'); std::flush(std::cout); also causes the optimization to not happen, even though std::endl is inlined.

The inlining of std::endl seems to affect the earlier part of the loop structure (which I don't quite understand what it is doing but I'll post it here in case someone else does):

With original code and -O2:

L2:
movl	%esi, 28(%esp)
movl	28(%esp), %eax
movl	$LC0, (%esp)
movl	%eax, 4(%esp)
call	_printf
movl	__ZSt4cout, %eax
movl	-12(%eax), %eax
movl	__ZSt4cout+124(%eax), %ebx
testl	%ebx, %ebx
je	L10
cmpb	$0, 28(%ebx)
je	L3
movzbl	39(%ebx), %eax
L4:
movsbl	%al, %eax
addl	$715827883, %esi
movl	%eax, 4(%esp)
movl	$__ZSt4cout, (%esp)
call	__ZNSo3putEc
movl	%eax, (%esp)
call	__ZNSo5flushEv
jmp	L2                  // no test

With my manual inlining of std::endl, -O2:

L3:
movl	%ebx, 28(%esp)
movl	28(%esp), %eax
addl	$715827883, %ebx
movl	$LC0, (%esp)
movl	%eax, 4(%esp)
call	_printf
movl	$10, 4(%esp)
movl	$__ZSt4cout, (%esp)
call	__ZNSo3putEc
movl	$__ZSt4cout, (%esp)
call	__ZNSo5flushEv
cmpl	$-1431655764, %ebx
jne	L3
xorl	%eax, %eax

One difference between these two is that %esi is used in the original and %ebx in the second version; is there any difference in semantics defined between %esi and %ebx in general? (I don't know much about x86 assembly.)

Solution 4 - C++

Another example of this error being reported in gcc is when you have a loop that executes for a constant number of iterations, but you are using the counter variable as an index into an array that has fewer than that number of items, such as:

int a[50], x;

// reads past the end of a once i reaches 50
for (int i = 0; i < 1000; i++) x = a[i];

The compiler can determine that this loop will try to access memory outside of the array 'a'. The compiler complains about this with this rather cryptic message:

> iteration xxu invokes undefined behavior [-Werror=aggressive-loop-optimizations]
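One way to keep the loop bound and the array size from drifting apart is to let the compiler deduce the size. A sketch assuming C++14, with last_value as a hypothetical name for the loop above wrapped in a function:

```cpp
#include <cstddef>

// The bound N is taken from the array type, so the index can never
// run past the last element.
template <std::size_t N>
constexpr int last_value(const int (&a)[N])
{
    int x = 0;
    for (std::size_t i = 0; i < N; ++i)
        x = a[i];
    return x;
}
```

Called with an int a[50], the template is instantiated with N == 50 and the loop stops there, regardless of what constant the original code used.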

Solution 5 - C++

>What I cannot get is why i value is broken by that overflow operation?

It seems that integer overflow occurs in the 4th iteration (for i = 3). Signed integer overflow invokes undefined behavior, so nothing can be predicted: the loop may iterate only 4 times, it may run forever, or it may do anything else.
The result may vary from compiler to compiler, or even between versions of the same compiler.

C++11, §1.3.24 undefined behavior:

> behavior for which this International Standard imposes no requirements
> [ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. —end note ]
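If the goal is simply to print the four products, the undefined behavior can be avoided by doing the multiplication in a wider type. A minimal sketch, assuming long long is at least 64 bits (which the standard guarantees); scaled_wide and print_scaled are hypothetical names:

```cpp
#include <ostream>

// The LL suffix makes the constant a long long, so i is converted
// before the multiplication and the product cannot overflow here.
constexpr long long scaled_wide(int i) { return i * 1000000000LL; }

// Writes the four products, one per line, to the given stream.
void print_scaled(std::ostream& os)
{
    for (int i = 0; i < 4; ++i)
        os << scaled_wide(i) << '\n';
}
```

Calling print_scaled(std::cout) from main prints 0, 1000000000, 2000000000 and 3000000000, and the loop condition is left alone because no iteration has undefined behavior.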

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | zerkms | View Question on Stackoverflow
Solution 1 - C++ | milleniumbug | View Answer on Stackoverflow
Solution 2 - C++ | Shafik Yaghmour | View Answer on Stackoverflow
Solution 3 - C++ | M.M | View Answer on Stackoverflow
Solution 4 - C++ | Ed Tyler | View Answer on Stackoverflow
Solution 5 - C++ | haccks | View Answer on Stackoverflow