Signed overflow in C++ and undefined behaviour (UB)

Tags: C++, Undefined Behavior

C++ Problem Overview


I'm wondering about the use of code like the following

int result = 0;
int factor = 1;
for (...) {
    result = ...
    factor *= 10;
}
return result;

If the loop iterates n times, then factor is multiplied by 10 exactly n times. However, factor is only ever used after having been multiplied by 10 a total of n-1 times. If we assume that factor never overflows except possibly on the last iteration of the loop, should such code be acceptable? In this case, the value of factor is provably never used after the overflow has happened.

I'm having a debate on whether code like this should be accepted. It would be possible to put the multiplication inside an if-statement and simply skip it on the last iteration of the loop, when it could overflow (see the sketch below). The downside is that this clutters the code and adds a branch that would have to be checked on all the previous loop iterations. Alternatively, I could iterate one fewer time and replicate the loop body once after the loop; again, this complicates the code.
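For concreteness, here is a minimal sketch of the guarded variant; the loop bound n and the body (result += i * factor) are hypothetical stand-ins for the real code:

int compute(int n) {
    int result = 0;
    int factor = 1;
    for (int i = 0; i < n; ++i) {
        result += i * factor;   // hypothetical loop body
        if (i < n - 1)          // skip the multiplication on the last iteration,
            factor *= 10;       // so factor never overflows
    }
    return result;
}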

The actual code in question is used in a tight inner-loop that consumes a large chunk of the total CPU time in a real-time graphics application.

C++ Solutions


Solution 1 - C++

Compilers do assume that a valid C++ program does not contain UB. Consider for example:

if (x == nullptr) {
    *x = 3;
} else {
    *x = 5;
}

If x == nullptr, then dereferencing it and assigning a value is UB. Hence the only way this could be part of a valid program is if x == nullptr never yields true, so under the as-if rule the compiler can assume the above is equivalent to:

*x = 5;

Now in your code

int result = 0;
int factor = 1;
for (...) {      // Loop until factor overflows but not more
   result = ...
   factor *= 10;
}
return result;

The last multiplication of factor cannot happen in a valid program (signed overflow is undefined). Hence the assignment to result cannot happen either. As there is no way to branch out before the last iteration, the previous iterations cannot happen either. Eventually, the part of the code that is correct (i.e., in which no undefined behaviour ever happens) is:

// nothing :(

Solution 2 - C++

The behaviour of int overflow is undefined.

It doesn't matter if you read factor outside the loop body; if it has overflowed by then, then the behaviour of your code on, after, and, somewhat paradoxically, before the overflow is undefined.

One issue that might arise in keeping this code is that compilers are getting more and more aggressive when it comes to optimisation. In particular, they have developed a habit of assuming that undefined behaviour never happens. Acting on that assumption, they may remove the for loop altogether.

Can't you use an unsigned type for factor? You would then need to worry about unwanted conversions between int and unsigned in expressions containing both.
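A minimal sketch of that approach, assuming a hypothetical loop body, with an explicit cast so the signed/unsigned mixing stays in one place:

int compute(int n) {
    int result = 0;
    unsigned factor = 1;    // unsigned overflow wraps: well defined, no UB
    for (int i = 0; i < n; ++i) {
        result += i * static_cast<int>(factor);  // hypothetical loop body
        factor *= 10;       // may wrap on the last iteration, harmlessly
    }
    return result;
}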

Solution 3 - C++

It might be insightful to consider real-world optimizers. Loop unrolling is a known technique. The basic idea of loop unrolling is that

for (int i = 0; i != 3; ++i)
    foo();

might be better implemented behind the scenes as

foo();
foo();
foo();

This is the easy case, with a fixed bound. But modern compilers can also do this for variable bounds:

for (int i = 0; i != N; ++i)
    foo();

becomes

__RELATIVE_JUMP(3-N)
foo();
foo();
foo();
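
In portable C++ terms, this computed entry point is essentially a switch with fallthrough. A sketch of the idea (not actual compiler output):

void foo();  // defined elsewhere

void unrolled(int N) {  // assumes the compiler has proven N <= 3
    switch (N) {
        case 3: foo(); [[fallthrough]];
        case 2: foo(); [[fallthrough]];
        case 1: foo(); [[fallthrough]];
        default: break;  // N == 0: nothing to do
    }
}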

Obviously this only works if the compiler knows that N<=3. And that's where we get back to the original question:

int result = 0;
int factor = 1;
for (...) {
    result = ...
    factor *= 10;
}
return result;

Because the compiler knows that signed overflow does not occur, it knows that the loop can execute at most 9 times on 32-bit architectures: factor starts at 1, and 10^10 > INT_MAX = 2^31 - 1. It can therefore do a 9-iteration loop unroll. But the intended maximum was 10 iterations!

What might happen is that you get a relative jump to an assembly instruction at offset (9-N); with N==10 that is an offset of -1, which is the jump instruction itself. Oops. This is a perfectly valid loop optimization for well-defined C++, but the example given turns into a tight infinite loop.

Solution 4 - C++

Any signed integer overflow results in undefined behaviour, regardless of whether or not the overflowed value is or might be read.

Maybe in your use-case you can lift the first iteration out of the loop, turning this

int result = 0;
int factor = 1;
for (int n = 0; n < 10; ++n) {
    result += n + factor;
    factor *= 10;
}
// factor "is" 10^10 > INT_MAX, UB

into this

int factor = 1;
int result = 0 + factor; // first iteration
for (int n = 1; n < 10; ++n) {
    factor *= 10;
    result += n + factor;
}
// factor is 10^9 < INT_MAX

With optimization enabled, the compiler might unroll the second loop above into one conditional jump.

Solution 5 - C++

This is UB; in ISO C++ terms the behaviour of the entire program is completely unspecified for an execution that eventually hits UB. The classic example is that, as far as the C++ standard cares, it can make demons fly out of your nose. (I recommend against using an implementation where nasal demons are a real possibility.) See the other answers for more details.

Compilers can "cause trouble" at compile time for paths of execution they can see leading to compile-time-visible UB, e.g. assume those basic blocks are never reached.

See also What Every C Programmer Should Know About Undefined Behavior (LLVM blog). As explained there, signed-overflow UB lets compilers prove that for(... i <= n ...) loops are not infinite loops, even for unknown n. It also lets them "promote" int loop counters to pointer width instead of redoing sign-extension. (So the consequence of UB in that case could be accessing outside the low 64k or 4G elements of an array, if you were expecting signed wrapping of i into its value range.)
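
For example, consider a loop of this shape (a sketch of the blog post's point, not code taken from it):

long sum(const int *arr, int n) {
    long s = 0;
    for (int i = 0; i <= n; ++i)  // with wrapping, n == INT_MAX would make this
        s += arr[i];              // loop infinite; assuming no signed overflow
    return s;                     // lets the compiler treat it as finite and
}                                 // index with a 64-bit induction variable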

In some cases compilers will emit an illegal instruction like x86 ud2 for a block that provably causes UB if ever executed. (Note that a function might never be called, so compilers can't in general go berserk and break other functions, or even possible paths through a function that don't hit UB; i.e., the machine code it compiles to must still work for all inputs that don't lead to UB.)


Probably the most efficient solution is to manually peel the last iteration so the unneeded factor*=10 can be avoided.

int result = 0;
int factor = 1;
for (... i < n-1) {   // stop 1 iteration early
    result = ...
    factor *= 10;
}
result = ...          // another copy of the loop body, using the last factor
// factor *= 10;      // and optimize away this dead operation
return result;

Or if the loop body is large, consider simply using an unsigned type for factor. Then you can let the unsigned multiply overflow, and it will do well-defined wrapping modulo a power of 2 (2^N, where N is the number of value bits in the unsigned type).

This is fine even if you use it with signed types, especially if your unsigned->signed conversion never overflows.

Conversion between unsigned and 2's complement signed is free (same bit-pattern for all values); the modulo wrapping for int -> unsigned specified by the C++ standard simplifies to just using the same bit-pattern, unlike for one's complement or sign/magnitude.

And unsigned->signed is similarly trivial, although it is implementation-defined for values larger than INT_MAX. If you aren't using the huge unsigned result from the last iteration, you have nothing to worry about. But if you are, see https://stackoverflow.com/questions/9498190/is-conversion-from-unsigned-to-signed-undefined. The value-doesn't-fit case is implementation-defined, which means that an implementation must pick some behaviour; sane ones just truncate (if necessary) the unsigned bit pattern and use it as signed, because that works for in-range values the same way with no extra work. And it's definitely not UB. So big unsigned values can become negative signed integers. e.g. after int x = u; gcc and clang don't optimize away x>=0 as always being true, even without -fwrapv, because they defined the behaviour.
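
A small illustration of that implementation-defined (but not undefined) conversion on a typical two's complement target:

#include <iostream>

int main() {
    unsigned u = 0x90000000u;  // 2415919104, larger than INT_MAX
    int x = u;                 // implementation-defined; gcc and clang
                               // simply reuse the bit pattern
    std::cout << x << '\n';    // prints -1879048192 on such targets
}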

Solution 6 - C++

If you can tolerate a few additional assembly instructions in the loop, instead of

int factor = 1;
for (int j = 0; j < n; ++j) {
    ...
    factor *= 10;
}

you can write:

int factor = 0;
for (...) {
    factor = 10 * factor + !factor;
    ...
}

to avoid the last multiplication. !factor will not introduce a branch:

    xor     ebx, ebx              # factor = 0
.L1:
    xor     eax, eax              # eax = 0
    test    ebx, ebx              # set flags from factor
    lea     edx, [rbx+rbx*4]      # edx = 5 * factor
    sete    al                    # al = (factor == 0), i.e. !factor
    add     ebp, 1                # increment the loop counter
    lea     ebx, [rax+rdx*2]      # factor = 10 * factor + !factor
    mov     edi, ebx              # pass factor to consume
    call    consume(int)
    cmp     r12d, ebp             # more iterations?
    jne     .L1

This code

int factor = 0;
for (...) {
    factor = factor ? 10 * factor : 1;
    ...
}

also results in branchless assembly after optimization:

    mov     ebx, 1                # factor = 1
    jmp     .L1                   # skip the multiply on the first iteration
.L2:
    lea     ebx, [rbx+rbx*4]      # factor *= 5
    add     ebx, ebx              # factor *= 2 (net effect: factor *= 10)
.L1:
    mov     edi, ebx              # pass factor to consume
    add     ebp, 1                # increment the loop counter
    call    consume(int)
    cmp     r12d, ebp             # more iterations?
    jne     .L2

(Compiled with GCC 8.3.0 -O3)

Solution 7 - C++

You didn't show what's in the parentheses of the for statement, but I'm going to assume it's something like this:

for (int n = 0; n < 10; ++n) {
    result = ...
    factor *= 10;
}

You can simply move the counter increment and loop termination check into the body:

for (int n = 0; ; ) {
    result = ...
    if (++n >= 10) break;
    factor *= 10;
}

The number of assembly instructions in the loop will remain the same.

Inspired by Andrei Alexandrescu's presentation "Speed Is Found In The Minds of People".

Solution 8 - C++

Consider the function:

unsigned mul_mod_65536(unsigned short a, unsigned short b)
{
  return (a*b) & 0xFFFFu;
}

According to the published Rationale, the authors of the Standard would have expected that if this function were invoked on (e.g.) a commonplace 32-bit computer with arguments of 0xC000 and 0xC000, promoting the operands of * to signed int would cause the computation to yield -0x70000000, which when converted to unsigned would yield 0x90000000u, the same answer as if they had made unsigned short promote to unsigned. Nonetheless, gcc will sometimes optimize that function in ways that would behave nonsensically if an overflow occurs. Any code where some combination of inputs could cause an overflow must be compiled with the -fwrapv option unless it would be acceptable to allow creators of deliberately malformed input to execute arbitrary code of their choosing.
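
A common way to sidestep the promotion trap (an editorial sketch, not part of the original answer) is to force the multiplication into unsigned arithmetic, where wrapping is well defined:

unsigned mul_mod_65536(unsigned short a, unsigned short b)
{
  return (unsigned(a) * b) & 0xFFFFu;  // unsigned multiply wraps: no UB
}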

Solution 9 - C++

Why not do the multiplication at the top of the loop instead?

int result = 0;
int factor = 10;
for (...) {
    factor *= 10;
    result = ...
}
return result;

Note, however, that this changes which values of factor the loop body sees (100, 1000, ... instead of 1, 10, ...), so the initial value, the body, and the iteration count would all have to be adjusted; as written, it does not preserve the original behaviour.

Solution 10 - C++

There are many different faces of Undefined Behavior, and what's acceptable depends on the usage.

> tight inner-loop that consumes a large chunk of the total CPU time in a real-time graphics application

That, by itself, is a bit of an unusual thing, but be that as it may... if this is indeed the case, then the UB is most probably within the realm "allowable, acceptable". Graphics programming is notorious for hacks and ugly stuff. As long as it "works" and it doesn't take longer than 16.6ms to produce a frame, usually, nobody cares. But still, be aware of what it means to invoke UB.

First, there is the standard. From that point of view, there's nothing to discuss and no way to justify it: your code is simply invalid. There are no ifs or whens; it just isn't valid code. You might as well say that's a middle-finger-up from your point of view, and 95-99% of the time you'll be good to go anyway.

Next, there's the hardware side. There are some uncommon, weird architectures where this is a problem. I'm saying "uncommon, weird" because on the one architecture that makes up 80% of all computers (or the two architectures that together make up 95% of all computers), overflow is a "yeah, whatever, don't care" thing at the hardware level. You do get a garbage (although still predictable) result, but no evil things happen.
That is not the case on every architecture: you might very well get a trap on overflow (though seeing how you speak of a graphics application, the chances of being on such an odd architecture are rather small). Is portability an issue? If it is, you may want to abstain.

Last, there is the compiler/optimizer side. One reason overflow is undefined is that, once upon a time, leaving it to whatever the hardware did was simply the easiest thing to cope with. But another reason is that, for example, x+1 is then guaranteed to always be larger than x, and the compiler/optimizer can exploit this knowledge. Now, for the previously mentioned case, compilers are indeed known to act this way and simply strip out complete blocks (there existed a Linux exploit some years ago which was based on the compiler having dead-stripped some validation code for exactly this reason).
For your case, I would seriously doubt that the compiler does some special, odd optimization. However, what do you know, what do I know. When in doubt, try it out. If it works, you are good to go.
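("Try it out" can be made more systematic: both gcc and clang ship an undefined-behaviour sanitizer; compiling with -fsanitize=signed-integer-overflow, for example, reports signed overflows at run time.)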

(And finally, there's of course code audit, you might have to waste your time discussing this with an auditor if you're unlucky.)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Question: del (View Question on Stackoverflow)
Solution 1 - C++: 463035818_is_not_a_number (View Answer on Stackoverflow)
Solution 2 - C++: Bathsheba (View Answer on Stackoverflow)
Solution 3 - C++: MSalters (View Answer on Stackoverflow)
Solution 4 - C++: elbrunovsky (View Answer on Stackoverflow)
Solution 5 - C++: Peter Cordes (View Answer on Stackoverflow)
Solution 6 - C++: Evg (View Answer on Stackoverflow)
Solution 7 - C++: Oktalist (View Answer on Stackoverflow)
Solution 8 - C++: supercat (View Answer on Stackoverflow)
Solution 9 - C++: Sieunguoimay (View Answer on Stackoverflow)
Solution 10 - C++: Damon (View Answer on Stackoverflow)