How did I get a value larger than 8 bits in size from an 8-bit integer?

C++, GCC, Undefined Behavior

C++ Problem Overview


I tracked down an extremely nasty bug hiding behind this little gem. I am aware that, per the C++ spec, signed overflow is undefined behavior, but as I understand it that only applies when the overflow occurs after the value has been promoted to the width of int. Incrementing or decrementing a char should never be undefined behavior as long as sizeof(char) < sizeof(int). But that doesn't explain how c is getting an impossible value. As an 8-bit integer, how can c hold values outside its 8-bit range?

##Code

// Compiled with gcc-4.7.2
#include <cstdio>
#include <stdint.h>
#include <climits>
 
int main()
{
   int8_t c = 0;
   printf("SCHAR_MIN: %i\n", SCHAR_MIN);
   printf("SCHAR_MAX: %i\n", SCHAR_MAX);

   for (int32_t i = 0; i <= 300; i++)
      printf("c: %i\n", c--);
   
   printf("c: %i\n", c);
  
   return 0;
}

##Output

SCHAR_MIN: -128
SCHAR_MAX: 127
c: 0
c: -1
c: -2
c: -3
...
c: -127
c: -128  // <= The next value should still be an 8-bit value.
c: -129  // <= What? That's more than 8 bits!
c: -130  // <= Uh...
c: -131
...
c: -297
c: -298  // <= Getting ridiculous now.
c: -299
c: -300
c: -45   // <= ..........

##Check it out on ideone.

C++ Solutions


Solution 1 - C++

This is a compiler bug.

Although getting impossible results for undefined behaviour is a valid consequence, there is actually no undefined behaviour in your code. What's happening is that the compiler thinks the behaviour is undefined, and optimises accordingly.

If c is defined as int8_t, and int8_t promotes to int, then c-- is supposed to perform the subtraction c - 1 in int arithmetic and convert the result back to int8_t. The subtraction in int does not overflow, and converting out-of-range integral values to another integral type is valid. If the destination type is signed, the result is implementation-defined, but it must be a valid value for the destination type. (And if the destination type is unsigned, the result is well-defined, but that does not apply here.)

Solution 2 - C++

A compiler can have bugs other than nonconformance to the standard, because it has other requirements to meet. A compiler should be compatible with other versions of itself. It may also be expected to be compatible in some ways with other compilers, and to conform to some beliefs about behavior that are held by the majority of its user base.

In this case, it appears to be a conformance bug. The expression c-- should manipulate c in a way similar to c = c - 1. Here, the value of c on the right is promoted to type int, and then the subtraction takes place. Since c is in the range of int8_t, this subtraction will not overflow, but it may produce a value which is out of the range of int8_t. When this value is assigned, a conversion back to the type int8_t takes place so that the result fits into c. In the out-of-range case, the conversion has an implementation-defined value. But a value out of the range of int8_t is not a valid implementation-defined value. An implementation cannot "define" that an 8-bit type suddenly holds 9 or more bits. For the value to be implementation-defined means that something in the range of int8_t is produced, and the program continues. The C standard thereby allows for behaviors such as saturation arithmetic (common on DSPs) or wrap-around (mainstream architectures).

The compiler is using a wider underlying machine type when manipulating values of small integer types like int8_t or char. When arithmetic is performed, results which are out of range of the small integer type can be captured reliably in this wider type. To preserve the externally visible behavior that the variable is an 8-bit type, the wider result has to be truncated into the 8-bit range. Explicit code is required to do that, since the machine storage locations (registers) are wider than 8 bits and happy with the larger values. Here, the compiler neglected to normalize the value and simply passed it to printf as is. The conversion specifier %i in printf has no idea that the argument originally came from int8_t calculations; it is just working with an int argument.

Solution 3 - C++

I can't fit this in a comment, so I'm posting it as an answer.

For some very odd reason, the -- operator happens to be the culprit.

I tested the code posted on Ideone and replaced c-- with c = c - 1 and the values remained within the range [-128 ... 127]:

c: -123
c: -124
c: -125
c: -126
c: -127
c: -128 // about to overflow
c: 127  // woop
c: 126
c: 125
c: 124
c: 123
c: 122

Freaky, eh? I don't know much about what the compiler does to expressions like i++ or i--. It's likely promoting the return value to an int and passing it on. That's the only logical conclusion I can come up with, because you ARE in fact getting values that cannot fit into 8 bits.

Solution 4 - C++

I guess that the underlying hardware is still using a 32-bit register to hold that int8_t. Since the specification does not impose a behaviour for overflow, the implementation does not check for overflow and allows larger values to be stored as well.


If you mark the local variable as volatile, you force the compiler to keep it in memory, and consequently you obtain the expected values within the range.

Solution 5 - C++

The assembler code reveals the problem:

:loop
mov	esi, ebx
xor	eax, eax
mov	edi, OFFSET FLAT:.LC2   ;"c: %i\n"
sub	ebx, 1
call	printf
cmp	ebx, -301
jne	loop

mov	esi, -45
mov	edi, OFFSET FLAT:.LC2   ;"c: %i\n"
xor	eax, eax
call	printf

EBX should be ANDed with 0xFF after the decrement, or only BL should be used with the remainder of EBX cleared. Curious that it uses sub instead of dec. The -45 is flat-out mysterious. It's the bitwise inversion of 300 & 255 = 44: -45 = ~44. There's a connection somewhere.

It goes through a lot more work using c = c - 1:

mov	eax, ebx
mov	edi, OFFSET FLAT:.LC2   ;"c: %i\n"
add	ebx, 1
not	eax
movsx	ebp, al                 ;uses only the lower 8 bits
xor	eax, eax
mov	esi, ebp

It then uses only the low portion of RAX, so it's restricted to -128 through 127. Compiler options: "-g -O2".

With no optimization, it produces correct code:

movzx	eax, BYTE PTR [rbp-1]
sub	eax, 1
mov	BYTE PTR [rbp-1], al
movsx	edx, BYTE PTR [rbp-1]
mov	eax, OFFSET FLAT:.LC2   ;"c: %i\n"
mov	esi, edx

So it's a bug in the optimizer.

Solution 6 - C++

Use %hhd instead of %i! Should solve your problem.

What you see there is the result of compiler optimizations combined with you telling printf to print a 32-bit number and then pushing a (supposedly 8-bit) number onto the stack, which is really pointer-sized, because this is how the push opcode in x86 works.

Solution 7 - C++

I think this is caused by optimization of this code:

for (int32_t i = 0; i <= 300; i++)
      printf("c: %i\n", c--);

The compiler uses the int32_t variable i for both i and c. Turn off optimization, or add an explicit cast: printf("c: %i\n", (int8_t)c--);

Solution 8 - C++

c itself is defined as int8_t, but when ++ or -- is applied to an int8_t, it is first implicitly converted to int, and printf, which expects an int, prints the result of that operation rather than the stored value of c.

Look at the actual value of c after the whole loop, especially after the last decrement:

-301 + 256 = -45 (it has gone around the entire 8-bit range once)

That is the correct value, consistent with the wrap-around behaviour -128 - 1 = 127.

c starts to use int-sized storage: all 32 bits are used when the value is passed as an int, but only 8 bits are used when c is printed as itself.

[Compiler Bug]

Solution 9 - C++

I think it happened because your loop runs until the int i reaches 300, by which point c has reached -300. And the last value comes from

printf("c: %i\n", c);

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | Unsigned | View Question on Stackoverflow |
| Solution 1 - C++ | user743382 | View Answer on Stackoverflow |
| Solution 2 - C++ | Kaz | View Answer on Stackoverflow |
| Solution 3 - C++ | user123 | View Answer on Stackoverflow |
| Solution 4 - C++ | Zoltán | View Answer on Stackoverflow |
| Solution 5 - C++ | user2513931 | View Answer on Stackoverflow |
| Solution 6 - C++ | Zotta | View Answer on Stackoverflow |
| Solution 7 - C++ | Vsevolod | View Answer on Stackoverflow |
| Solution 8 - C++ | Izhar Aazmi | View Answer on Stackoverflow |
| Solution 9 - C++ | r.mirzojonov | View Answer on Stackoverflow |