How did I get a value larger than 8 bits in size from an 8-bit integer?
C++ Problem Overview
I tracked down an extremely nasty bug hiding behind this little gem. I am aware that per the C++ spec, signed overflows are undefined behavior, but only when the overflow occurs when the value is extended to bit-width sizeof(int). As I understand it, incrementing a char shouldn't ever be undefined behavior as long as sizeof(char) < sizeof(int). But that doesn't explain how c is getting an impossible value. As an 8-bit integer, how can c hold values greater than its bit-width?
##Code
// Compiled with gcc-4.7.2
#include <cstdio>
#include <stdint.h>
#include <climits>
int main()
{
    int8_t c = 0;
    printf("SCHAR_MIN: %i\n", SCHAR_MIN);
    printf("SCHAR_MAX: %i\n", SCHAR_MAX);
    for (int32_t i = 0; i <= 300; i++)
        printf("c: %i\n", c--);
    printf("c: %i\n", c);
    return 0;
}
##Output
SCHAR_MIN: -128
SCHAR_MAX: 127
c: 0
c: -1
c: -2
c: -3
...
c: -127
c: -128 // <= The next value should still be an 8-bit value.
c: -129 // <= What? That's more than 8 bits!
c: -130 // <= Uh...
c: -131
...
c: -297
c: -298 // <= Getting ridiculous now.
c: -299
c: -300
c: -45 // <= ..........
C++ Solutions
Solution 1 - C++
Although getting impossible results for undefined behaviour is a valid consequence, there is actually no undefined behaviour in your code. What's happening is that the compiler thinks the behaviour is undefined, and optimises accordingly.
If c is defined as int8_t, and int8_t promotes to int, then c-- is supposed to perform the subtraction c - 1 in int arithmetic and convert the result back to int8_t. The subtraction in int does not overflow, and converting out-of-range integral values to another integral type is valid. If the destination type is signed, the result is implementation-defined, but it must be a valid value for the destination type. (And if the destination type is unsigned, the result is well-defined, but that does not apply here.)
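To make the three steps concrete, here is a minimal sketch of what c-- has to do to the stored value. The helper names decrement_by_the_book and check_decrement are made up for illustration, and the sketch assumes the out-of-range conversion wraps modulo 2^8 (which GCC documents, and which C++20 requires):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the original post): spells out, step by step,
// what the expression c-- is required to do to the stored value of c.
inline std::int8_t decrement_by_the_book(std::int8_t c)
{
    int promoted = c;        // 1. integral promotion: int8_t -> int
    int wide = promoted - 1; // 2. subtraction in int; cannot overflow for inputs in [-128, 127]
    // 3. Conversion back to int8_t. Implementation-defined before C++20 when
    //    wide is out of range; assumed here to wrap modulo 2^8.
    return static_cast<std::int8_t>(wide);
}

// Self-check: for every possible starting value, the stored result is either
// the exact int result or that result wrapped around by 256.
inline void check_decrement()
{
    for (int v = INT8_MIN; v <= INT8_MAX; ++v) {
        int result = decrement_by_the_book(static_cast<std::int8_t>(v));
        assert(result == v - 1 || result == v - 1 + 256);
    }
}
```

Whatever the implementation picks for step 3, the value that ends up in c has to be representable in int8_t, which is exactly what the output in the question violates.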
Solution 2 - C++
A compiler can have bugs which are other than nonconformances to the standard, because there are other requirements. A compiler should be compatible with other versions of itself. It may also be expected to be compatible in some ways with other compilers, and also to conform to some beliefs about behavior that are held by the majority of its user base.
In this case, it appears to be a conformance bug. The expression c-- should manipulate c in a way similar to c = c - 1. Here, the value of c on the right is promoted to type int, and then the subtraction takes place. Since c is in the range of int8_t, this subtraction will not overflow, but it may produce a value which is out of the range of int8_t. When this value is assigned, a conversion takes place back to the type int8_t so the result fits back into c. In the out-of-range case, the conversion has an implementation-defined value. But a value out of the range of int8_t is not a valid implementation-defined value. An implementation cannot "define" that an 8-bit type suddenly holds 9 or more bits. For the value to be implementation-defined means that something in the range of int8_t is produced, and the program continues. The C standard thereby allows for behaviors such as saturation arithmetic (common on DSPs) or wrap-around (mainstream architectures).
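As a rough illustration of those two permitted outcomes, the sketch below writes both conversion policies by hand. The function names wrap_to_int8, saturate_to_int8 and check_policies are hypothetical, and this is not how any particular compiler implements the conversion:

```cpp
#include <cassert>
#include <cstdint>

// Wrap-around: keep the low 8 bits of the int and reinterpret them as a
// signed two's complement byte.
inline std::int8_t wrap_to_int8(int wide)
{
    int low = static_cast<int>(static_cast<unsigned>(wide) & 0xFFu); // 0..255
    return static_cast<std::int8_t>(low >= 128 ? low - 256 : low);
}

// Saturation: clamp to the nearest representable value instead of wrapping.
inline std::int8_t saturate_to_int8(int wide)
{
    if (wide < INT8_MIN) return INT8_MIN;
    if (wide > INT8_MAX) return INT8_MAX;
    return static_cast<std::int8_t>(wide);
}

// Both policies agree on in-range inputs and never leave [-128, 127].
inline void check_policies()
{
    for (int v = -1000; v <= 1000; ++v) {
        int w = wrap_to_int8(v);
        int s = saturate_to_int8(v);
        assert(w >= INT8_MIN && w <= INT8_MAX);
        assert(s >= INT8_MIN && s <= INT8_MAX);
        if (v >= INT8_MIN && v <= INT8_MAX) {
            assert(w == v && s == v);
        }
    }
}
```

For the int value -129, wrapping gives 127 and saturating gives -128. Either would be a legitimate implementation-defined result; -129 itself is not.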
The compiler is using a wider underlying machine type when manipulating values of small integer types like int8_t or char. When arithmetic is performed, results which are out of range of the small integer type can be captured reliably in this wider type. To preserve the externally visible behavior that the variable is an 8-bit type, the wider result has to be truncated into the 8-bit range. Explicit code is required to do that since the machine storage locations (registers) are wider than 8 bits and happy with the larger values. Here, the compiler neglected to normalize the value and simply passed it to printf as is. The conversion specifier %i in printf has no idea that the argument originally came from int8_t calculations; it is just working with an int argument.
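The missing normalization can be imitated in ordinary code. In this sketch a plain int stands in for the wide register; DecrementTrace and trace_decrements are invented names, and the cast is assumed to wrap modulo 2^8 as mainstream compilers do:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of one simulated run.
struct DecrementTrace {
    std::vector<int> raw;        // value left in the wide "register"
    std::vector<int> normalized; // same value after truncation to int8_t
};

// Simulates the loop from the question with an int standing in for the
// register that holds c. raw[i] is what the faulty code handed to printf,
// normalized[i] is what a correct compiler would have handed over.
inline DecrementTrace trace_decrements(int iterations)
{
    assert(iterations >= 0);
    DecrementTrace t;
    int wide = 0;
    for (int i = 0; i < iterations; ++i) {
        t.raw.push_back(wide);
        t.normalized.push_back(static_cast<std::int8_t>(wide)); // the step that was skipped
        --wide;
    }
    return t;
}
```

With trace_decrements(301), entry 129 of raw is -129 while the same entry of normalized is 127, which mirrors the difference between the output in the question and the expected output.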
Solution 3 - C++
I can't fit this in a comment, so I'm posting it as an answer.
For some very odd reason, the -- operator happens to be the culprit.
I tested the code posted on Ideone and replaced c-- with c = c - 1 and the values remained within the range [-128 ... 127]:
c: -123
c: -124
c: -125
c: -126
c: -127
c: -128 // about to overflow
c: 127 // woop
c: 126
c: 125
c: 124
c: 123
c: 122
Freaky, eh? I don't know much about what the compiler does to expressions like i++ or i--. It's likely promoting the return value to an int and passing it. That's the only logical conclusion I can come up with because you ARE in fact getting values that cannot fit into 8 bits.
Solution 4 - C++
I guess that the underlying hardware is still using a 32-bit register to hold that int8_t. Since the specification does not impose a behaviour for overflow, the implementation does not check for overflow and allows larger values to be stored as well.
If you mark the local variable as volatile, you force the compiler to keep it in memory, and you consequently obtain the expected values within the range.
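A sketch of that workaround, under the assumption that the out-of-range conversion wraps modulo 2^8. The function name decrement_volatile is made up, and the loop uses an explicit assignment instead of c-- to sidestep the C++20 deprecation of ++ and -- on volatile objects:

```cpp
#include <cassert>
#include <cstdint>

// Runs the decrement loop on a volatile int8_t and returns the final value.
// Every read and write of c has to go through the 8-bit object itself, so a
// wider register copy cannot silently replace it.
inline int decrement_volatile(int iterations)
{
    assert(iterations >= 0);
    volatile std::int8_t c = 0;
    for (int i = 0; i < iterations; ++i) {
        std::int8_t current = c;                   // one volatile read
        c = static_cast<std::int8_t>(current - 1); // one volatile write, truncated to 8 bits
    }
    return c;
}
```

This only shows that the volatile version stays in range on a conforming compiler; whether it would have avoided the problem on the gcc-4.7.2 build from the question is the answerer's guess, not something this sketch can confirm.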
Solution 5 - C++
The assembler code reveals the problem:
:loop
mov esi, ebx
xor eax, eax
mov edi, OFFSET FLAT:.LC2 ;"c: %i\n"
sub ebx, 1
call printf
cmp ebx, -301
jne loop
mov esi, -45
mov edi, OFFSET FLAT:.LC2 ;"c: %i\n"
xor eax, eax
call printf
EBX should be anded with FF post decrement, or only BL should be used with the remainder of EBX clear. Curious that it uses sub instead of dec. The -45 is flat-out mysterious. It's the bitwise inversion of 300 & 255 = 44. -45 = ~44. There's a connection somewhere.
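One possible reading of that connection: after 301 decrements the wide value is -301, which in two's complement equals ~300, and keeping only the low byte commutes with bitwise inversion. The sketch below checks that arithmetic; low_byte_signed and check_connection are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Compile-time facts about the numbers involved (two's complement assumed).
static_assert(-301 == ~300, "-x - 1 equals ~x, so 301 decrements from 0 give ~300");
static_assert((300 & 255) == 44, "low byte of 300");
static_assert(~44 == -45, "inverting that low byte gives the final printed value");

// Keeps only the low 8 bits of an int and reads them as a signed byte.
inline int low_byte_signed(int wide)
{
    int low = static_cast<int>(static_cast<unsigned>(wide) & 0xFFu); // 0..255
    return low >= 128 ? low - 256 : low;
}

// Truncating ~300 to 8 bits gives the same byte as inverting (300 & 255).
inline void check_connection()
{
    assert(low_byte_signed(-301) == -45);
    assert(low_byte_signed(~300) == low_byte_signed(~(300 & 255)));
}
```

So the constant -45 is simply -301 reduced to an 8-bit signed value, which suggests the compiler folded the final value of c correctly even though the values inside the loop were not truncated.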
It goes through a lot more work using c = c - 1:
mov eax, ebx
mov edi, OFFSET FLAT:.LC2 ;"c: %i\n"
add ebx, 1
not eax
movsx ebp, al ;uses only the lower 8 bits
xor eax, eax
mov esi, ebp
It then uses only the low portion of RAX, so it's restricted to -128 thru 127. Compiler options "-g -O2".
With no optimization, it produces correct code:
movzx eax, BYTE PTR [rbp-1]
sub eax, 1
mov BYTE PTR [rbp-1], al
movsx edx, BYTE PTR [rbp-1]
mov eax, OFFSET FLAT:.LC2 ;"c: %i\n"
mov esi, edx
So it's a bug in the optimizer.
Solution 6 - C++
Use %hhd instead of %i! That should solve your problem.
What you see there is the result of compiler optimizations combined with you telling printf to print a 32-bit number and then pushing a (supposedly 8-bit) number onto the stack, which is really pointer-sized, because this is how the push opcode in x86 works.
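To see what the two conversion specifiers do with the same int argument, here is a small sketch using std::snprintf. The helper names format_with_hhd and format_with_i are made up, and the expected strings assume a C library that converts the argument to signed char by wrapping, as glibc does:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats an int with %hhd: the C library converts the argument to
// signed char before printing it.
inline std::string format_with_hhd(int value)
{
    char buffer[16];
    int written = std::snprintf(buffer, sizeof buffer, "%hhd", value);
    assert(written > 0 && written < static_cast<int>(sizeof buffer));
    return std::string(buffer);
}

// Formats the same int with %i: the full int value is printed.
inline std::string format_with_i(int value)
{
    char buffer[16];
    int written = std::snprintf(buffer, sizeof buffer, "%i", value);
    assert(written > 0 && written < static_cast<int>(sizeof buffer));
    return std::string(buffer);
}
```

Note that %hhd normalizes the value at print time only, so it would hide an out-of-range argument without changing what the optimizer keeps in the register.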
Solution 7 - C++
I think this is caused by optimization of the code:
for (int32_t i = 0; i <= 300; i++)
printf("c: %i\n", c--);
The compiler uses the int32_t i variable for both i and c. Turn off optimization or add an explicit cast: printf("c: %i\n", (int8_t)c--);
Solution 8 - C++
c is itself defined as int8_t, but when ++ or -- is applied to an int8_t, it is first implicitly converted to int, and it is the result of that operation, rather than the internal value of c, that gets printed by printf, and that result happens to be an int.
See the actual value of c after the entire loop, especially after the last decrement:
-301 + 256 = -45 (since it wrapped around the entire 8-bit range once)
It's the correct value, and it matches the wrap-around behaviour -128 - 1 = 127.
c starts to use int-sized storage, but it is printed as an int8_t, using only 8 bits, when printed as itself, and it uses all 32 bits when used as an int.
[Compiler Bug]
Solution 9 - C++
I think it happened because your loop runs until the int i reaches 300, by which point c has become -300. And the last value is printed by the final statement:
printf("c: %i\n", c);