Double cast to unsigned int on Win32 is truncating to 2,147,483,648

Tags: C, Visual C++, Casting, x86, Floating Point

C Problem Overview


Compiling the following code:

#include <limits.h>
#include <stdio.h>

double getDouble()
{
    double value = 2147483649.0;
    return value;
}

int main()
{
    printf("INT_MAX: %d\n", INT_MAX);
    printf("UINT_MAX: %u\n", UINT_MAX);

    printf("Double value: %f\n", getDouble());
    printf("Direct cast value: %u\n", (unsigned int) getDouble());
    double d = getDouble();
    printf("Indirect cast value: %u\n", (unsigned int) d);

    return 0;
}

Outputs (MSVC x86):

INT_MAX: 2147483647
UINT_MAX: 4294967295
Double value: 2147483649.000000
Direct cast value: 2147483648
Indirect cast value: 2147483649

Outputs (MSVC x64):

INT_MAX: 2147483647
UINT_MAX: 4294967295
Double value: 2147483649.000000
Direct cast value: 2147483649
Indirect cast value: 2147483649

The Microsoft documentation does not mention the signed integer maximum in connection with conversions from double to unsigned int.

When the double is the return value of a function, every value above INT_MAX is converted to 2147483648.

I'm using Visual Studio 2019 to build the program. This doesn't happen on gcc.

Am I doing something wrong? Is there a safe way to convert double to unsigned int?

C Solutions


Solution 1 - C

A compiler bug...

From the assembly provided by @anastaciu, the direct cast code calls __ftol2_sse, which seems to convert the number to a signed long. The routine is named __ftol2_sse because this is an SSE-enabled machine, but the value is in an x87 floating-point register.

; Line 17
	call	_getDouble
	call	__ftol2_sse
	push	eax
	push	OFFSET ??_C@_0BH@GDLBDFEH@Direct?5cast?5value?3?5?$CFu?6@
	call	_printf
	add	esp, 8

The indirect cast on the other hand does

; Line 18
	call	_getDouble
	fstp	QWORD PTR _d$[ebp]
; Line 19
	movsd	xmm0, QWORD PTR _d$[ebp]
	call	__dtoui3
	push	eax
	push	OFFSET ??_C@_0BJ@HCKMOBHF@Indirect?5cast?5value?3?5?$CFu?6@
	call	_printf
	add	esp, 8

which pops the double value and stores it to the local variable, then loads it into an SSE register and calls __dtoui3, which is a double-to-unsigned-int conversion routine.

The behaviour of the direct cast does not conform to C89; nor does it conform to any later revision - even C89 explicitly says that:

> The remaindering operation done when a value of integral type is converted to unsigned type need not be done when a value of floating type is converted to unsigned type. Thus the range of portable values is [0, Utype_MAX + 1).


I believe the problem might be a continuation of an issue from 2005: there used to be a conversion function called __ftol2 which probably would have worked for this code, i.e. it would have converted the value to the signed number -2147483647, which would have produced the correct result when interpreted as an unsigned number.

Unfortunately __ftol2_sse is not a drop-in replacement for __ftol2: instead of just taking the least-significant value bits as-is, it signals the out-of-range error by returning LONG_MIN / 0x80000000, which, interpreted as unsigned long here, is not at all what was expected. The behaviour of __ftol2_sse would be valid for signed long, as conversion of a double value > LONG_MAX to signed long has undefined behaviour.

Solution 2 - C

Following @AnttiHaapala's answer, I tested the code using optimization /Ox and found that this will remove the bug as __ftol2_sse is no longer used:

//; 17   :     printf("Direct cast value: %u\n", (unsigned int)getDouble());

	push	-2147483647				//; 80000001H
	push	OFFSET $SG10116
	call	_printf

//; 18   :     double d = getDouble();
//; 19   :     printf("Indirect cast value: %u\n", (unsigned int)d);

	push	-2147483647				//; 80000001H
	push	OFFSET $SG10117
	call	_printf
	add	esp, 28					//; 0000001cH

The optimizations inlined getDouble() and evaluated the constant expression at compile time, removing the need for a runtime conversion and making the bug go away.

Just out of curiosity, I ran some more tests, changing the code to force the float-to-int conversion at runtime. In this case the result is still correct, because with optimization the compiler uses __dtoui3 in both conversions:

//; 19   :     printf("Direct cast value: %u\n", (unsigned int)getDouble(d));

	movsd	xmm0, QWORD PTR _d$[esp+24]
	add	esp, 12					//; 0000000cH
	call	__dtoui3
	push	eax
	push	OFFSET $SG9261
	call	_printf

//; 20   :     double db = getDouble(d);
//; 21   :     printf("Indirect cast value: %u\n", (unsigned int)db);

	movsd	xmm0, QWORD PTR _d$[esp+20]
	add	esp, 8
	call	__dtoui3
	push	eax
	push	OFFSET $SG9262
	call	_printf

However, preventing inlining with __declspec(noinline) double getDouble(){...} brings the bug back:

//; 17   :     printf("Direct cast value: %u\n", (unsigned int)getDouble(d));

	movsd	xmm0, QWORD PTR _d$[esp+76]
	add	esp, 4
	movsd	QWORD PTR [esp], xmm0
	call	_getDouble
	call	__ftol2_sse
	push	eax
	push	OFFSET $SG9261
	call	_printf

//; 18   :     double db = getDouble(d);

	movsd	xmm0, QWORD PTR _d$[esp+80]
	add	esp, 8
	movsd	QWORD PTR [esp], xmm0
	call	_getDouble

//; 19   :     printf("Indirect cast value: %u\n", (unsigned int)db);

	call	__ftol2_sse
	push	eax
	push	OFFSET $SG9262
	call	_printf

__ftol2_sse is called in both conversions, making the output 2147483648 in both cases. @zwol's suspicions were correct.


Compilation details:

  • Using command line:
cl /permissive- /GS /analyze- /W3 /Gm- /Ox /sdl /D "WIN32" program.c        
  • In Visual Studio:

    • Disabling RTC in Project -> Properties -> Code Generation and setting Basic Runtime Checks to default.

    • Enabling optimization in Project -> Properties -> Optimization and setting Optimization to /Ox.

    • With debugger in x86 mode.

Solution 3 - C

Nobody has looked at the asm for MS's __ftol2_sse.

From the result, we can infer that it probably converted from x87 to signed int / long (both 32-bit types on Windows), instead of safely to uint32_t.

x86 FP -> integer instructions that overflow the integer result don't just wrap / truncate: when the exact value is not representable in the destination, they produce what Intel calls the "integer indefinite" value, which has the high bit set and all other bits clear, i.e. 0x80000000.

(Or if the FP invalid exception isn't masked, it fires and no value is stored. But in the default FP environment, all FP exceptions are masked. That's why for FP calculations you can get a NaN instead of a fault.)

That includes both x87 instructions like fistp (using the current rounding mode) and SSE2 instructions like cvttsd2si eax, xmm0 (using truncation toward 0, that's what the extra t means).

So it's a bug to compile double->unsigned conversion into a call to __ftol2_sse.


Side-note / tangent:

On x86-64, FP -> uint32_t can be compiled to cvttsd2si rax, xmm0, converting to a 64-bit signed destination, producing the uint32_t you want in the low half (EAX) of the integer destination.

It's C and C++ UB if the result is outside the 0..2^32-1 range, so it's ok that huge positive or negative values will leave the low half of RAX (EAX) zero from the integer indefinite bit-pattern. (Unlike integer->integer conversions, modulo reduction of the value is not guaranteed: https://stackoverflow.com/questions/10541200/is-the-behaviour-of-casting-a-negative-double-to-unsigned-int-defined-in-the-c-s)

To be clear, nothing in the question is undefined or even implementation-defined behaviour. I'm just pointing out that if you have FP->int64_t, you can use it to efficiently implement FP->uint32_t. That includes x87 fistp, which can write a 64-bit integer destination even in 32-bit and 16-bit mode, unlike SSE2 instructions, which can only directly handle 64-bit integers in 64-bit mode.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Matheus Rossi Saciotto | View Question on Stackoverflow
Solution 1 - C | Antti Haapala -- Слава Україні | View Answer on Stackoverflow
Solution 2 - C | anastaciu | View Answer on Stackoverflow
Solution 3 - C | Peter Cordes | View Answer on Stackoverflow