Why is (a % 256) different than (a & 0xFF)?

C++Optimization

C++ Problem Overview


I always assumed that when doing (a % 256) the optimizer would naturally use an efficient bitwise operation, as if I wrote (a & 0xFF).

When testing on compiler explorer gcc-6.2 (-O3):

// Type your code here, or load an example.
int mod(int num) {
    return num % 256;
}

mod(int):
    mov     edx, edi
    sar     edx, 31
    shr     edx, 24
    lea     eax, [rdi+rdx]
    movzx   eax, al
    sub     eax, edx
    ret

And when trying the other code:

// Type your code here, or load an example.
int mod(int num) {
    return num & 0xFF;
}

mod(int):
    movzx   eax, dil
    ret

Seems like I'm completely missing something out. Any ideas?

C++ Solutions


Solution 1 - C++

It's not the same. Try num = -79, and you will get different results from both operations. (-79) % 256 = -79, while (-79) & 0xff is some positive number.

Using unsigned int, the operations are the same, and the code will likely be the same.

PS- Someone commented

> They shouldn't be the same, a % b is defined as a - b * floor (a / b).

That's not how it is defined in C, C++, Objective-C (ie all the languages where the code in the question would compile).

Solution 2 - C++

Short answer

-1 % 256 yields -1 and not 255 which is -1 & 0xFF. Therefore, the optimization would be incorrect.

Long answer

C++ has the convention that (a/b)*b + a%b == a, which seems quite natural. a/b always returns the arithmetic result without the fractional part (truncating towards 0). As a consequence, a%b has the same sign as a or is 0.

The division -1/256 yields 0 and hence -1%256 must be -1 in order to satisfy the above condition ((-1%256)*256 + -1%256 == -1). This is obviously different from -1&0xFF which is 0xFF. Therefore, the compiler cannot optimize the way you want.

The relevant section in the C++ standard [expr.mul §4] as of N4606 states:

> For integral operands the / operator yields the algebraic quotient with any fractional part discarded; if the quotient a/b is representable in the type of the result, (a/b)*b + a%b is equal to a [...].

Enabling the optimization

However, using unsigned types, the optimization would be completely correct, satisfying the above convention:

unsigned(-1)%256 == 0xFF

See also this.

Other languages

This is handled very different across different programming languages as you can look up on Wikipedia.

Solution 3 - C++

Since C++11, num % 256 has to be non-positive if num is negative.

So the bit pattern would depend on the implementation of signed types on your system: for a negative first argument, the result is not the extraction of the least significant 8 bits.

It would be a different matter if num in your case was unsigned: these days I'd almost expect a compiler to make the optimisation that you cite.

Solution 4 - C++

I don't have telepathic insight into the compiler's reasoning, but in the case of % there is the necessity of dealing with negative values (and division rounds towards zero), while with & the result is always the lower 8 bits.

The sar instruction sounds to me like "shift arithmetic right", filling up the vacated bits with the sign bit value.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionElad WeissView Question on Stackoverflow
Solution 1 - C++gnasher729View Answer on Stackoverflow
Solution 2 - C++Ralph TandetzkyView Answer on Stackoverflow
Solution 3 - C++BathshebaView Answer on Stackoverflow
Solution 4 - C++Cheers and hth. - AlfView Answer on Stackoverflow