How to avoid overflow in expr. A * B - C * D

C++CInteger Overflow

C++ Problem Overview


I need to compute an expression which looks like: A*B - C*D, where their types are: signed long long int A, B, C, D; Each number can be really big (not overflowing its type). While A*B could cause overflow, at same time expression A*B - C*D can be really small. How can I compute it correctly?

For example: MAX * MAX - (MAX - 1) * (MAX + 1) == 1, where MAX = LLONG_MAX - n and n - some natural number.

C++ Solutions


Solution 1 - C++

This seems too trivial I guess. But A*B is the one that could overflow.

You could do the following, without losing precision

A*B - C*D = A(D+E) - (A+F)D
          = AD + AE - AD - DF
          = AE - DF
             ^smaller quantities E & F

E = B - D (hence, far smaller than B)
F = C - A (hence, far smaller than C)

This decomposition can be done further.
As @Gian pointed out, care might need to be taken during the subtraction operation if the type is unsigned long long.


For example, with the case you have in the question, it takes just one iteration,

 MAX * MAX - (MAX - 1) * (MAX + 1)
  A     B       C           D

E = B - D = -1
F = C - A = -1

AE - DF = {MAX * -1} - {(MAX + 1) * -1} = -MAX + MAX + 1 = 1

                                          

Solution 2 - C++

The simplest and most general solution is to use a representation that can't overflow, either by using a long integer library (e.g. http://gmplib.org/) or representing using a struct or array and implementing a kind of long multiplication (i.e. separating each number to two 32bit halves and performing the multiplication as below:

(R1 + R2 * 2^32 + R3 * 2^64 + R4 * 2^96) = R = A*B = (A1 + A2 * 2^32) * (B1 + B2 * 2^32) 
R1 = (A1*B1) % 2^32
R2 = ((A1*B1) / 2^32 + (A1*B2) % 2^32 + (A2*B1) % 2^32) % 2^32
R3 = (((A1*B1) / 2^32 + (A1*B2) % 2^32 + (A2*B1) % 2^32) / 2^32 + (A1*B2) / 2^32 + (A2*B1) / 2^32 + (A2*B2) % 2^32) %2^32
R4 = ((((A1*B1) / 2^32 + (A1*B2) % 2^32 + (A2*B1) % 2^32) / 2^32 + (A1*B2) / 2^32 + (A2*B1) / 2^32 + (A2*B2) % 2^32) / 2^32) + (A2*B2) / 2^32

Assuming the end result fits in 64 bits you actually don't really need most bits of R3 and none of R4

Solution 3 - C++

Note that this is not standard since it relies on wrap-around signed-overflow. (GCC has compiler flags which enable this.)

But if you just do all the calculations in long long, the result of applying the formula directly:
(A * B - C * D) will be accurate as long as the correct result fits into a long long.


Here's a work-around that only relies on implementation-defined behavior of casting unsigned integer to signed integer. But this can be expected to work on almost every system today.

(long long)((unsigned long long)A * B - (unsigned long long)C * D)

This casts the inputs to unsigned long long where the overflow behavior is guaranteed to be wrap-around by the standard. Casting back to a signed integer at the end is the implementation-defined part, but will work on nearly all environments today.


If you need more pedantic solution, I think you have to use "long arithmetic"

Solution 4 - C++

This should work ( I think ):

signed long long int a = 0x7ffffffffffffffd;
signed long long int b = 0x7ffffffffffffffd;
signed long long int c = 0x7ffffffffffffffc;
signed long long int d = 0x7ffffffffffffffe;
signed long long int bd = b / d;
signed long long int bdmod = b % d;
signed long long int ca = c / a;
signed long long int camod = c % a;
signed long long int x = (bd - ca) * a * d - (camod * d - bdmod * a);

Here's my derivation:

x = a * b - c * d
x / (a * d) = (a * b - c * d) / (a * d)
x / (a * d) = b / d - c / a

now, the integer/mod stuff:
x / (a * d) = (b / d + ( b % d ) / d) - (c / a + ( c % a ) / a )
x / (a * d) = (b / d - c / a) - ( ( c % a ) / a - ( b % d ) / d)
x = (b / d - c / a) * a * d - ( ( c % a ) * d - ( b % d ) * a)

Solution 5 - C++

E = max(A,B,C,D)
A1 = A -E;
B1 = B -E;
C1 = C -E;
D1 = D -E;

then

A*B - C*D = (A1+E)*(B1+E)-(C1+E)(D1+E) = (A1+B1-C1-D1)*E + A1*B1 -C1*D1

Solution 6 - C++

You could consider computing a greatest common factor for all your values, and then dividing them by that factor before doing your arithmetic operations, then multiplying again. This assumes that such a factor exists, however (for example, if A, B, C and D happen to be relatively prime, they won't have a common factor).

Similarly, you could consider working on log-scales, but this is going to be a little scary, subject to numerical precision.

Solution 7 - C++

If the result fits in a long long int then the expression AB-CD is okay as it performs the arithmetic mod 2^64, and will give the correct result. The problem is to know if the result fits in a long long int. To detect this, you can use the following trick using doubles:

if( abs( (double)A*B - (double)C*D ) > MAX_LLONG ) 
    Overflow
else 
    return A*B-C*D;

The problem with this approach is that you are limited by the precision of the mantissa of the doubles (54bits?) so you need to limit the products AB and CD to 63+54 bits (or probably a little less).

Solution 8 - C++

You could write each number in an array, each element being a digit and do the calculations as polynomials. Take the resulting polynomial, which is an array, and compute the result by multiplying each element of the array with 10 to the power of the position in the array (the first position being the largest and the last being zero).

The number 123 can be expressed as:

123 = 100 * 1 + 10 * 2 + 3

for which you just create an array [1 2 3].

You do this for all numbers A, B, C and D, and then you multiply them as polynomials. Once you have the resulting polynomial, you just reconstruct the number from it.

Solution 9 - C++

While a signed long long int will not hold A*B, two of them will. So A*B could be decomposed to tree terms of different exponent, any of them fitting one signed long long int.

A1=A>>32;
A0=A & 0xffffffff;
B1=B>>32;
B0=B & 0xffffffff;

AB_0=A0*B0;
AB_1=A0*B1+A1*B0;
AB_2=A1*B1;

Same for C*D.

Folowing the straight way, the subraction could be done to every pair of AB_i and CD_i likewise, using an additional carry bit (accurately a 1-bit integer) for each. So if we say E=AB-CD you get something like:

E_00=AB_0-CD_0 
E_01=(AB_0 > CD_0) == (AB_0 - CD_0 < 0) ? 0 : 1  // carry bit if overflow
E_10=AB_1-CD_1 
...

We continue by transferring the upper-half of E_10 to E_20 (shift by 32 and add, then erase upper half of E_10).

Now you can get rid of the carry bit E_11 by adding it with the right sign (obtained from the non-carry part) to E_20. If this triggers an overflow, the result wouldn't fit either.

E_10 now has enough 'space' to take the upper half from E_00 (shift, add, erase) and the carry bit E_01.

E_10 may be larger now again, so we repeat the transfer to E_20.

At this point, E_20 must become zero, otherwise the result won't fit. The upper half of E_10 is empty as result of the transfer too.

The final step is to transfer the lower half of E_20 into E_10 again.

If the expectation that E=A*B+C*D would fit the signed long long int holds, we now have

E_20=0
E_10=0
E_00=E

Solution 10 - C++

If you know the final result is representable in your integer type, you can perform this calculation quickly using the code below. Because the C standard specifies that unsigned arithmetic is modulo arithmetic and does not overflow, you can use an unsigned type to perform the calculation.

The following code assumes there is an unsigned type of the same width and that the signed type uses all bit patterns to represent values (no trap representations, the minimum of the signed type is the negative of half the modulus of the unsigned type). If this does not hold in a C implementation, simple adjustments can be made to the ConvertToSigned routine for that.

The following uses signed char and unsigned char to demonstrate the code. For your implementation, change the definition of Signed to typedef signed long long int Signed; and the definition of Unsigned to typedef unsigned long long int Unsigned;.

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>


//	Define the signed and unsigned types we wish to use.
typedef	signed char   Signed;
typedef	unsigned char Unsigned;

//	uHalfModulus is half the modulus of the unsigned type.
static const Unsigned uHalfModulus = UCHAR_MAX/2+1;

//	sHalfModulus is the negation of half the modulus of the unsigned type.
static const Signed   sHalfModulus = -1 - (Signed) (UCHAR_MAX/2);


/*	Map the unsigned value to the signed value that is the same modulo the
	modulus of the unsigned type.  If the input x maps to a positive value, we
	simply return x.  If it maps to a negative value, we return x minus the
	modulus of the unsigned type.

	In most C implementations, this routine could simply be "return x;".
	However, this version uses several steps to convert x to a negative value
	so that overflow is avoided.
*/
static Signed ConvertToSigned(Unsigned x)
{
	/*	If x is representable in the signed type, return it.  (In some
		implementations, 
	*/
	if (x < uHalfModulus)
		return x;

	/*	Otherwise, return x minus the modulus of the unsigned type, taking
		care not to overflow the signed type.
	*/
	return (Signed) (x - uHalfModulus) - sHalfModulus;
}


/*	Calculate A*B - C*D given that the result is representable as a Signed
	value.
*/
static signed char Calculate(Signed A, Signed B, Signed C, Signed D)
{
	/*	Map signed values to unsigned values.  Positive values are unaltered.
		Negative values have the modulus of the unsigned type added.  Because
		we do modulo arithmetic below, adding the modulus does not change the
		final result.
	*/
	Unsigned a = A;
	Unsigned b = B;
	Unsigned c = C;
	Unsigned d = D;

	//	Calculate with modulo arithmetic.
	Unsigned t = a*b - c*d;

	//	Map the unsigned value to the corresponding signed value.
	return ConvertToSigned(t);
}


int main()
{
	//	Test every combination of inputs for signed char.
	for (int A = SCHAR_MIN; A <= SCHAR_MAX; ++A)
	for (int B = SCHAR_MIN; B <= SCHAR_MAX; ++B)
	for (int C = SCHAR_MIN; C <= SCHAR_MAX; ++C)
	for (int D = SCHAR_MIN; D <= SCHAR_MAX; ++D)
	{
		//	Use int to calculate the expected result.
		int t0 = A*B - C*D;

		//	If the result is not representable in signed char, skip this case.
		if (t0 < SCHAR_MIN || SCHAR_MAX < t0)
			continue;

		//	Calculate the result with the sample code.
		int t1 = Calculate(A, B, C, D);

		//	Test the result for errors.
		if (t0 != t1)
		{
			printf("%d*%d - %d*%d = %d, but %d was returned.\n",
				A, B, C, D, t0, t1);
			exit(EXIT_FAILURE);
		}
	}
    return 0;
}

Solution 11 - C++

You could try breaking the equation into smaller components which don't overflow.

AB - CD
= [ A(B - N) - C( D - M )] + [AN - CM]

= ( AK - CJ ) + ( AN - CM)

	where K = B - N
	      J = D - M

If the components still overflow you could break them into smaller components recursively and then recombine.

Solution 12 - C++

I may not have covered all of the edge cases, nor have I rigorously tested this but this implements a technique I remember using in the 80s when trying to do 32-bit integer maths on a 16-bit cpu. Essentially you split the 32 bits into two 16-bit units and work with them separately.

public class DoubleMaths {
  private static class SplitLong {
    // High half (or integral part).
    private final long h;
    // Low half.
    private final long l;
    // Split.
    private static final int SPLIT = (Long.SIZE / 2);
    
    // Make from an existing pair.
    private SplitLong(long h, long l) {
      // Let l overflow into h.
      this.h = h + (l >> SPLIT);
      this.l = l % (1l << SPLIT);
    }
    
    public SplitLong(long v) {
      h = v >> SPLIT;
      l = v % (1l << SPLIT);
    }
    
    public long longValue() {
      return (h << SPLIT) + l;
    }
    
    public SplitLong add ( SplitLong b ) {
      // TODO: Check for overflow.
      return new SplitLong ( longValue() + b.longValue() );
    }
    
    public SplitLong sub ( SplitLong b ) {
      // TODO: Check for overflow.
      return new SplitLong ( longValue() - b.longValue() );
    }
    
    public SplitLong mul ( SplitLong b ) {
      /*
       * e.g. 10 * 15 = 150
       * 
       * Divide 10 and 15 by 5
       * 
       * 2 * 3 = 5
       * 
       * Must therefore multiply up by 5 * 5 = 25
       * 
       * 5 * 25 = 150
       */
      long lbl = l * b.l;
      long hbh = h * b.h;
      long lbh = l * b.h;
      long hbl = h * b.l;
      return new SplitLong ( lbh + hbl, lbl + hbh );
    }
    
    @Override
    public String toString () {
      return Long.toHexString(h)+"|"+Long.toHexString(l);
    }
  }
  
  // I'll use long and int but this can apply just as easily to long-long and long.
  // The aim is to calculate A*B - C*D without overflow.
  static final long A = Long.MAX_VALUE;
  static final long B = Long.MAX_VALUE - 1;
  static final long C = Long.MAX_VALUE;
  static final long D = Long.MAX_VALUE - 2;
  
  public static void main(String[] args) throws InterruptedException {
    // First do it with BigIntegers to get what the result should be.
    BigInteger a = BigInteger.valueOf(A);
    BigInteger b = BigInteger.valueOf(B);
    BigInteger c = BigInteger.valueOf(C);
    BigInteger d = BigInteger.valueOf(D);
    BigInteger answer = a.multiply(b).subtract(c.multiply(d));
    System.out.println("A*B - C*D = "+answer+" = "+answer.toString(16));
    
    // Make one and test its integrity.
    SplitLong sla = new SplitLong(A);
    System.out.println("A="+Long.toHexString(A)+" ("+sla.toString()+") = "+Long.toHexString(sla.longValue()));
    
    // Start small.
    SplitLong sl10 = new SplitLong(10);
    SplitLong sl15 = new SplitLong(15);
    SplitLong sl150 = sl10.mul(sl15);
    System.out.println("10="+sl10.longValue()+"("+sl10.toString()+") * 15="+sl15.longValue()+"("+sl15.toString()+") = "+sl150.longValue() + " ("+sl150.toString()+")");

    // The real thing.
    SplitLong slb = new SplitLong(B);
    SplitLong slc = new SplitLong(C);
    SplitLong sld = new SplitLong(D);
    System.out.println("B="+Long.toHexString(B)+" ("+slb.toString()+") = "+Long.toHexString(slb.longValue()));
    System.out.println("C="+Long.toHexString(C)+" ("+slc.toString()+") = "+Long.toHexString(slc.longValue()));
    System.out.println("D="+Long.toHexString(D)+" ("+sld.toString()+") = "+Long.toHexString(sld.longValue()));
    SplitLong sanswer = sla.mul(slb).sub(slc.mul(sld));
    System.out.println("A*B - C*D = "+sanswer+" = "+sanswer.longValue());
    
  }

}

Prints:

A*B - C*D = 9223372036854775807 = 7fffffffffffffff
A=7fffffffffffffff (7fffffff|ffffffff) = 7fffffffffffffff
10=10(0|a) * 15=15(0|f) = 150 (0|96)
B=7ffffffffffffffe (7fffffff|fffffffe) = 7ffffffffffffffe
C=7fffffffffffffff (7fffffff|ffffffff) = 7fffffffffffffff
D=7ffffffffffffffd (7fffffff|fffffffd) = 7ffffffffffffffd
A*B - C*D = 7fffffff|ffffffff = 9223372036854775807

which looks to me like it's working.

I bet I've missed some of the subtleties such as watching for sign overflow etc. but I think the essence is there.

Solution 13 - C++

For the sake of completeness, since no one mentioned it, some compilers (e.g. GCC) actually provide you with a 128 bit integer nowadays.

Thus an easy solution could be:

(long long)((__int128)A * B - (__int128)C * D)

Solution 14 - C++

AB-CD = (AB-CD) * AC / AC = (B/C-D/A)*A*C. Neither B/C nor D/A can overflow, so calculate (B/C-D/A) first. Since the final result won't overflow according to your definition, you can safely perform the remaining multiplications and calculate (B/C-D/A)*A*C which is the required result.

Note, if your input can be extremely small as well, the B/C or D/A can overflow. If it's possible, more complex manipulations might be required according to input inspection.

Solution 15 - C++

Choose K = a big number (eg. K = A - sqrt(A))

A*B - C*D = (A-K)*(B-K) - (C-K)*(D-K) + K*(A-C+B-D); // Avoid overflow.

Why?

(A-K)*(B-K) = A*B - K*(A+B) + K^2
(C-K)*(D-K) = C*D - K*(C+D) + K^2

=>
(A-K)*(B-K) - (C-K)*(D-K) = A*B - K*(A+B) + K^2 - {C*D - K*(C+D) + K^2}
(A-K)*(B-K) - (C-K)*(D-K) = A*B - C*D - K*(A+B) + K*(C+D) + K^2 - K^2
(A-K)*(B-K) - (C-K)*(D-K) = A*B - C*D - K*(A+B-C-D)

=>
A*B - C*D = (A-K)*(B-K) - (C-K)*(D-K) + K*(A+B-C-D)

=>
A*B - C*D = (A-K)*(B-K) - (C-K)*(D-K) + K*(A-C+B-D)

Note that Because A, B, C and D are big numbers, thus A-C and B-D are small numbers.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionNGixView Question on Stackoverflow
Solution 1 - C++Anirudh RamanathanView Answer on Stackoverflow
Solution 2 - C++OfirView Answer on Stackoverflow
Solution 3 - C++RiaDView Answer on Stackoverflow
Solution 4 - C++paquetpView Answer on Stackoverflow
Solution 5 - C++landryView Answer on Stackoverflow
Solution 6 - C++GianView Answer on Stackoverflow
Solution 7 - C++Esteban CrespiView Answer on Stackoverflow
Solution 8 - C++MihaiView Answer on Stackoverflow
Solution 9 - C++dronusView Answer on Stackoverflow
Solution 10 - C++Eric PostpischilView Answer on Stackoverflow
Solution 11 - C++bradgonesurfingView Answer on Stackoverflow
Solution 12 - C++OldCurmudgeonView Answer on Stackoverflow
Solution 13 - C++i Code 4 FoodView Answer on Stackoverflow
Solution 14 - C++SomeWittyUsernameView Answer on Stackoverflow
Solution 15 - C++Amir SaniyanView Answer on Stackoverflow