Unexpected results when working with very big integers on interpreted languages
Php Problem Overview
I am trying to get the sum of 1 + 2 + ... + 1000000000, but I'm getting funny results in PHP and Node.js.
PHP
$sum = 0;
for($i = 0; $i <= 1000000000 ; $i++) {
$sum += $i;
}
printf("%s", number_format($sum, 0, "", "")); // 500000000067108992
Node.js
var sum = 0;
for (i = 0; i <= 1000000000; i++) {
sum += i ;
}
console.log(sum); // 500000000067109000
The correct answer can be calculated using
1 + 2 + ... + n = n(n+1)/2
Correct answer = 500000000500000000, so I decided to try another language.
GO
var sum , i int64
for i = 0 ; i <= 1000000000; i++ {
sum += i
}
fmt.Println(sum) // 500000000500000000
But it works fine! So what is wrong with my PHP and Node.js code?
Perhaps this is a problem of interpreted languages, and that's why it works in a compiled language like Go? If so, would other interpreted languages such as Python and Perl have the same problem?
Php Solutions
Solution 1 - Php
Python works:
>>> sum(x for x in xrange(1000000000 + 1))
500000000500000000
Or:
>>> sum(xrange(1000000000+1))
500000000500000000
Python's int auto-promotes to a Python long, which supports arbitrary precision. It will produce the correct answer on 32- or 64-bit platforms.
This can be seen by raising 2 to a power far greater than the bit width of the platform:
>>> 2**99
633825300114114700748351602688L
You can demonstrate (with Python) that the erroneous value you are getting in PHP arises because PHP promotes to a float when values exceed 2**31-1, the maximum of a signed 32-bit integer:
>>> int(sum(float(x) for x in xrange(1000000000+1)))
500000000067108992
Solution 2 - Php
Your Go code uses integer arithmetic with enough bits to give an exact answer. Never touched PHP or Node.js, but from the results I suspect the math is done using floating point numbers and should be thus expected not to be exact for numbers of this magnitude.
Solution 3 - Php
The reason is that the value of your integer variable sum exceeds the maximum integer value, so the sum you get is the result of floating-point arithmetic, which involves rounding. Since other answers did not mention the exact limits, I decided to post them.
The max integer value for PHP for:
- 32-bit version is 2147483647
- 64-bit version is 9223372036854775807
So it means you are using either a 32-bit CPU, a 32-bit OS, or a 32-bit build of PHP. The limit can be found using PHP_INT_MAX. The sum would be calculated correctly if you did it on a 64-bit machine.
The largest integer value you can work with exactly in JavaScript is 9007199254740992, which is 2^53 (taken from this question). The sum exceeds this limit.
If the integer value does not exceed these limits, then you are good. Otherwise you will have to look for arbitrary precision integer libraries.
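The limits listed above can be checked directly. The following is a minimal Python 3 sketch (Python is used only because its int is arbitrary-precision; the constant names are illustrative and are not PHP or JavaScript identifiers). It compares the expected sum against each limit and shows where a double stops distinguishing neighbouring integers.

```python
# Limits quoted in the answer above; the names are illustrative only.
PHP_INT_MAX_32 = 2**31 - 1    # 2147483647
PHP_INT_MAX_64 = 2**63 - 1    # 9223372036854775807
DOUBLE_EXACT_LIMIT = 2**53    # 9007199254740992

n = 1_000_000_000
expected = n * (n + 1) // 2   # exact, because Python ints do not overflow

fits_32 = expected <= PHP_INT_MAX_32
fits_64 = expected <= PHP_INT_MAX_64
beyond_double = expected > DOUBLE_EXACT_LIMIT

# At 2**53, adding 1 no longer changes the double.
collapses = float(DOUBLE_EXACT_LIMIT) + 1.0 == float(DOUBLE_EXACT_LIMIT)

print(fits_32, fits_64, beyond_double, collapses)  # False True True True
```

So the running total leaves the 32-bit range early, passes the point where doubles are exact for every integer, yet still fits comfortably in a 64-bit integer.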
Solution 4 - Php
Here is the answer in C, for completeness:
#include <stdio.h>
int main(void)
{
unsigned long long sum = 0, i;
for (i = 0; i <= 1000000000; i++) //one billion
sum += i;
printf("%llu\n", sum); //500000000500000000
return 0;
}
The key in this case is using C99's long long data type. It is guaranteed to be at least 64 bits wide, which is enough for this sum, and it runs really, really fast. The long long type will also work on almost any 32- or 64-bit machine.
There is one caveat: compilers provided by Microsoft explicitly do not support the 14-year-old C99 standard, so getting this to run in Visual Studio is a crapshoot.
Solution 5 - Php
My guess is that when the sum exceeds the capacity of a native int (2^31-1 = 2,147,483,647), Node.js and PHP switch to a floating point representation and you start getting round-off errors. A language like Go will probably try to stick with an integer form (e.g., 64-bit integers) as long as possible (if, indeed, it didn't start with that). Since the answer fits in a 64-bit integer, the computation is exact.
Solution 6 - Php
A Perl script gives us the expected result:
use warnings;
use strict;
my $sum = 0;
for(my $i = 0; $i <= 1_000_000_000; $i++) {
$sum += $i;
}
print $sum, "\n"; #<-- prints: 500000000500000000
Solution 7 - Php
The answer to this is "surprisingly" simple:
First, as most of you might know, a 32-bit integer ranges from −2,147,483,648 to 2,147,483,647. So, what happens if PHP gets a result that is larger than this?
Usually, one would expect an immediate overflow, causing 2,147,483,647 + 1 to turn into −2,147,483,648. However, that is not the case. If PHP encounters a larger number, it returns a float instead of an int.
> If PHP encounters a number beyond the bounds of the integer type, it will be interpreted as a float instead. Also, an operation which results in a number beyond the bounds of the integer type will return a float instead.
http://php.net/manual/en/language.types.integer.php
That said, and knowing that PHP's float implementation follows the IEEE 754 double-precision format, PHP is able to handle integers of up to 53 bits without losing precision (on a 32-bit system).
So, at the point where your sum hits 9,007,199,254,740,992 (which is 2^53), the float value returned by PHP's math will no longer be precise enough.
E:\PHP>php -r "$x=bindec(\"100000000000000000000000000000000000000000000000000000\"); echo number_format($x,0);"
> 9,007,199,254,740,992
E:\PHP>php -r "$x=bindec(\"100000000000000000000000000000000000000000000000000001\"); echo number_format($x,0);"
> 9,007,199,254,740,992
E:\PHP>php -r "$x=bindec(\"100000000000000000000000000000000000000000000000000010\"); echo number_format($x,0);"
> 9,007,199,254,740,994
This example shows the point where PHP starts losing precision. First, the least significant bit is dropped, causing the first two expressions to produce the same number, which they should not.
From that point on, the whole calculation will go wrong when working with the default data types.
> Is it the same problem for other interpreted languages such as Python or Perl?
I don't think so. I think this is a problem of languages that have no type safety. While an integer overflow as mentioned above will happen in every language that uses fixed-size data types, languages without type safety might try to catch it by switching to other data types. However, once they hit their "natural" (system-given) limit, they might return anything but the right result.
However, each language may handle such a scenario differently.
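The promotion rule quoted from the PHP manual can be modelled in a few lines. This is only a Python 3 sketch of the behaviour described above for a 32-bit build, not PHP's actual implementation; the helper name add_like_php32 is hypothetical.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def add_like_php32(a, b):
    """Add two numbers, switching to float when an int result leaves the 32-bit range.

    Hypothetical model of the rule quoted above, not PHP's real code.
    """
    if isinstance(a, int) and isinstance(b, int):
        result = a + b
        if INT32_MIN <= result <= INT32_MAX:
            return result          # still an exact integer
        return float(result)       # promoted: no wrap-around to a negative value
    return float(a) + float(b)     # once a float, always a float


promoted = add_like_php32(INT32_MAX, 1)
print(promoted)                    # 2147483648.0

# Beyond 2**53 the float can no longer absorb a +1.
stuck = add_like_php32(float(2**53), 1)
print(stuck == float(2**53))       # True
```

In this model the first promotion is harmless, since 2147483648 is still exact as a double. The damage only starts once the running total passes 2^53.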
Solution 8 - Php
The other answers already explained what is happening here (floating point precision as usual).
One solution is to use an integer type big enough, or to hope the language will choose one if needed.
The other solution is to use a summation algorithm that knows about the precision problem and works around it. Below you find the same summation, first with a 64-bit integer, then with 64-bit floating point, and then using floating point again, but with the Kahan summation algorithm.
Written in C#, but the same holds for other languages, too.
long sum1 = 0;
for (int i = 0; i <= 1000000000; i++)
{
sum1 += i ;
}
Console.WriteLine(sum1.ToString("N0"));
// 500.000.000.500.000.000
double sum2 = 0;
for (int i = 0; i <= 1000000000; i++)
{
sum2 += i ;
}
Console.WriteLine(sum2.ToString("N0"));
// 500.000.000.067.109.000
double sum3 = 0;
double error = 0;
for (int i = 0; i <= 1000000000; i++)
{
double corrected = i - error;
double temp = sum3 + corrected;
error = (temp - sum3) - corrected;
sum3 = temp;
}
Console.WriteLine(sum3.ToString("N0"));
//500.000.000.500.000.000
The Kahan summation gives a beautiful result. It does of course take a lot longer to compute. Whether you want to use it depends a) on your performance vs. precision needs, and b) how your language handles integer vs. floating point data types.
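To see the compensation step at work without running a billion iterations, here is a small Python 3 sketch of the same Kahan loop on three values chosen so that plain double addition visibly loses the small terms. The function names are illustrative. An explicit loop is used for the naive version because the built-in sum() may itself use compensated summation on floats in recent CPython versions.

```python
def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Kahan summation: carry the rounding error of each addition forward."""
    total = 0.0
    compensation = 0.0                      # error lost by the previous addition
    for x in values:
        corrected = x - compensation        # re-inject what was lost last time
        temp = total + corrected            # low-order bits may be rounded away here
        compensation = (temp - total) - corrected
        total = temp
    return total


values = [2.0**53, 1.0, 1.0]

print(naive_sum(values) == 2.0**53)         # True: both +1.0 steps were rounded away
print(kahan_sum(values) == 2.0**53 + 2.0)   # True: the lost 1.0 was carried over
```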
Solution 9 - Php
If you have 32-Bit PHP, you can calculate it with bc:
<?php
$value = 1000000000;
echo bcdiv( bcmul( $value, $value + 1 ), 2 );
//500000000500000000
In JavaScript you have to use an arbitrary-precision number library, for example BigInteger:
var value = new BigInteger(1000000000);
console.log( value.multiply(value.add(1)).divide(2).toString());
//500000000500000000
Even with languages like Go and Java you will eventually have to use an arbitrary-precision number library; your number just happened to be small enough for 64-bit but too large for 32-bit.
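How far does a signed 64-bit integer get you before that happens? A rough Python 3 sketch, with illustrative function names, solves n(n+1)/2 <= limit for n using exact integer arithmetic:

```python
import math

INT64_MAX = 2**63 - 1


def triangular(n):
    """Sum 1 + 2 + ... + n, computed exactly."""
    return n * (n + 1) // 2


def largest_n_fitting(limit):
    """Largest n whose triangular number does not exceed limit.

    n(n+1)/2 <= limit  is equivalent to  (2n+1)^2 <= 8*limit + 1.
    """
    return (math.isqrt(8 * limit + 1) - 1) // 2


n_max = largest_n_fitting(INT64_MAX)
print(n_max)                               # 4294967295, i.e. 2**32 - 1
print(triangular(n_max) <= INT64_MAX)      # True
print(triangular(n_max + 1) <= INT64_MAX)  # False
```

So summing 1 through one billion is well inside the 64-bit range, but summing 1 through about 4.3 billion is already past it.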
Solution 10 - Php
In Ruby:
sum = 0
1.upto(1000000000).each{|i|
sum += i
}
puts sum
Prints 500000000500000000, but takes a good 4 minutes on my 2.6 GHz Intel i7.
Magnuss and Jaunty have a much more idiomatic Ruby solution:
1.upto(1000000000).inject(:+)
To run a benchmark:
$ time ruby -e "puts 1.upto(1000000000).inject(:+)"
ruby -e "1.upto(1000000000).inject(:+)" 128.75s user 0.07s system 99% cpu 2:08.84 total
Solution 11 - Php
I use node-bigint for big integer stuff:
https://github.com/substack/node-bigint
var bigint = require('bigint');
var sum = bigint(0);
for(var i = 0; i <= 1000000000; i++) {
sum = sum.add(i);
}
console.log(sum);
It's not as quick as something that can use native 64-bit stuff for this exact test, but if you get into bigger numbers than 64-bit, it uses libgmp under the hood, which is one of the faster arbitrary precision libraries out there.
Solution 12 - Php
Took ages in Ruby, but gives the correct answer:
(1..1000000000).reduce(:+)
=> 500000000500000000
Solution 13 - Php
To get the correct result in php I think you'd need to use the BC math operators: http://php.net/manual/en/ref.bc.php
Here is the correct answer in Scala. You have to use Longs otherwise you overflow the number:
println((1L to 1000000000L).reduce(_ + _)) // prints 500000000500000000
Solution 14 - Php
There's actually a cool trick to this problem.
Assume it was 1-100 instead.
1 + 2 + 3 + 4 + ... + 50 +
100 + 99 + 98 + 97 + ... + 51
= (101 + 101 + 101 + 101 + ... + 101) = 101*50
Formula:
For N= 100: Output = N/2*(N+1)
For N = 1e9: Output = N/2*(N+1)
This is much faster than looping through all of that data. Your processor will thank you for it. And here is an interesting story regarding this very problem:
Solution 15 - Php
This gives the proper result in PHP by forcing the integer cast.
$sum = (int) $sum + $i;
Solution 16 - Php
Common Lisp is one of the fastest interpreted* languages and handles arbitrarily large integers correctly by default. This takes about 3 seconds with SBCL:
* (time (let ((sum 0)) (loop :for x :from 1 :to 1000000000 :do (incf sum x)) sum))
Evaluation took:
3.068 seconds of real time
3.064000 seconds of total run time (3.044000 user, 0.020000 system)
99.87% CPU
8,572,036,182 processor cycles
0 bytes consed
500000000500000000
* By interpreted, I mean that I ran this code from the REPL. SBCL may have done some JITing internally to make it run fast, but the dynamic experience of running code immediately is the same.
Solution 17 - Php
Racket v 5.3.4 (MBP; time in ms):
> (time (for/sum ([x (in-range 1000000001)]) x))
cpu time: 2943 real time: 2954 gc time: 0
500000000500000000
Solution 18 - Php
I don't have enough reputation to comment on @postfuturist's Common Lisp answer, but it can be optimized to complete in ~500ms with SBCL 1.1.8 on my machine:
CL-USER> (compile nil '(lambda ()
(declare (optimize (speed 3) (space 0) (safety 0) (debug 0) (compilation-speed 0)))
(let ((sum 0))
(declare (type fixnum sum))
(loop for i from 1 to 1000000000 do (incf sum i))
sum)))
#<FUNCTION (LAMBDA ()) {1004B93CCB}>
NIL
NIL
CL-USER> (time (funcall *))
Evaluation took:
0.531 seconds of real time
0.531250 seconds of total run time (0.531250 user, 0.000000 system)
100.00% CPU
1,912,655,483 processor cycles
0 bytes consed
500000000500000000
Solution 19 - Php
Works fine in Rebol:
>> sum: 0
== 0
>> repeat i 1000000000 [sum: sum + i]
== 500000000500000000
>> type? sum
== integer!
This was using Rebol 3, which uses 64-bit integers despite being compiled as 32-bit (unlike Rebol 2, which used 32-bit integers).
Solution 20 - Php
I wanted to see what happened in CF Script
<cfscript>
ttl = 0;
for (i=0;i LTE 1000000000 ;i=i+1) {
ttl += i;
}
writeDump(ttl);
abort;
</cfscript>
I got 5.00000000067E+017
This was a pretty neat experiment. I'm fairly sure I could have coded this a bit better with more effort.
Solution 21 - Php
For the sake of completeness, in Clojure (beautiful but not very efficient):
(reduce + (take 1000000000 (iterate inc 1))) ; => 500000000500000000
Solution 22 - Php
ActivePerl v5.10.1 on 32bit windows, intel core2duo 2.6:
$sum = 0;
for ($i = 0; $i <= 1000000000 ; $i++) {
$sum += $i;
}
print $sum."\n";
Result: 5.00000000067109e+017 in 5 minutes.
With "use bigint" the script ran for two hours and would have run longer, but I stopped it. Too slow.
Solution 23 - Php
AWK:
BEGIN { s = 0; for (i = 1; i <= 1000000000; i++) s += i; print s }
produces the same wrong result as PHP:
500000000067108992
It seems AWK uses floating point when the numbers are really big, so at least the answer is the right order-of-magnitude.
Test runs:
$ awk 'BEGIN { s = 0; for (i = 1; i <= 100000000; i++) s += i; print s }'
5000000050000000
$ awk 'BEGIN { s = 0; for (i = 1; i <= 1000000000; i++) s += i; print s }'
500000000067108992
Solution 24 - Php
Category other interpreted language:
Tcl:
If using Tcl 8.4 or older it depends if it was compiled with 32 or 64 bit. (8.4 is end of life).
If using Tcl 8.5 or newer which has arbitrary big integers, it will display the correct result.
proc test limit {
for {set i 0} {$i <= $limit} {incr i} {
incr result $i
}
return $result
}
test 1000000000
I put the test inside a proc to get it byte-compiled.
Solution 25 - Php
For the PHP code, the answer is here:
> The size of an integer is platform-dependent, although a maximum value of about two billion is the usual value (that's 32 bits signed). 64-bit platforms usually have a maximum value of about 9E18. PHP does not support unsigned integers. Integer size can be determined using the constant PHP_INT_SIZE, and maximum value using the constant PHP_INT_MAX since PHP 4.4.0 and PHP 5.0.5.
Solution 26 - Php
Harbour:
proc Main()
local sum := 0, i
for i := 0 to 1000000000
sum += i
next
? sum
return
Results in 500000000500000000 (on both windows/mingw/x86 and osx/clang/x64).
Solution 27 - Php
Erlang works:
from_sum(From,Max) ->
from_sum(From,Max,Max).
from_sum(From,Max,Sum) when From =:= Max ->
Sum;
from_sum(From,Max,Sum) when From =/= Max ->
from_sum(From+1,Max,Sum+From).
Results:
41> useless:from_sum(1,1000000000).
500000000500000000
Solution 28 - Php
Funny thing, PHP 5.5.1 gives 499999999500000000 (in ~ 30s), while Dart2Js gives 500000000067109000 (which is to be expected, since it's JS that gets executed). CLI Dart gives the right answer ... instantly.
Solution 29 - Php
Erlang gives the expected result too.
sum.erl:
-module(sum).
-export([iter_sum/2]).
iter_sum(Begin, End) -> iter_sum(Begin,End,0).
iter_sum(Current, End, Sum) when Current > End -> Sum;
iter_sum(Current, End, Sum) -> iter_sum(Current+1,End,Sum+Current).
And using it:
1> c(sum).
{ok,sum}
2> sum:iter_sum(1,1000000000).
500000000500000000
Solution 30 - Php
Smalltalk:
(1 to: 1000000000) inject: 0 into: [:subTotal :next | subTotal + next ].
"500000000500000000"
Solution 31 - Php
For completeness only.
In MATLAB there is no problem with automatic type selection:
tic; ii = 1:1000000; sum(ii); toc; ans
Elapsed time is 0.004471 seconds.
ans = 5.000005000000000e+11
And in F# interactive, automatic unit types give an overflow error. Assigning type int64 gives the correct answer:
seq {int64 1.. int64 1000000} |> Seq.sum
val it : int64 = 500000500000L
Notes:
Could use Seq.reduce (+) instead of Seq.sum without a noticeable change in efficiency. However, using Seq.reduce (+) with automatic unit type will give a wrong answer rather than an overflow error.
Computation time is < 0.5 seconds, but I am currently lazy and so I am not importing the .NET stopwatch class to get a more exact time.
Solution 32 - Php
A few answers have already explained why your PHP and Node.js code don't work as expected, so I won't repeat that here. I just want to point out that this has nothing to do with "interpreted vs compiled languages".
> Perhaps this is a problem of interpreted languages, and that's why it works in a compiled language like Go?
A "language" is merely a set of well-defined rules; an implementation of a language is what's either interpreted or compiled. I could take a language whose principal implementation is compiled (like Go) and write an interpreter for it (and vice-versa), but every program processed by the interpreter should produce identical output as running the program via the compiled implementation, and this output should be in accordance with the language's specification. The PHP and Node.js results are in fact in accordance with the languages' specifications (as some other answers point out), and this has nothing to do with the fact that the principal implementations of these languages are interpreted; compiled implementations of the languages by definition must also produce the same results.
A tangible example of all this is Python, which has both widely-used compiled and interpreted implementations. Running a translated version of your program in the interpreted implementation:
>>> total = 0
>>> for i in xrange(1000000001):
... total += i
...
>>> print total
500000000500000000
must not, by the definition of Python, result in a different output than running it in the compiled implementation:
total = 0
for i in xrange(1000000001):
total += i
print total
500000000500000000
Solution 33 - Php
In Ruby, these two functionally similar solutions (which return the correct answer) take significantly different amounts of time to complete:
$ time ruby -e "(1..1000000000).inject{|sum, n| sum + n}"
real 1m26.005s
user 1m26.010s
sys 0m0.076s
$ time ruby -e "1.upto(1000000000).inject(:+)"
real 0m48.957s
user 0m48.957s
sys 0m0.045s
$ ruby -v
ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-darwin10.8.0]
Solution 34 - Php
JavaScript (and possibly PHP) represents all numbers as doubles, including integer values. This means that they have only 53 bits of precision (instead of the 64 bits provided by int64 or a Java long), and will produce rounding errors on large values.
Solution 35 - Php
As other people have pointed out, the fastest way to do this calculation (regardless of the language) is with a simple math function (instead of a CPU intensive loop):
number = 1000000000;
result = (number/2) * (number+1);
You would still need to solve any 32/64 bit integer/float issues, depending on the language, though.
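One such issue is the order of operations. As a hedged illustration in Python 3 (the function names are made up for this sketch), dividing first with integer division truncates for odd inputs, while multiplying first is always exact because n(n+1) is even:

```python
def sum_to(n):
    """Closed form for 1 + 2 + ... + n; multiply first, then halve."""
    return n * (n + 1) // 2


def sum_to_divide_first(n):
    """Same formula with the division done first, as integer division.

    Shown only to illustrate the pitfall: n // 2 truncates when n is odd.
    """
    return (n // 2) * (n + 1)


print(sum_to(1_000_000_000))    # 500000000500000000
print(sum_to(5))                # 15
print(sum_to_divide_first(5))   # 12, wrong: 5 // 2 dropped the remainder
```

In a language with fixed-width integers, the intermediate product n(n+1) also has to fit in the integer type, so multiplying first needs roughly one more bit than the final result.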
Solution 36 - Php
And the Ruby one:
[15] pry(main)> (1..1000000000).inject(0) { |sum,e| sum + e }
=> 500000000500000000
Seems to get the right number.