How can I quickly sum all numbers in a file?

Linux, Perl, Bash, Shell, Awk

Linux Problem Overview


I have a file which contains several thousand numbers, each on its own line:

34
42
11
6
2
99
...

I'm looking to write a script which will print the sum of all numbers in the file. I've got a solution, but it's not very efficient. (It takes several minutes to run.) I'm looking for a more efficient solution. Any suggestions?

Linux Solutions


Solution 1 - Linux

You can use awk:

awk '{ sum += $1 } END { print sum }' file

Solution 2 - Linux

None of the solutions thus far use paste. Here's one:

paste -sd+ filename | bc

As an example, calculate Σn where 1<=n<=100000:

$ seq 100000 | paste -sd+ | bc -l
5000050000

(For the curious, seq n would print a sequence of numbers from 1 to n given a positive number n.)
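
To make the mechanics of the pipeline concrete, here is a rough Python sketch of the two stages. The function names paste_join and sum_expression are made up for illustration, and sum_expression is only a stand-in for bc that handles a flat chain of integer additions:

```python
def paste_join(lines, delimiter="+"):
    """Mimic `paste -sd+`: join all input lines into one line."""
    return delimiter.join(line.rstrip("\n") for line in lines)


def sum_expression(expression):
    """Stand-in for bc (assumption: only a flat 'a+b+c' chain of integers)."""
    return sum(int(term) for term in expression.split("+"))


numbers = ["34", "42", "11", "6", "2", "99"]
expression = paste_join(numbers)
print(expression)                  # 34+42+11+6+2+99
print(sum_expression(expression))  # 194
```

The point is that paste does no arithmetic at all. It only builds one long expression, and the evaluator at the end of the pipe does the summing.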

Solution 3 - Linux

For a Perl one-liner, it's basically the same thing as the awk solution in Ayman Hourieh's answer:

 % perl -nle '$sum += $_ } END { print $sum'

If you're curious what Perl one-liners do, you can deparse them:

 %  perl -MO=Deparse -nle '$sum += $_ } END { print $sum'

The result is a more verbose version of the program, in a form that no one would ever write on their own:

BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
	chomp $_;
	$sum += $_;
}
sub END {
	print $sum;
}
-e syntax OK
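
For readers who don't read Perl, the deparsed loop corresponds roughly to the Python sketch below. The name sum_lines is hypothetical, and the sketch assumes every non-empty line is an integer, whereas Perl would quietly treat other text as 0:

```python
import io


def sum_lines(stream):
    total = 0
    for line in stream:            # while (defined($_ = <ARGV>))
        line = line.rstrip("\n")   # chomp $_ (added by -l)
        if line:                   # an empty line adds nothing
            total += int(line)     # $sum += $_
    return total                   # printed once, in the END block


sample = io.StringIO("34\n42\n11\n")
print(sum_lines(sample))  # 87
```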

Just for giggles, I tried this with a file containing 1,000,000 numbers (in the range 0 - 9,999). On my Mac Pro, it returns virtually instantaneously. That's too bad, because I was hoping using mmap would be really fast, but it's just the same time:

use 5.010;
use File::Map qw(map_file);

map_file my $map, $ARGV[0];

$sum += $1 while $map =~ m/(\d+)/g;

say $sum;

Solution 4 - Linux

Just for fun, let's benchmark it:

$ for ((i=0; i<1000000; i++)) ; do echo $RANDOM; done > random_numbers

$ time perl -nle '$sum += $_ } END { print $sum' random_numbers
16379866392

real	0m0.226s
user	0m0.219s
sys 	0m0.002s

$ time awk '{ sum += $1 } END { print sum }' random_numbers
16379866392

real	0m0.311s
user	0m0.304s
sys 	0m0.005s

$ time { { tr "\n" + < random_numbers ; echo 0; } | bc; }
16379866392

real	0m0.445s
user	0m0.438s
sys 	0m0.024s

$ time { s=0;while read l; do s=$((s+$l));done<random_numbers;echo $s; }
16379866392

real	0m9.309s
user	0m8.404s
sys 	0m0.887s

$ time { s=0;while read l; do ((s+=l));done<random_numbers;echo $s; }
16379866392

real	0m7.191s
user	0m6.402s
sys 	0m0.776s

$ time { sed ':a;N;s/\n/+/;ta' random_numbers|bc; }
^C

real	4m53.413s
user	4m52.584s
sys	0m0.052s

I aborted the sed run after 5 minutes.

I've been diving into Lua, and it is speedy:

$ time lua -e 'sum=0; for line in io.lines() do sum=sum+line end; print(sum)' < random_numbers
16388542582.0

real    0m0.362s
user    0m0.313s
sys     0m0.063s

And while I'm updating this, Ruby:

$ time ruby -e 'sum = 0; File.foreach(ARGV.shift) {|line| sum+=line.to_i}; puts sum' random_numbers
16388542582

real    0m0.378s
user    0m0.297s
sys     0m0.078s

Heed Ed Morton's advice and prefer $0. Using $1:

$ time awk '{ sum += $1 } END { print sum }' random_numbers
16388542582

real    0m0.421s
user    0m0.359s
sys     0m0.063s

versus using $0:

$ time awk '{ sum += $0 } END { print sum }' random_numbers
16388542582

real    0m0.302s
user    0m0.234s
sys     0m0.063s

Solution 5 - Linux

Another option is to use jq:

$ seq 10|jq -s add
55

-s (--slurp) reads the input lines into an array.
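
As a loose illustration of what slurping means, the Python sketch below collects every input value into one list before adding. The helper names slurp and add are chosen here to mirror the jq command, and the sketch assumes one JSON number per line:

```python
import json


def slurp(text):
    """Parse each non-empty line as a JSON value and collect them in a list."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def add(values):
    """Add up the collected values (assumption: they are all numbers)."""
    return sum(values)


text = "\n".join(str(n) for n in range(1, 11))  # same input as `seq 10`
values = slurp(text)
print(values)       # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(add(values))  # 55
```

Unlike the streaming solutions, this holds the whole input in memory at once.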

Solution 6 - Linux

This is straight Bash:

sum=0
while read -r line
do
    (( sum += line ))
done < file
echo $sum

Solution 7 - Linux

I prefer to use R for this:

$ R -e 'sum(scan("filename"))'

Solution 8 - Linux

Here's another one-liner:

( echo 0 ; sed 's/$/ +/' foo ; echo p ) | dc

This assumes the numbers are integers. If you need decimals, try:

( echo 0 2k ; sed 's/$/ +/' foo ; echo p ) | dc

Adjust 2 to the number of decimals needed.
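
dc works in reverse Polish notation, so the pipeline pushes a 0 and then feeds it "number +" for every line. Here is a minimal Python sketch of that evaluation. build_program and run_dc are hypothetical names, and the sketch understands only non-negative integers, + and p, which is far less than real dc:

```python
def build_program(lines):
    """Mimic `( echo 0 ; sed 's/$/ +/' ; echo p )`."""
    return "\n".join(["0"] + [f"{line.strip()} +" for line in lines] + ["p"])


def run_dc(program):
    """Tiny stack evaluator: push integers, '+' adds the top two, 'p' prints the top."""
    stack = []
    printed = []
    for token in program.split():
        if token == "+":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif token == "p":
            printed.append(stack[-1])
        else:
            stack.append(int(token))
    return printed


program = build_program(["34", "42", "11"])
print(program)
print(run_dc(program))  # [87]
```

The leading 0 is what makes the first + valid, since it gives that + a second operand to work with.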

Solution 9 - Linux

Perl 6

say sum lines

~$ perl6 -e '.say for 0..1000000' > test.in

~$ perl6 -e 'say sum lines' < test.in
500000500000

Solution 10 - Linux

$ perl -MList::Util=sum -le 'print sum <>' nums.txt

Solution 11 - Linux

I prefer to use GNU datamash for such tasks because it's more succinct and legible than perl or awk. For example

datamash sum 1 < myfile

where 1 denotes the first column of data.
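
To show what the column number refers to, here is a small Python sketch that sums one 1-based column of tab-separated text (tab being datamash's default field separator). sum_column is a hypothetical helper, not part of datamash:

```python
def sum_column(lines, column=1, delimiter="\t"):
    """Sum the given 1-based column of delimiter-separated lines."""
    total = 0.0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(delimiter)
        total += float(fields[column - 1])
    return total


rows = ["34\ta", "42\tb", "11\tc"]
print(sum_column(rows, column=1))  # 87.0
```

With one number per line, as in the question, every line has a single field, so column 1 is the whole line.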

Solution 12 - Linux

More succinct:

# Ruby
ruby -e 'puts open("random_numbers").map(&:to_i).reduce(:+)'

# Python
python -c 'print(sum(int(l) for l in open("random_numbers")))'

Solution 13 - Linux

I couldn't just pass by... Here's my Haskell one-liner. It's actually quite readable:

sum <$> (read <$>) <$> lines <$> getContents

Unfortunately there's no ghci -e to just run it, so it needs the main function, print and compilation.

main = (sum <$> (read <$>) <$> lines <$> getContents) >>= print

To clarify, we read the entire input (getContents), split it into lines, read each line as a number, and sum. <$> is the fmap operator; we use it instead of the usual function application because all of this happens in IO. read needs an additional fmap because it is applied to each element of the list.

$ ghc sum.hs
[1 of 1] Compiling Main             ( sum.hs, sum.o )
Linking sum ...
$ ./sum 
1
2
4
^D
7
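
The same read, split, convert, sum pipeline can be spelled out step by step in Python. This is only an analogy: sum_of_lines is a made-up name, and int stands in for Haskell's read, which in the program above ends up defaulting to Integer:

```python
import io


def sum_of_lines(stream):
    contents = stream.read()       # getContents
    rows = contents.splitlines()   # lines
    numbers = map(int, rows)       # read <$> (mapped over the list)
    return sum(numbers)            # sum


print(sum_of_lines(io.StringIO("1\n2\n4\n")))  # 7
```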

Here's a strange upgrade to make it work with floats:

main = ((0.0 + ) <$> sum <$> (read <$>) <$> lines <$> getContents) >>= print
$ ./sum 
1.3
2.1
4.2
^D
7.6000000000000005
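
The trailing digits in that result come from binary floating-point rounding, not from the summing logic. If exact decimal output matters, one option is decimal arithmetic, sketched here with Python's decimal module (exact_sum is a hypothetical name, and this is a different technique from the Haskell code above):

```python
from decimal import Decimal


def exact_sum(lines):
    """Sum decimal strings without going through binary floats."""
    values = (Decimal(line.strip()) for line in lines if line.strip())
    return sum(values, Decimal(0))


print(exact_sum(["1.3", "2.1", "4.2"]))  # 7.6
```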

Solution 14 - Linux

cat nums | perl -ne '$sum += $_ } { print $sum'

(same as brian d foy's answer, without 'END')

Solution 15 - Linux

Just for fun, let's do it with PDL, Perl's array math engine!

perl -MPDL -E 'say rcols(shift)->sum' datafile

rcols reads columns into a matrix (1D in this case) and sum (surprise) sums all the elements of the matrix.

Solution 16 - Linux

Here is a solution using python with a generator expression. Tested with a million numbers on my old cruddy laptop.

time python -c "import sys; print(sum(float(l) for l in sys.stdin))" < file

real    0m0.619s
user    0m0.512s
sys     0m0.028s

Solution 17 - Linux

C++ "one-liner":

#include <iostream>
#include <iterator>
#include <numeric>
using namespace std;

int main() {
    cout << accumulate(istream_iterator<int>(cin), istream_iterator<int>(), 0) << endl;
}

Solution 18 - Linux

sed ':a;N;s/\n/+/;ta' file|bc

Solution 19 - Linux

Running R scripts

I've written an R script to take arguments of a file name and sum the lines.

#! /usr/local/bin/R
file=commandArgs(trailingOnly=TRUE)[1]
sum(as.numeric(readLines(file)))

This can be sped up with the "data.table" or "vroom" package as follows:

#! /usr/local/bin/R
file=commandArgs(trailingOnly=TRUE)[1]
sum(data.table::fread(file))

#! /usr/local/bin/R
file=commandArgs(trailingOnly=TRUE)[1]
sum(vroom::vroom(file))

Benchmarking

Same benchmarking data as @glenn jackman.

for ((i=0; i<1000000; i++)) ; do echo $RANDOM; done > random_numbers

In comparison to the R call above, running R 3.5.0 as a script is comparable to other methods (on the same Linux Debian server).

$ time R -e 'sum(scan("random_numbers"))'  
 0.37s user
 0.04s system
 86% cpu
 0.478 total

R script with readLines

$ time Rscript sum.R random_numbers
  0.53s user
  0.04s system
  84% cpu
  0.679 total

R script with data.table

$ time Rscript sum.R random_numbers     
 0.30s user
 0.05s system
 77% cpu
 0.453 total

R script with vroom

$ time Rscript sum.R random_numbers     
  0.54s user 
  0.11s system
  93% cpu
  0.696 total

Comparison with other languages

For reference, here are some of the other suggested methods, run on the same hardware:

Python 2 (2.7.13)

$ time python2 -c "import sys; print sum((float(l) for l in sys.stdin))" < random_numbers 
 0.27s user 0.00s system 89% cpu 0.298 total

Python 3 (3.6.8)

$ time python3 -c "import sys; print(sum((float(l) for l in sys.stdin)))" < random_numbers
0.37s user 0.02s system 98% cpu 0.393 total

Ruby (2.3.3)

$  time ruby -e 'sum = 0; File.foreach(ARGV.shift) {|line| sum+=line.to_i}; puts sum' random_numbers
 0.42s user
 0.03s system
 72% cpu
 0.625 total

Perl (5.24.1)

$ time perl -nle '$sum += $_ } END { print $sum' random_numbers
 0.24s user
 0.01s system
 99% cpu
 0.249 total

Awk (4.1.4)

$ time awk '{ sum += $0 } END { print sum }' random_numbers
 0.26s user
 0.01s system
 99% cpu
 0.265 total
$ time awk '{ sum += $1 } END { print sum }' random_numbers
 0.34s user
 0.01s system
 99% cpu
 0.354 total

C (clang version 3.3; gcc (Debian 6.3.0-18) 6.3.0 )

 $ gcc sum.c -o sum && time ./sum < random_numbers   
 0.10s user
 0.00s system
 96% cpu
 0.108 total

Update with additional languages

Lua (5.3.5)

$ time lua -e 'sum=0; for line in io.lines() do sum=sum+line end; print(sum)' < random_numbers 
 0.30s user 
 0.01s system
 98% cpu
 0.312 total

tr (8.26) must be timed in bash, not compatible with zsh

$ time { { tr "\n" + < random_numbers ; echo 0; } | bc; }
real	0m0.494s
user	0m0.488s
sys	0m0.044s

sed (4.4) must be timed in bash, not compatible with zsh

$  time { head -n 10000 random_numbers | sed ':a;N;s/\n/+/;ta' |bc; }
real	0m0.631s
user	0m0.628s
sys	    0m0.008s
$  time { head -n 100000 random_numbers | sed ':a;N;s/\n/+/;ta' |bc; }
real	1m2.593s
user	1m2.588s
sys 	0m0.012s

Note: sed calls seem to run faster on systems with more memory available. Smaller datasets were used for benchmarking sed.

Julia (0.5.0)

$ time julia -e 'print(sum(readdlm("random_numbers")))'
 3.00s user 
 1.39s system 
 136% cpu 
 3.204 total
$  time julia -e 'print(sum(readtable("random_numbers")))'
 0.63s user 
 0.96s system 
 248% cpu 
 0.638 total

Notice that as in R, file I/O methods have different performance.

Solution 20 - Linux

Another, for fun:

sum=0;for i in $(cat file);do sum=$((sum+$i));done;echo $sum

Or another, Bash only:

s=0;while read l; do s=$((s+$l));done<file;echo $s

But the awk solution is probably best, as it's the most compact.

Solution 21 - Linux

C always wins for speed:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    ssize_t read;
    char *line = NULL;
    size_t len = 0;
    double sum = 0.0;

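    /* Parenthesize the assignment so read holds the line length,
       not the result of the comparison. */
    while ((read = getline(&line, &len, stdin)) != -1) {
        sum += atof(line);
    }

    free(line);  /* getline allocates the buffer; release it */
    printf("%f\n", sum);
    return 0;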
}

Timing for 1M numbers (same machine/input as my python answer):

$ gcc sum.c -o sum && time ./sum < numbers 
5003371677.000000
real    0m0.188s
user    0m0.180s
sys     0m0.000s

Solution 22 - Linux

With Ruby:

ruby -e "puts File.read('file.txt').split.inject(0){|mem, obj| mem + obj.to_f}"

Solution 23 - Linux

In Go:

package main

import (
    "bufio"
    "fmt"
    "os"
    "strconv"
)

func main() {
    scanner := bufio.NewScanner(os.Stdin)
    sum := int64(0)
    for scanner.Scan() {
        v, err := strconv.ParseInt(scanner.Text(), 10, 64)
        if err != nil {
            fmt.Fprintf(os.Stderr, "Not an integer: '%s'\n", scanner.Text())
            os.Exit(1)
        }
        sum += v
    }
    fmt.Println(sum)
}

Solution 24 - Linux

Bash variant

raw=$(cat file)
echo $(( ${raw//$'\n'/+} ))

$ wc -l file
10000 file

$ time ./test
323390

real	0m3,096s
user	0m3,095s
sys	    0m0,000s

What is happening here? The content of the file is read into the $raw variable. An arithmetic expression is then built from it by replacing every newline with '+'.

Solution 25 - Linux

I don't know if you can get a lot better than this, considering you need to read through the whole file.

$sum = 0;
while(<>){
   $sum += $_;
}
print $sum;

Solution 26 - Linux

Here's another:

open(FIL, "a.txt") or die "Cannot open a.txt: $!";

my $sum = 0;
foreach( <FIL> ) {chomp; $sum += $_;}

close(FIL);

print "Sum = $sum\n";

Solution 27 - Linux

You can do it with Alacon, the command-line utility for the Alasql database.

It runs on Node.js, so you need to install Node.js and then the Alasql package.

To calculate the sum from a TXT file, you can use the following command:

> node alacon "SELECT VALUE SUM([0]) FROM TXT('mydata.txt')"

Solution 28 - Linux

Isn't it easier to replace all newlines with +, add a 0, and send it to the Ruby interpreter?

(sed -e "s/$/+/" file; echo 0)|irb

If you do not have irb, you can send it to bc, but you have to remove all newlines except the last one (from echo). It is better to use tr for this, unless you have a PhD in sed.

(sed -e "s/$/+/" file|tr -d "\n"; echo 0)|bc

Solution 29 - Linux

In shell with awk, I have used the script below:

#!/bin/bash

total=0

# Add the first field of every line in myfile to the running total.
for i in $(awk '{ print $1 }' myfile)
do
    total=$(echo "$total+$i" | bc)
done
echo "scale=2; $total" | bc

Solution 30 - Linux

One in tcl:

#!/usr/bin/env tclsh
set sum 0
while {[gets stdin num] >= 0} { incr sum $num }
puts $sum

Solution 31 - Linux

GNU Parallel can presumably be used to improve many of the above answers by spreading the workload across multiple cores.

In the example below we send chunks of 500 numbers (--max-lines=500) to bc processes which are executed in parallel 4 at a time (-j 4). The results are then aggregated by a final bc.

time parallel --max-lines=500 -j 4 --pipe "paste -sd+ - | bc" < random_numbers | paste -sd+ - | bc

The optimal choice of work size and number of parallel processes depends on the machine and problem. Note that this solution only really shines when there's a large number of parallel processes with substantial work each.
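
The chunk-then-aggregate structure of that command can be sketched in Python as follows. The names chunks, partial_sum and parallel_sum are hypothetical, and a thread pool is used only to show the shape of the approach. In CPython, threads will not speed up this CPU-bound work, so treat it as an illustration and not as a benchmark candidate:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items (like --max-lines)."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def partial_sum(chunk):
    """Work done by each worker: sum one chunk of lines."""
    return sum(int(line) for line in chunk)


def parallel_sum(lines, chunk_size=500, workers=4):
    """Sum chunks with `workers` workers (like -j 4), then add the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_sum, chunks(lines, chunk_size))
        return sum(partials)


lines = [str(n) for n in range(1, 10001)]
print(parallel_sum(lines))  # 50005000
```

As in the shell version, the final sum over the partial results is the cheap step, and the chunk size decides how much work each worker gets per task.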

Solution 32 - Linux

I have not tested this but it should work:

cat f | tr "\n" "+" | sed 's/+$/\n/' | bc

You might have to add "\n" to the string before bc (like via echo) if bc doesn't treat EOF and EOL...

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Mark Roberts | View Question on Stackoverflow
Solution 1 - Linux | Ayman Hourieh | View Answer on Stackoverflow
Solution 2 - Linux | devnull | View Answer on Stackoverflow
Solution 3 - Linux | brian d foy | View Answer on Stackoverflow
Solution 4 - Linux | glenn jackman | View Answer on Stackoverflow
Solution 5 - Linux | nisetama | View Answer on Stackoverflow
Solution 6 - Linux | Dennis Williamson | View Answer on Stackoverflow
Solution 7 - Linux | fedorn | View Answer on Stackoverflow
Solution 8 - Linux | lhf | View Answer on Stackoverflow
Solution 9 - Linux | Brad Gilbert | View Answer on Stackoverflow
Solution 10 - Linux | Zaid | View Answer on Stackoverflow
Solution 11 - Linux | hertzsprung | View Answer on Stackoverflow
Solution 12 - Linux | Vidul | View Answer on Stackoverflow
Solution 13 - Linux | Peter K | View Answer on Stackoverflow
Solution 14 - Linux | edibleEnergy | View Answer on Stackoverflow
Solution 15 - Linux | Joel Berger | View Answer on Stackoverflow
Solution 16 - Linux | dwurf | View Answer on Stackoverflow
Solution 17 - Linux | Peter K | View Answer on Stackoverflow
Solution 18 - Linux | ghostdog74 | View Answer on Stackoverflow
Solution 19 - Linux | Tom Kelly | View Answer on Stackoverflow
Solution 20 - Linux | nickjb | View Answer on Stackoverflow
Solution 21 - Linux | dwurf | View Answer on Stackoverflow
Solution 22 - Linux | sites | View Answer on Stackoverflow
Solution 23 - Linux | dwurf | View Answer on Stackoverflow
Solution 24 - Linux | Ivan | View Answer on Stackoverflow
Solution 25 - Linux | Stefan Kendall | View Answer on Stackoverflow
Solution 26 - Linux | ruben2020 | View Answer on Stackoverflow
Solution 27 - Linux | agershun | View Answer on Stackoverflow
Solution 28 - Linux | Daniel Porumbel | View Answer on Stackoverflow
Solution 29 - Linux | Shiwangini | View Answer on Stackoverflow
Solution 30 - Linux | Shawn | View Answer on Stackoverflow
Solution 31 - Linux | user12719 | View Answer on Stackoverflow
Solution 32 - Linux | DVK | View Answer on Stackoverflow