Java 8: performance of Streams vs Collections

Tags: Java, Performance, Collections, Java 8, Java Stream

Java Problem Overview


I'm new to Java 8. I still don't know the API in depth, but I've made a small informal benchmark to compare the performance of the new Streams API vs the good old Collections.

The test consists of filtering a list of Integer and, for each even number, calculating the square root and storing it in a result List of Double.

Here is the code:

	public static void main(String[] args) {
		//Calculating square root of even numbers from 1 to N		
		int min = 1;
		int max = 1000000;
		
		List<Integer> sourceList = new ArrayList<>();
		for (int i = min; i < max; i++) {
			sourceList.add(i);
		}

		List<Double> result = new LinkedList<>();
		
		
		//Collections approach
		long t0 = System.nanoTime();
		long elapsed = 0;
		for (Integer i : sourceList) {
			if(i % 2 == 0){
				result.add(Math.sqrt(i));
			}
		}
		elapsed = System.nanoTime() - t0;		
		System.out.printf("Collections: Elapsed time:\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));
		
		
		//Stream approach
		Stream<Integer> stream = sourceList.stream();		
		t0 = System.nanoTime();
		result = stream.filter(i -> i%2 == 0).map(i -> Math.sqrt(i)).collect(Collectors.toList());
		elapsed = System.nanoTime() - t0;		
		System.out.printf("Streams: Elapsed time:\t\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));
		
		
		//Parallel stream approach
		stream = sourceList.stream().parallel();		
		t0 = System.nanoTime();
		result = stream.filter(i -> i%2 == 0).map(i -> Math.sqrt(i)).collect(Collectors.toList());
		elapsed = System.nanoTime() - t0;		
		System.out.printf("Parallel streams: Elapsed time:\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));		
	}

And here are the results for a dual core machine:

	Collections: Elapsed time:	      94338247 ns 	(0,094338 seconds)
	Streams: Elapsed time:		     201112924 ns 	(0,201113 seconds)
	Parallel streams: Elapsed time:	 357243629 ns 	(0,357244 seconds)

For this particular test, streams are about twice as slow as collections, and parallelism doesn't help (or maybe I'm using it the wrong way?).

Questions:

  • Is this test fair? Have I made any mistake?
  • Are streams slower than collections? Has anyone made a good formal benchmark on this?
  • Which approach should I strive for?

Updated results.

I ran the test 1k times after JVM warmup (1k iterations) as advised by @pveentjer:

	Collections: Average time:	 	206884437,000000 ns 	(0,206884 seconds)
	Streams: Average time:	 	 	 98366725,000000 ns 	(0,098367 seconds)
	Parallel streams: Average time:	167703705,000000 ns 	(0,167704 seconds)

In this case streams perform better. I wonder what would be observed in an app where the filtering function is only called once or twice during runtime.

Java Solutions


Solution 1 - Java

  1. Stop using LinkedList for anything except heavy removal from the middle of the list through an iterator.

  2. Stop writing benchmarking code by hand; use JMH.
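
The one use case that item 1 leaves for LinkedList, removing elements through an iterator while traversing, might look like the sketch below. The class and method names are hypothetical and only illustrate the pattern; they are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class LinkedListRemovalSketch {

    // Copies the input into a LinkedList and removes every odd value while
    // traversing. Iterator.remove() on a LinkedList unlinks the current node
    // without shifting the remaining elements.
    static List<Integer> dropOdd(List<Integer> values) {
        List<Integer> list = new LinkedList<>(values);
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 != 0) {
                it.remove();
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            input.add(i);
        }
        System.out.println(dropOdd(input)); // prints [2, 4, 6, 8, 10]
    }
}
```

For append-only workloads like the one in the question, an ArrayList is the usual choice, which is what the benchmark below uses.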

Proper benchmarks:

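import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;

@OutputTimeUnit(TimeUnit.NANOSECONDS)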
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(StreamVsVanilla.N)
public class StreamVsVanilla {
    public static final int N = 10000;

    static List<Integer> sourceList = new ArrayList<>();
    static {
        for (int i = 0; i < N; i++) {
            sourceList.add(i);
        }
    }

    @Benchmark
    public List<Double> vanilla() {
        List<Double> result = new ArrayList<>(sourceList.size() / 2 + 1);
        for (Integer i : sourceList) {
            if (i % 2 == 0){
                result.add(Math.sqrt(i));
            }
        }
        return result;
    }

    @Benchmark
    public List<Double> stream() {
        return sourceList.stream()
                .filter(i -> i % 2 == 0)
                .map(Math::sqrt)
                .collect(Collectors.toCollection(
                    () -> new ArrayList<>(sourceList.size() / 2 + 1)));
    }
}

Result:

Benchmark                   Mode   Samples         Mean   Mean error    Units
StreamVsVanilla.stream      avgt        10       17.588        0.230    ns/op
StreamVsVanilla.vanilla     avgt        10       10.796        0.063    ns/op

Just as I expected, the stream implementation is noticeably slower (about 1.6x in this run). The JIT is able to inline all the lambda code, but it doesn't produce code as tight as the vanilla version.

Generally, Java 8 streams are not magic. They cannot speed up code that is already well implemented (probably with plain iteration, or with Java 5 for-each statements replaced by Iterable.forEach() and Collection.removeIf() calls). Streams are more about coding convenience and safety. There is a trade-off between convenience and speed at work here.
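
As a rough illustration of the Iterable.forEach() and Collection.removeIf() calls mentioned above, here is a minimal sketch that does the same filter-and-square-root work without a stream. The class and method names are hypothetical, and no performance claim is attached to it.

```java
import java.util.ArrayList;
import java.util.List;

class ForEachRemoveIfSketch {

    // Keeps only the even numbers by mutating a copy in place with removeIf.
    static List<Integer> evens(List<Integer> source) {
        List<Integer> copy = new ArrayList<>(source);
        copy.removeIf(i -> i % 2 != 0);
        return copy;
    }

    // Computes square roots with Iterable.forEach instead of a for-each statement.
    static List<Double> roots(List<Integer> source) {
        List<Double> result = new ArrayList<>(source.size());
        source.forEach(i -> result.add(Math.sqrt(i)));
        return result;
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            source.add(i);
        }
        System.out.println(roots(evens(source)));
    }
}
```

Note that this version copies the source list before filtering, so it is a readability sketch rather than a tuned replacement for the loop in vanilla().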

Solution 2 - Java

  1. Your benchmark measures times of less than 1 second. That means side effects can strongly influence your results. So I increased your task size 10 times

     int max = 10_000_000;
    

and ran your benchmark. My results:

Collections: Elapsed time:	 8592999350 ns 	(8.592999 seconds)
Streams: Elapsed time:		 2068208058 ns 	(2.068208 seconds)
Parallel streams: Elapsed time:	 7186967071 ns 	(7.186967 seconds)

Without the edit (int max = 1_000_000), the results were:

Collections: Elapsed time:	 113373057 ns 	(0.113373 seconds)
Streams: Elapsed time:		 135570440 ns 	(0.135570 seconds)
Parallel streams: Elapsed time:	 104091980 ns 	(0.104092 seconds)

That is like your results: the stream is slower than the collection. Conclusion: much of the time was spent on stream initialization and passing values through the pipeline.

  2. After increasing the task size, the stream became faster (that's expected), but the parallel stream remained too slow. What's wrong? Note that you have collect(Collectors.toList()) in your pipeline. Collecting into a single collection introduces a performance bottleneck and overhead under concurrent execution. The relative cost of that overhead can be estimated by replacing

    collecting into a collection -> counting the elements

For streams this can be done with collect(Collectors.counting()). I got these results:

Collections: Elapsed time:	 41856183 ns 	(0.041856 seconds)
Streams: Elapsed time:		 546590322 ns 	(0.546590 seconds)
Parallel streams: Elapsed time:	 1540051478 ns 	(1.540051 seconds)

That's for the big task (int max = 10000000)! Conclusion: collecting items into a collection took the majority of the time. The slowest part is adding to the list. BTW, Collectors.toList() uses a plain ArrayList.
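
The replacement described above, swapping collect(Collectors.toList()) for collect(Collectors.counting()), can be sketched as follows. The class and method names are hypothetical; the sketch only shows the two pipeline shapes and does no timing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class CountVsCollectSketch {

    // Materializes every mapped value into a list.
    static List<Double> collectRoots(List<Integer> source) {
        return source.stream()
                .filter(i -> i % 2 == 0)
                .map(i -> Math.sqrt(i))
                .collect(Collectors.toList());
    }

    // Runs the same filter and map, but only counts the resulting elements,
    // so no result list has to be built.
    static long countRoots(List<Integer> source) {
        return source.stream()
                .filter(i -> i % 2 == 0)
                .map(i -> Math.sqrt(i))
                .collect(Collectors.counting());
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            source.add(i);
        }
        System.out.println(collectRoots(source).size()); // prints 5
        System.out.println(countRoots(source));          // prints 5
    }
}
```

Timing the two variants against each other, ideally under JMH, would give an estimate of how much of the total cost comes from building the result list.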

Solution 3 - Java

public static void main(String[] args) {
    //Calculating square root of even numbers from 1 to N       
    int min = 1;
    int max = 10000000;

    List<Integer> sourceList = new ArrayList<>();
    for (int i = min; i < max; i++) {
        sourceList.add(i);
    }

    List<Double> result = new LinkedList<>();


    //Collections approach
    long t0 = System.nanoTime();
    long elapsed = 0;
    for (Integer i : sourceList) {
        if(i % 2 == 0){
            result.add( doSomeCalculate(i));
        }
    }
    elapsed = System.nanoTime() - t0;       
    System.out.printf("Collections: Elapsed time:\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));


    //Stream approach
    Stream<Integer> stream = sourceList.stream();       
    t0 = System.nanoTime();
    result = stream.filter(i -> i%2 == 0).map(i -> doSomeCalculate(i))
            .collect(Collectors.toList());
    elapsed = System.nanoTime() - t0;       
    System.out.printf("Streams: Elapsed time:\t\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));


    //Parallel stream approach
    stream = sourceList.stream().parallel();        
    t0 = System.nanoTime();
    result = stream.filter(i -> i%2 == 0).map(i ->  doSomeCalculate(i))
            .collect(Collectors.toList());
    elapsed = System.nanoTime() - t0;       
    System.out.printf("Parallel streams: Elapsed time:\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));      
}

static double doSomeCalculate(int input) {
    for(int i=0; i<100000; i++){
        Math.sqrt(i+input);
    }
    return Math.sqrt(input);
}

I changed the code a bit and ran it on my MacBook Pro, which has 8 cores. I got a reasonable result:

Collections: Elapsed time:	 	1522036826 ns 	(1.522037 seconds)
Streams: Elapsed time:		 	4315833719 ns 	(4.315834 seconds)
Parallel streams: Elapsed time:	 261152901 ns 	(0.261153 seconds)

Solution 4 - Java

For what you are trying to do, I would not use the regular Java APIs anyway. There is a ton of boxing/unboxing going on, so there is a huge performance overhead.

Personally, I think a lot of API designs are crap because they create a lot of object litter.

Try using primitive arrays of double/int, do it single-threaded, and see what the performance is.

PS: You might want to have a look at JMH for running the benchmark. It takes care of some typical pitfalls, such as warming up the JVM.
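
The primitive-array suggestion above could be sketched like this. The names are hypothetical. The second method is my own addition rather than something this answer proposes: it uses the primitive stream types IntStream and DoubleStream, which also avoid boxing.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class PrimitiveRootsSketch {

    // Plain single-threaded loops over primitives: no Integer or Double
    // objects are created.
    static double[] rootsOfEvens(int min, int max) {
        // Count the even numbers in [min, max) first so the array is sized exactly.
        int count = 0;
        for (int i = min; i < max; i++) {
            if (i % 2 == 0) {
                count++;
            }
        }
        double[] result = new double[count];
        int next = 0;
        for (int i = min; i < max; i++) {
            if (i % 2 == 0) {
                result[next++] = Math.sqrt(i);
            }
        }
        return result;
    }

    // The same computation expressed with primitive streams.
    static double[] rootsOfEvensStream(int min, int max) {
        return IntStream.range(min, max)
                .filter(i -> i % 2 == 0)
                .mapToDouble(Math::sqrt)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(rootsOfEvens(1, 10)));
        System.out.println(Arrays.toString(rootsOfEvensStream(1, 10)));
    }
}
```

Whether either variant is actually faster than the boxed versions on a given JVM is something to measure with JMH rather than assume.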

Solution 5 - Java

Interesting results for Java 8 and Java 11. I used the code provided by leventov, with a small modification:

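import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;

@OutputTimeUnit(TimeUnit.NANOSECONDS)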
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(BenchmarkMain.N)
public class BenchmarkMain {

    public static final int N = 10000;

    static List<Integer> sourceList = new ArrayList<>();
    static {
        for (int i = 0; i < N; i++) {
            sourceList.add(i);
        }
    }

    @Benchmark
    public List<Double> vanilla() {
        List<Double> result = new ArrayList<>(sourceList.size() / 2 + 1);
        for (Integer i : sourceList) {
            if (i % 2 == 0){
                result.add(Math.sqrt(i));
            }
        }
        return result;
    }

    @Benchmark
    public List<Double> stream() {
        return sourceList.stream()
                .filter(i -> i % 2 == 0)
                .map(Math::sqrt)
                .collect(Collectors.toCollection(
                    () -> new ArrayList<>(sourceList.size() / 2 + 1)));
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException {
        org.openjdk.jmh.Main.main(args);

    }

}

Java 8:

# JMH version: 1.31
# VM version: JDK 1.8.0_262, OpenJDK 64-Bit Server VM, 25.262-b19
# VM invoker: /opt/jdk1.8.0_262/jre/bin/java
# VM options: <none>
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
...
Benchmark              Mode  Cnt   Score   Error  Units
BenchmarkMain.stream   avgt   25  10.680 ± 0.744  ns/op
BenchmarkMain.vanilla  avgt   25   6.490 ± 0.159  ns/op

Java 11:

# JMH version: 1.31
# VM version: JDK 11.0.2, OpenJDK 64-Bit Server VM, 11.0.2+9
# VM invoker: /opt/jdk-11.0.2/bin/java
# VM options: <none>
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
...
Benchmark              Mode  Cnt  Score   Error  Units
BenchmarkMain.stream   avgt   25  5.521 ± 0.057  ns/op
BenchmarkMain.vanilla  avgt   25  7.359 ± 0.118  ns/op

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author    Original Content on Stackoverflow
Question             Mister Smith       View Question on Stackoverflow
Solution 1 - Java    leventov           View Answer on Stackoverflow
Solution 2 - Java    Sergey Fedorov     View Answer on Stackoverflow
Solution 3 - Java    Mellon             View Answer on Stackoverflow
Solution 4 - Java    pveentjer          View Answer on Stackoverflow
Solution 5 - Java    Pavel_K            View Answer on Stackoverflow