What is the difference between Collection.stream().forEach() and Collection.forEach()?
Java Problem Overview
I understand that with .stream(), I can chain operations like .filter() or use a parallel stream. But what is the difference between them if I only need to execute small operations (for example, printing the elements of the list)?
collection.stream().forEach(System.out::println);
collection.forEach(System.out::println);
Java Solutions
Solution 1 - Java
For simple cases such as the one illustrated, they are mostly the same. However, there are a number of subtle differences that might be significant.
One issue is with ordering. With Stream.forEach, the order is undefined. A sequential stream is unlikely to process elements out of order, but the specification still permits Stream.forEach to execute in an arbitrary order. With parallel streams, this happens frequently. By contrast, Iterable.forEach is always executed in the iteration order of the Iterable, if one is specified.
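The ordering difference can be observed with a small experiment. The following is a minimal sketch, not taken from the original answer; the class and method names are made up for illustration. It records the order in which each variant visits the elements, and also shows Stream.forEachOrdered, which respects encounter order even on a parallel stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class ForEachOrderDemo {

    // Iterable.forEach: visits elements in the list's iteration order.
    static List<Integer> viaIterableForEach(List<Integer> source) {
        List<Integer> seen = new ArrayList<>();
        source.forEach(seen::add);
        return seen;
    }

    // Parallel Stream.forEach: the visiting order is not guaranteed.
    // The sink is synchronized because several threads may call add().
    static List<Integer> viaParallelForEach(List<Integer> source) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEach(seen::add);
        return seen;
    }

    // Parallel Stream.forEachOrdered: elements are processed in encounter order.
    static List<Integer> viaParallelForEachOrdered(List<Integer> source) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEachOrdered(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println("Iterable.forEach:        " + viaIterableForEach(source));
        System.out.println("parallel forEach:        " + viaParallelForEach(source));
        System.out.println("parallel forEachOrdered: " + viaParallelForEachOrdered(source));
    }
}
```

The first and third lines should always print the elements in list order, while the middle line may differ from run to run.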
Another issue is with side effects. The action specified in Stream.forEach is required to be non-interfering. (See the java.util.stream package doc.) Iterable.forEach potentially has fewer restrictions. For the collections in java.util, Iterable.forEach will generally use that collection's Iterator, most of which are designed to be fail-fast and which will throw ConcurrentModificationException if the collection is structurally modified during the iteration. However, modifications that aren't structural are allowed during iteration. For example, the ArrayList class documentation says "merely setting the value of an element is not a structural modification." Thus, the action for ArrayList.forEach is allowed to set values in the underlying ArrayList without problems.
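To illustrate the distinction between structural and non-structural modification, here is a minimal sketch with hypothetical class and method names. It assumes OpenJDK's ArrayList, where the fail-fast check is reliable in single-threaded code; the specification itself only promises fail-fast behavior on a best-effort basis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class ArrayListForEachModificationDemo {

    // Non-structural change: replace each element in place while iterating.
    static List<Integer> doubleInPlace(List<Integer> input) {
        ArrayList<Integer> list = new ArrayList<>(input);
        int[] index = {0}; // mutable counter, because lambdas can only capture effectively final locals
        list.forEach(value -> list.set(index[0]++, value * 2));
        return list;
    }

    // Structural change: adding during iteration trips the fail-fast check.
    static boolean addDuringForEachThrows(List<Integer> input) {
        ArrayList<Integer> list = new ArrayList<>(input);
        try {
            list.forEach(value -> list.add(value));
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3);
        System.out.println(doubleInPlace(input));
        System.out.println("add() during forEach threw: " + addDuringForEachThrows(input));
    }
}
```

Setting values succeeds because set() does not change the list's size, whereas add() does and is therefore detected as a structural modification.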
The concurrent collections are yet again different. Instead of fail-fast, they are designed to be weakly consistent. The full definition is in the java.util.concurrent package documentation. Briefly, though, consider ConcurrentLinkedDeque. The action passed to its forEach method is allowed to modify the underlying deque, even structurally, and ConcurrentModificationException is never thrown. However, the modification that occurs might or might not be visible in this iteration. (Hence the "weak" consistency.)
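A minimal sketch of this behavior follows; the class and method names are hypothetical. The action removes the tail of the deque while the deque is being iterated. No exception is thrown, and whether the removed element is still visited is deliberately left unchecked, since weak consistency does not specify it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical demo class; the names are illustrative only.
class WeaklyConsistentForEachDemo {

    // Removes the tail element from inside the forEach action and
    // returns the deque's contents afterwards.
    static List<Integer> removeTailWhileIterating(List<Integer> input) {
        ConcurrentLinkedDeque<Integer> deque = new ConcurrentLinkedDeque<>(input);
        Integer first = deque.peekFirst();
        deque.forEach(value -> {
            if (value.equals(first)) {
                deque.pollLast(); // structural modification during iteration
            }
        });
        return new ArrayList<>(deque);
    }

    public static void main(String[] args) {
        // The tail (3) is removed while iterating; no exception is thrown.
        System.out.println(removeTailWhileIterating(Arrays.asList(1, 2, 3)));
    }
}
```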
Still another difference is visible if Iterable.forEach is iterating over a synchronized collection. On such a collection, Iterable.forEach takes the collection's lock once and holds it across all the calls to the action method. The Stream.forEach call uses the collection's spliterator, which does not lock, and which relies on the prevailing rule of non-interference. The collection backing the stream could be modified during iteration, and if it is, a ConcurrentModificationException or inconsistent behavior could result.
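The locking difference can be made visible with Thread.holdsLock. The sketch below uses hypothetical class and method names and assumes the wrapper returned by Collections.synchronizedList locks on itself, which matches the documented advice to synchronize on the returned list when traversing it via a Stream. The third method shows that manual synchronization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class SynchronizedForEachLockDemo {

    // Iterable.forEach on the wrapper: the action runs while the lock is held.
    static boolean lockHeldInIterableForEach(List<Integer> syncList) {
        boolean[] held = {true};
        syncList.forEach(value -> held[0] &= Thread.holdsLock(syncList));
        return held[0];
    }

    // stream().forEach on the wrapper: no lock is taken on our behalf.
    static boolean lockHeldInStreamForEach(List<Integer> syncList) {
        boolean[] held = {true};
        syncList.stream().forEach(value -> held[0] &= Thread.holdsLock(syncList));
        return held[0];
    }

    // stream().forEach with manual synchronization on the wrapper.
    static boolean lockHeldInManuallySynchronizedStream(List<Integer> syncList) {
        boolean[] held = {true};
        synchronized (syncList) {
            syncList.stream().forEach(value -> held[0] &= Thread.holdsLock(syncList));
        }
        return held[0];
    }

    public static void main(String[] args) {
        List<Integer> syncList = Collections.synchronizedList(new ArrayList<>(Arrays.asList(1, 2, 3)));
        System.out.println("Iterable.forEach holds lock:     " + lockHeldInIterableForEach(syncList));
        System.out.println("stream().forEach holds lock:     " + lockHeldInStreamForEach(syncList));
        System.out.println("synchronized + stream().forEach: " + lockHeldInManuallySynchronizedStream(syncList));
    }
}
```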
Solution 2 - Java
This answer concerns itself with the performance of the various implementations of the loops. It's only marginally relevant for loops that are called VERY OFTEN (like millions of calls). In most cases the content of the loop will be by far the most expensive element. For situations where you loop really often, this might still be of interest.
You should repeat these tests on your target system, as the results are implementation-specific (full source code).
I ran OpenJDK version 1.8.0_111 on a fast Linux machine.
I wrote a test that loops 10^6 times over a List<Integer> named integers (10^0 -> 10^4 entries).
The results are below; the fastest method varies with the number of entries in the list.
Even in the worst case, looping over 10^4 entries 10^6 times took roughly 100 seconds for the slowest method, so other considerations are more important in virtually all situations.
public int outside = 0;

private void iteratorForEach(List<Integer> integers) {
    integers.forEach((ii) -> {
        outside = ii * ii;
    });
}

private void forEach(List<Integer> integers) {
    for (Integer next : integers) {
        outside = next * next;
    }
}

private void forCounter(List<Integer> integers) {
    for (int ii = 0; ii < integers.size(); ii++) {
        Integer next = integers.get(ii);
        outside = next * next;
    }
}

private void iteratorStream(List<Integer> integers) {
    integers.stream().forEach((ii) -> {
        outside = ii * ii;
    });
}
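For readers who want to try something similar, the following is a rough sketch of how such loops could be timed. It is not the author's harness: the class, the helper names, and the smaller repetition count are all assumptions made for illustration. Hand-rolled timing like this is sensitive to JIT warm-up, so a dedicated benchmark tool such as JMH would likely give more trustworthy numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical timing sketch; not the harness used for the numbers in this answer.
class LoopTimingSketch {

    // Written by every loop body so the work is not trivially dead code.
    static int outside = 0;

    // Runs `loop` over `data` `repetitions` times and returns elapsed milliseconds.
    static long timeMillis(Consumer<List<Integer>> loop, List<Integer> data, int repetitions) {
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            loop.accept(data);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Builds a list containing 0 .. size-1.
    static List<Integer> makeList(int size) {
        List<Integer> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        int repetitions = 100_000; // assumption: fewer than the 10^6 loops used in this answer
        for (int size : new int[] {1, 10, 100}) {
            List<Integer> data = makeList(size);
            long iterable = timeMillis(l -> l.forEach(ii -> outside = ii * ii), data, repetitions);
            long stream = timeMillis(l -> l.stream().forEach(ii -> outside = ii * ii), data, repetitions);
            System.out.println(size + " entries: forEach " + iterable + " ms, stream().forEach " + stream + " ms");
        }
    }
}
```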
Here are my timings: milliseconds / function / number of entries in list. Each run is 10^6 loops.
function                     1    10    100    1000    10000
iterator.forEach            27   116    959    8832    88958
for:each                    53   171   1262   11164   111005
for with index              39   112    920    8577    89212
iterable.stream.forEach    255   324   1030    8519    88419
I posted the full source code. If you repeat the experiment, please edit this answer and add your results, along with a note describing the tested system.
Using a MacBook Pro, 2.5 GHz Intel Core i7, 16 GB, macOS 10.12.6:
function                     1    10    100    1000    10000
iterator.forEach            27   106   1047    8516    88044
for:each                    46   143   1182   10548   101925
for with index              49   145    887    7614    81130
iterable.stream.forEach    393   397   1108    8908    88361
Java 8 Hotspot VM - 3.4GHz Intel Xeon, 8 GB, Windows 10 Pro
function                     1    10    100    1000    10000
iterator.forEach            30   115    928    8384    85911
for:each                    40   125   1166   10804   108006
for with index              30   120    956    8247    81116
iterable.stream.forEach    260   237   1020    8401    84883
Java 11 Hotspot VM - 3.4GHz Intel Xeon, 8 GB, Windows 10 Pro
(same machine as above, different JDK version)
function                     1    10    100    1000    10000
iterator.forEach            20   104    940    8350    88918
for:each                    50   140    991    8497    89873
for with index              37   140    945    8646    90402
iterable.stream.forEach    200   270   1054    8558    87449
Java 11 OpenJ9 VM - 3.4GHz Intel Xeon, 8 GB, Windows 10 Pro
(same machine and JDK version as above, different VM)
function                     1    10    100    1000    10000
iterator.forEach           211   475   3499   33631   336108
for:each                   200   375   2793   27249   272590
for with index             384   467   2718   26036   261408
iterable.stream.forEach    515   714   3096   26320   262786
Java 8 Hotspot VM - 2.8GHz AMD, 64 GB, Windows Server 2016
function                     1    10    100    1000    10000
iterator.forEach            95   192   2076   19269   198519
for:each                   157   224   2492   25466   248494
for with index             140   368   2084   22294   207092
iterable.stream.forEach    946   687   2206   21697   238457
Java 11 Hotspot VM - 2.8GHz AMD, 64 GB, Windows Server 2016
(same machine as above, different JDK version)
function                     1    10    100    1000    10000
iterator.forEach            72   269   1972   23157   229445
for:each                   192   376   2114   24389   233544
for with index             165   424   2123   20853   220356
iterable.stream.forEach    921   660   2194   23840   204817
Java 11 OpenJ9 VM - 2.8GHz AMD, 64 GB, Windows Server 2016
(same machine and JDK version as above, different VM)
function                     1    10    100    1000    10000
iterator.forEach           592   914   7232   59062   529497
for:each                   477  1576  14706  129724  1190001
for with index             893   838   7265   74045   842927
iterable.stream.forEach   1359  1782  11869  104427   958584
The VM implementation you choose (HotSpot, OpenJ9, etc.) also makes a difference.
Solution 3 - Java
There is no difference between the two you have mentioned, at least conceptually; Collection.forEach() is just a shorthand.
Internally, the stream() version has somewhat more overhead due to object creation, but judging by the running time, that overhead is negligible.
Both implementations end up iterating over the collection's contents once and printing each element during the iteration.
Solution 4 - Java
Collection.forEach() uses the collection's iterator (if one is specified). That means that the processing order of the items is defined. In contrast, the processing order of Collection.stream().forEach() is undefined.
In most cases, it doesn't make a difference which of the two we choose. Parallel streams allow us to execute the stream in multiple threads, and in such situations, the execution order is undefined. Java only requires that all threads finish before a terminal operation, such as collect(Collectors.toList()), returns. Let's look at an example where we first call forEach() directly on the collection, and second, on a parallel stream:
list.forEach(System.out::print);
System.out.print(" ");
list.parallelStream().forEach(System.out::print);
If we run the code several times, we see that list.forEach() processes the items in insertion order, while list.parallelStream().forEach() may produce a different result on each run. One possible output is:
ABCD CDBA
Another one is:
ABCD DBCA