Why are Java Streams once-off?

Tags: Java, Java 8, Java Stream, API Design

Java Problem Overview


Unlike C#'s IEnumerable, where an execution pipeline can be executed as many times as we want, in Java a stream can be 'iterated' only once.

Any call to a terminal operation closes the stream, rendering it unusable. This 'feature' takes away a lot of power.
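To make the restriction concrete, here is a minimal sketch (class name is mine) showing that a second terminal operation on the same Stream instance fails:

```java
import java.util.stream.Stream;

public class OnceOff {
    public static void main(String[] args) {
        Stream<Integer> s = Stream.of(1, 2, 3);
        System.out.println(s.count()); // first terminal operation: prints 3
        try {
            s.count(); // second terminal operation on the same instance
        } catch (IllegalStateException e) {
            // "stream has already been operated upon or closed"
            System.out.println(e.getMessage());
        }
    }
}
```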

I imagine the reason for this is not technical. What were the design considerations behind this strange restriction?

Edit: in order to demonstrate what I am talking about, consider the following implementation of Quick-Sort in C#:

IEnumerable<int> QuickSort(IEnumerable<int> ints)
{
  if (!ints.Any()) {
    return Enumerable.Empty<int>();
  }

  int pivot = ints.First();

  IEnumerable<int> lt = ints.Where(i => i < pivot);
  IEnumerable<int> gt = ints.Where(i => i > pivot);

  return QuickSort(lt).Concat(new int[] { pivot }).Concat(QuickSort(gt));
}

Now to be sure, I am not advocating that this is a good implementation of quicksort! It is, however, a great example of the expressive power of lambda expressions combined with stream operations.

And it can't be done in Java! I can't even ask a stream whether it is empty without rendering it unusable.

Java Solutions


Solution 1 - Java

I have some recollections from the early design of the Streams API that might shed some light on the design rationale.

Back in 2012, we were adding lambdas to the language, and we wanted a collections-oriented or "bulk data" set of operations, programmed using lambdas, that would facilitate parallelism. The idea of lazily chaining operations together was well established by this point. We also didn't want the intermediate operations to store results.

The main issues we needed to decide were what the objects in the chain looked like in the API and how they hooked up to data sources. The sources were often collections, but we also wanted to support data coming from a file or the network, or data generated on-the-fly, e.g., from a random number generator.

There were many influences of existing work on the design. Among the more influential were Google's Guava library and the Scala collections library. (If anybody is surprised about the influence from Guava, note that Kevin Bourrillion, Guava lead developer, was on the JSR-335 Lambda expert group.) On Scala collections, we found this talk by Martin Odersky to be of particular interest: Future-Proofing Scala Collections: from Mutable to Persistent to Parallel. (Stanford EE380, 2011 June 1.)

Our prototype design at the time was based around Iterable. The familiar operations filter, map, and so forth were extension (default) methods on Iterable. Calling one added an operation to the chain and returned another Iterable. A terminal operation like count would call iterator() up the chain to the source, and the operations were implemented within each stage's Iterator.

Since these are Iterables, you can call the iterator() method more than once. What should happen then?

If the source is a collection, this mostly works fine. Collections are Iterable, and each call to iterator() produces a distinct Iterator instance that is independent of any other active instances, and each traverses the collection independently. Great.
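This independence is easy to demonstrate (a minimal sketch of my own):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IndependentIterators {
    public static void main(String[] args) {
        List<String> src = Arrays.asList("a", "b", "c");
        Iterator<String> it1 = src.iterator();
        Iterator<String> it2 = src.iterator();
        it1.next();
        it1.next();                     // advance the first iterator to "c"
        System.out.println(it2.next()); // the second still independently starts at "a"
    }
}
```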

Now what if the source is one-shot, like reading lines from a file? Maybe the first Iterator should get all the values but the second and subsequent ones should be empty. Maybe the values should be interleaved among the Iterators. Or maybe each Iterator should get all the same values. Then, what if you have two iterators and one gets farther ahead of the other? Somebody will have to buffer up the values in the second Iterator until they're read. Worse, what if you get one Iterator and read all the values, and only then get a second Iterator? Where do the values come from now? Is there a requirement for them all to be buffered up just in case somebody wants a second Iterator?

Clearly, allowing multiple Iterators over a one-shot source raises a lot of questions. We didn't have good answers for them. We wanted consistent, predictable behavior for what happens if you call iterator() twice. This pushed us toward disallowing multiple traversals, making the pipelines one-shot.

We also observed others bumping into these issues. In the JDK, most Iterables are collections or collection-like objects, which allow multiple traversal. It isn't specified anywhere, but there seemed to be an unwritten expectation that Iterables allow multiple traversal. A notable exception is the NIO DirectoryStream interface. Its specification includes this interesting warning:

> While DirectoryStream extends Iterable, it is not a general-purpose Iterable as it supports only a single Iterator; invoking the iterator method to obtain a second or subsequent iterator throws IllegalStateException.

[bold in original]

This seemed unusual and unpleasant enough that we didn't want to create a whole bunch of new Iterables that might be once-only. This pushed us away from using Iterable.

About this time, an article by Bruce Eckel appeared that described a spot of trouble he'd had with Scala. He'd written this code:

// Scala
val lines = fromString(data).getLines
val registrants = lines.map(Registrant)
registrants.foreach(println)
registrants.foreach(println)

It's pretty straightforward. It parses lines of text into Registrant objects and prints them out twice. Except that it actually only prints them out once. It turns out that he thought that registrants was a collection, when in fact it's an iterator. The second call to foreach encounters an empty iterator, from which all values have been exhausted, so it prints nothing.
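The same pitfall can be reproduced directly in Java with an Iterator (my own minimal analogue, not Eckel's code):

```java
import java.util.Arrays;
import java.util.Iterator;

public class ExhaustedIterator {
    public static void main(String[] args) {
        Iterator<String> registrants = Arrays.asList("alice", "bob").iterator();
        registrants.forEachRemaining(System.out::println); // prints both values
        registrants.forEachRemaining(System.out::println); // prints nothing: exhausted
    }
}
```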

This kind of experience convinced us that it was very important to have clearly predictable results if multiple traversal is attempted. It also highlighted the importance of distinguishing lazy pipeline-like structures from actual collections that store data. This in turn drove the separation of the lazy pipeline operations into the new Stream interface, keeping only eager, mutative operations directly on Collections. Brian Goetz has explained the rationale for that.

What about allowing multiple traversal for collection-based pipelines but disallowing it for non-collection-based pipelines? It's inconsistent, but it's sensible. If you're reading values from the network, of course you can't traverse them again. If you want to traverse them multiple times, you have to pull them into a collection explicitly.

But let's explore allowing multiple traversal from collections-based pipelines. Let's say you did this:

Iterable<?> it = source.filter(...).map(...).filter(...).map(...);
it.into(dest1);
it.into(dest2);

(The into operation is now spelled collect(toList()).)

If source is a collection, then the first into() call will create a chain of Iterators back to the source, execute the pipeline operations, and send the results into the destination. The second call to into() will create another chain of Iterators, and execute the pipeline operations again. This isn't obviously wrong but it does have the effect of performing all the filter and map operations a second time for each element. I think many programmers would have been surprised by this behavior.
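The doubled work can be made visible by counting predicate invocations when the pipeline is rebuilt and rerun (a sketch in today's API, with collect standing in for the hypothetical into):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Recompute {
    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4);
        AtomicInteger filterCalls = new AtomicInteger();
        // Rebuilding the pipeline for each destination reruns every operation
        Supplier<Stream<Integer>> pipeline = () -> source.stream()
                .filter(i -> { filterCalls.incrementAndGet(); return i % 2 == 0; });
        List<Integer> dest1 = pipeline.get().collect(Collectors.toList());
        List<Integer> dest2 = pipeline.get().collect(Collectors.toList());
        System.out.println(filterCalls.get()); // 8: the filter ran twice per element
    }
}
```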

As I mentioned above, we had been talking to the Guava developers. One of the cool things they have is an Idea Graveyard where they describe features that they decided not to implement along with the reasons. The idea of lazy collections sounds pretty cool, but here's what they have to say about it. Consider a List.filter() operation that returns a List:

> The biggest concern here is that too many operations become expensive, linear-time propositions. If you want to filter a list and get a list back, and not just a Collection or an Iterable, you can use ImmutableList.copyOf(Iterables.filter(list, predicate)), which "states up front" what it's doing and how expensive it is.

To take a specific example, what's the cost of get(0) or size() on a List? For commonly used classes like ArrayList, they're O(1). But if you call one of these on a lazily-filtered list, it has to run the filter over the backing list, and all of a sudden these operations are O(n). Worse, it has to traverse the backing list on every operation.
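To make that cost concrete, here is a hypothetical sketch of such a lazily-filtered List (my own illustration of the idea Guava rejected, not Guava code): the cheap-looking operations secretly re-scan the backing list on every call.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical lazy view: get() and size() look O(1) but are O(n)
class LazyFilteredList<T> extends AbstractList<T> {
    private final List<T> backing;
    private final Predicate<T> pred;

    LazyFilteredList(List<T> backing, Predicate<T> pred) {
        this.backing = backing;
        this.pred = pred;
    }

    @Override
    public T get(int index) {   // re-runs the filter over the backing list each call
        int seen = 0;
        for (T t : backing)
            if (pred.test(t) && seen++ == index) return t;
        throw new IndexOutOfBoundsException(String.valueOf(index));
    }

    @Override
    public int size() {         // also a full traversal of the backing list
        int n = 0;
        for (T t : backing)
            if (pred.test(t)) n++;
        return n;
    }
}
```

For example, `new LazyFilteredList<>(Arrays.asList(1, 2, 3, 4), i -> i % 2 == 0).get(0)` returns 2, but only after scanning the backing list to find it.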

This seemed to us to be too much laziness. It's one thing to set up some operations and defer actual execution until you say "Go". It's another to set things up in a way that hides a potentially large amount of recomputation.

In proposing to disallow non-linear or "no-reuse" streams, Paul Sandoz described the potential consequences of allowing them as giving rise to "unexpected or confusing results." He also mentioned that parallel execution would make things even trickier. Finally, I'd add that a pipeline operation with side effects would lead to difficult and obscure bugs if the operation were unexpectedly executed multiple times, or at least a different number of times than the programmer expected. (But Java programmers don't write lambda expressions with side effects, do they? DO THEY??)

So that's the basic rationale for the Java 8 Streams API design that allows one-shot traversal and that requires a strictly linear (no branching) pipeline. It provides consistent behavior across multiple different stream sources, it clearly separates lazy from eager operations, and it provides a straightforward execution model.


With regard to IEnumerable, I am far from an expert on C# and .NET, so I would appreciate being corrected (gently) if I draw any incorrect conclusions. It does appear, however, that IEnumerable permits multiple traversal to behave differently with different sources; and it permits a branching structure of nested IEnumerable operations, which may result in some significant recomputation. While I appreciate that different systems make different tradeoffs, these are two characteristics that we sought to avoid in the design of the Java 8 Streams API.

The quicksort example given by the OP is interesting, puzzling, and I'm sorry to say, somewhat horrifying. A call to QuickSort takes an IEnumerable and returns an IEnumerable, so no sorting is actually done until the final IEnumerable is traversed. What the call seems to do, though, is build up a tree structure of IEnumerables that reflects the partitioning that quicksort would do, without actually doing it. (This is lazy computation, after all.) If the source has N elements, the tree will be N elements wide at its widest, and it will be lg(N) levels deep.

It seems to me -- and once again, I'm not a C# or .NET expert -- that this will cause certain innocuous-looking calls, such as pivot selection via ints.First(), to be more expensive than they look. At the first level, of course, it's O(1). But consider a partition deep in the tree, at the right-hand edge. To compute the first element of this partition, the entire source has to be traversed, an O(N) operation. But since the partitions above are lazy, they must be recomputed, requiring O(lg N) comparisons per element. So selecting the pivot would be an O(N lg N) operation, which is as expensive as an entire sort.

But we don't actually sort until we traverse the returned IEnumerable. In the standard quicksort algorithm, each level of partitioning doubles the number of partitions. Each partition is only half the size, so each level remains at O(N) complexity. The tree of partitions is O(lg N) high, so the total work is O(N lg N).

With the tree of lazy IEnumerables, at the bottom of the tree there are N partitions. Computing each partition requires a traversal of N elements, each of which requires lg(N) comparisons up the tree. To compute all the partitions at the bottom of the tree, then, requires O(N^2 lg N) comparisons.

(Is this right? I can hardly believe this. Somebody please check this for me.)

In any case, it is indeed cool that IEnumerable can be used this way to build up complicated structures of computation. But if it does increase the computational complexity as much as I think it does, it would seem that programming this way is something that should be avoided unless one is extremely careful.

Solution 2 - Java

Background

While the question appears simple, the actual answer requires some background to make sense. If you want to skip to the conclusion, scroll down...

Pick your comparison point - Basic functionality

Using basic concepts, C#'s IEnumerable concept is most closely related to Java's Iterable, which is able to create as many Iterators as you want. IEnumerables create IEnumerators; Java's Iterables create Iterators.

The history of each concept is similar, in that both IEnumerable and Iterable have a basic motivation to allow 'for-each' style looping over the members of data collections. That's an oversimplification as they both allow more than just that, and they also arrived at that stage via different progressions, but it is a significant common feature regardless.

Let's compare that feature: in both languages, if a class implements the IEnumerable/Iterable, then that class must implement at least a single method (for C#, it's GetEnumerator and for Java it's iterator()). In each case, the instance returned from that (IEnumerator/Iterator) allows you to access the current and subsequent members of the data. This feature is used in the for-each language syntax.

Pick your comparison point - Enhanced functionality

IEnumerable in C# has been extended to allow a number of other language features (mostly related to Linq). Features added include selections, projections, aggregations, etc. These extensions have a strong motivation from use in set-theory, similar to SQL and Relational Database concepts.

Java 8 has also had functionality added to enable a degree of functional programming using Streams and Lambdas. Note that Java 8 streams are not primarily motivated by set theory, but by functional programming. Regardless, there are a lot of parallels.

So, this is the second point. The enhancements made to C# were implemented as an enhancement to the IEnumerable concept. In Java, though, the enhancements were implemented by creating new base concepts of Lambdas and Streams, and then also creating a relatively trivial way to convert from Iterators and Iterables to Streams, and vice versa.

So, comparing IEnumerable to Java's Stream concept is incomplete. You need to compare it to the combined Streams and Collections APIs in Java.

In Java, Streams are not the same as Iterables, or Iterators

Streams are not designed to solve problems the same way that iterators are:

  • Iterators are a way of describing the sequence of data.
  • Streams are a way of describing a sequence of data transformations.

With an Iterator, you get a data value, process it, and then get another data value.

With Streams, you chain a sequence of functions together, then you feed an input value to the stream, and get the output value from the combined sequence. Note, in Java terms, each function is encapsulated in a single Stream instance. The Streams API allows you to link a sequence of Stream instances in a way that chains a sequence of transformation expressions.

In order to complete the Stream concept, you need a source of data to feed the stream, and a terminal function that consumes the stream.

The way you feed values into the stream may in fact be from an Iterable, but the Stream sequence itself is not an Iterable; it is a compound function.

A Stream is also intended to be lazy, in the sense that it only does work when you request a value from it.

Note these significant assumptions and features of Streams:

  • A Stream in Java is a transformation engine; it transforms a data item from one state to another.
  • Streams have no concept of data order or position; they simply transform whatever they are asked to.
  • Streams can be supplied with data from many sources, including other streams, Iterators, Iterables, and Collections.
  • You cannot "reset" a stream; that would be like "reprogramming the transformation". Resetting the data source is probably what you want.
  • There is logically only one data item 'in flight' in the stream at any time (unless the stream is a parallel stream, in which case there is one item per thread). This is independent of the data source, which may have more than the current item 'ready' to be supplied to the stream, and of the stream collector, which may need to aggregate and reduce multiple values.
  • Streams can be unbounded (infinite), limited only by the data source or collector (which can be infinite too).
  • Streams are 'chainable': the output of filtering one stream is another stream. Values input to and transformed by a stream can in turn be supplied to another stream which does a different transformation. The data, in its transformed state, flows from one stream to the next. You do not need to intervene and pull the data from one stream and plug it in to the next.
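These properties can be observed directly: nothing in the chain below runs until the terminal operation pulls values through it (a minimal sketch of my own):

```java
import java.util.stream.Stream;

public class LazyChain {
    public static void main(String[] args) {
        Stream<String> s = Stream.of("a", "b")
                .map(x -> {
                    System.out.println("transforming " + x);
                    return x.toUpperCase();
                });
        System.out.println("pipeline built, nothing transformed yet");
        s.forEach(System.out::println); // work happens only now, one item at a time
    }
}
```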

C# Comparison

When you consider that a Java Stream is just a part of a supply, stream, and collect system, and that Streams and Iterators are often used together with Collections, it is no wonder that it is hard to relate these to the same concepts, which are almost all embedded into a single IEnumerable concept in C#.

Parts of IEnumerable (and closely related concepts) are apparent in all of the Java Iterator, Iterable, Lambda, and Stream concepts.

There are small things that the Java concepts can do that are harder in IEnumerable, and vice versa.


Conclusion

  • There's no design problem here, just a problem in matching concepts between the languages.
  • Streams solve problems in a different way
  • Streams add functionality to Java (they add a different way of doing things, they do not take functionality away)

Adding Streams gives you more choices when solving problems, which is fair to classify as 'enhancing power', not 'reducing', 'taking away', or 'restricting' it.

> Why are Java Streams once-off?

This question is misguided, because streams are function sequences, not data. Depending on the data source that feeds the stream, you can reset the data source and feed the same, or a different, stream.

> Unlike C#'s IEnumerable, where an execution pipeline can be executed as many times as we want, in Java a stream can be 'iterated' only once.

Comparing an IEnumerable to a Stream is misguided. The context you are using to say IEnumerable can be executed as many times as you want, is best compared to Java Iterables, which can be iterated as many times as you want. A Java Stream represents a subset of the IEnumerable concept, and not the subset that supplies data, and thus cannot be 'rerun'.

> Any call to a terminal operation closes the stream, rendering it unusable. This 'feature' takes away a lot of power.

The first statement is true, in a sense. The 'takes away power' statement is not. You are still comparing Streams to IEnumerables. The terminal operation in the stream is like a 'break' clause in a for loop. You are always free to have another stream, if you want, and if you can re-supply the data you need. Again, if you consider the IEnumerable to be more like an Iterable for this statement, Java does it just fine.

> I imagine the reason for this is not technical. What were the design considerations behind this strange restriction?

The reason is technical, for the simple reason that a Stream is a subset of what you think it is. The stream subset does not control the data supply, so you should reset the supply, not the stream. In that context, it is not so strange.

QuickSort example

Your quicksort example has the signature:

IEnumerable<int> QuickSort(IEnumerable<int> ints)

You are treating the input IEnumerable as a data source:

IEnumerable<int> lt = ints.Where(i => i < pivot);

Additionally, the return value is an IEnumerable too, which is a supply of data, and since this is a sort operation, the order of that supply is significant. If you consider the Java Iterable to be the appropriate match for this, specifically the List specialization of Iterable (since List is a supply of data with a guaranteed order of iteration), then the equivalent Java code to your code would be:

Stream<Integer> quickSort(List<Integer> ints) {
    // Using a stream to access the data, instead of the simpler ints.isEmpty()
    if (!ints.stream().findAny().isPresent()) {
        return Stream.of();
    }

    // treating the ints as a data collection, just like the C#
    final Integer pivot = ints.get(0);

    // Using streams to get the two partitions
    List<Integer> lt = ints.stream().filter(i -> i < pivot).collect(Collectors.toList());
    List<Integer> gt = ints.stream().filter(i -> i > pivot).collect(Collectors.toList());

    return Stream.concat(Stream.concat(quickSort(lt), Stream.of(pivot)),quickSort(gt));
}    

Note there is a bug (which I have reproduced): the sort does not handle duplicate values gracefully; it is a 'unique value' sort.

Also note how the Java code uses the data source (List) and stream concepts at different points, whereas in C# those two 'personalities' can be expressed in just IEnumerable. Also, although I have used List as the base type, I could have used the more general Collection, and with a small Iterator-to-Stream conversion, I could have used the even more general Iterable.

Solution 3 - Java

Streams are built around Spliterators, which are stateful, mutable objects. They don't have a "reset" action and, in fact, requiring support for such a rewind action would "take away much power". How would Random.ints() be supposed to handle such a request?

On the other hand, for Streams which have a retraceable origin, it is easy to construct an equivalent Stream to be used again. Just put the steps made to construct the Stream into a reusable method. Keep in mind that repeating these steps is not an expensive operation as all these steps are lazy operations; the actual work starts with the terminal operation and depending on the actual terminal operation entirely different code might get executed.

It would be up to you, the writer of such a method, to specify what calling the method twice implies: does it reproduce exactly the same sequence, as streams created for an unmodified array or collection do, or does it produce a stream with similar semantics but different elements, like a stream of random ints or a stream of console input lines?
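A minimal sketch of such a reusable construction method (names are mine):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class ReusablePipeline {
    static List<String> data = Arrays.asList("x", "y");

    // The construction steps live in a method; each call yields a fresh Stream
    static Stream<String> prefixed() {
        return data.stream().map(s -> "pre-" + s);
    }

    public static void main(String[] args) {
        prefixed().forEach(System.out::println);
        prefixed().forEach(System.out::println); // no IllegalStateException
    }
}
```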


By the way, to avoid confusion: a terminal operation consumes the Stream, which is distinct from closing the Stream, as calling close() does (closing is required for streams with associated resources, e.g. those produced by Files.lines()).
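For resource-backed streams the distinction matters in practice; a sketch using a temporary file:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

public class CloseVsConsume {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, Arrays.asList("one", "two"));
        // try-with-resources close()s the Stream, releasing the file handle;
        // the terminal forEach merely consumes it
        try (Stream<String> lines = Files.lines(tmp)) {
            lines.forEach(System.out::println);
        }
        Files.delete(tmp);
    }
}
```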


It seems that a lot of confusion stems from a misguided comparison of IEnumerable with Stream. An IEnumerable represents the ability to provide an actual IEnumerator, so it's like an Iterable in Java. In contrast, a Stream is a kind of iterator, comparable to an IEnumerator, so it's wrong to claim that this kind of data type can be used multiple times in .NET; the support for IEnumerator.Reset is optional. The examples discussed here rather use the fact that an IEnumerable can be used to fetch new IEnumerators, and that works with Java's Collections as well: you can get a new Stream. If the Java developers had decided to add the Stream operations to Iterable directly, with intermediate operations returning another Iterable, it would have been really comparable and could have worked the same way.

However, the developers decided against it, and the decision is discussed in this question. The biggest point is the confusion between eager Collection operations and lazy Stream operations. Looking at the .NET API, I (yes, personally) find it justified. While it may look reasonable when considering IEnumerable alone, a particular Collection will have lots of methods manipulating the Collection directly and lots of methods returning a lazy IEnumerable, and the particular nature of a method isn't always intuitively recognizable. The worst example I found (within the few minutes I looked at it) is List.Reverse(), whose name exactly matches the name of the inherited (is that the right term for extension methods?) Enumerable.Reverse() while having entirely contradictory behavior.


Of course, these are two distinct decisions: the first to make Stream a type distinct from Iterable/Collection, and the second to make Stream a kind of one-time iterator rather than another kind of iterable. But these decisions were made together, and it may be that separating them was never considered. The API wasn't created with comparability to .NET's in mind.

The actual API design decision was to add an improved type of iterator, the Spliterator. Spliterators can be provided by the old Iterables (which is how these were retrofitted) or by entirely new implementations. Then, Stream was added as a high-level front-end to the rather low-level Spliterators. That's it. You may debate whether a different design would be better, but that's not productive; it won't change, given the way they are designed now.

There is another implementation aspect you have to consider. Streams are not immutable data structures. Each intermediate operation may return a new Stream instance encapsulating the old one, but it may also manipulate its own instance and return itself (which doesn't preclude doing both for the same operation). Commonly known examples are operations like parallel or unordered, which do not add another step but manipulate the entire pipeline. Having such a mutable data structure and attempting to reuse it (or, even worse, using it multiple times at the same time) doesn't play well…


For completeness, here is your quicksort example translated to the Java Stream API. It shows that it does not really “take away much power”.

static Stream<Integer> quickSort(Supplier<Stream<Integer>> ints) {

  final Optional<Integer> optPivot = ints.get().findAny();
  if(!optPivot.isPresent()) return Stream.empty();

  final int pivot = optPivot.get();

  Supplier<Stream<Integer>> lt = ()->ints.get().filter(i -> i < pivot);
  Supplier<Stream<Integer>> gt = ()->ints.get().filter(i -> i > pivot);

  return Stream.of(quickSort(lt), Stream.of(pivot), quickSort(gt)).flatMap(s->s);
}

It can be used like

List<Integer> l=new Random().ints(100, 0, 1000).boxed().collect(Collectors.toList());
System.out.println(l);
System.out.println(quickSort(l::stream)
    .map(Object::toString).collect(Collectors.joining(", ")));

You can write it even more compact as

static Stream<Integer> quickSort(Supplier<Stream<Integer>> ints) {
    return ints.get().findAny().map(pivot ->
         Stream.of(
                   quickSort(()->ints.get().filter(i -> i < pivot)),
                   Stream.of(pivot),
                   quickSort(()->ints.get().filter(i -> i > pivot)))
        .flatMap(s->s)).orElse(Stream.empty());
}

Solution 4 - Java

I think there are very few differences between the two when you look closely enough.

On its face, an IEnumerable does appear to be a reusable construct:

IEnumerable<int> numbers = new int[] { 1, 2, 3, 4, 5 };

foreach (var n in numbers) {
    Console.WriteLine(n);
}

However, the compiler is actually doing a little bit of work to help us out; it generates the following code:

IEnumerable<int> numbers = new int[] { 1, 2, 3, 4, 5 };

IEnumerator<int> enumerator = numbers.GetEnumerator();
while (enumerator.MoveNext()) {
    Console.WriteLine(enumerator.Current);
}

Each time you would actually iterate over the enumerable, the compiler creates an enumerator. The enumerator is not reusable; further calls to MoveNext will just return false, and there is no way to reset it to the beginning. If you want to iterate over the numbers again, you will need to create another enumerator instance.


To better illustrate that an IEnumerable has (or can have) the same 'feature' as a Java Stream, consider an enumerable whose source of numbers is not a static collection. For example, we can create an enumerable object which generates a sequence of 5 random numbers:

class Generator : IEnumerator<int> {
    Random _r;
    int _current;
    int _count = 0;

    public Generator(Random r) {
        _r = r;
    }

    public bool MoveNext() {
        _current = _r.Next();
        _count++;
        return _count <= 5;
    }

    public int Current {
        get { return _current; }
    }

    // members required by the IEnumerator interfaces
    object IEnumerator.Current {
        get { return Current; }
    }

    public void Reset() {
        throw new NotSupportedException();
    }

    public void Dispose() { }
}

class RandomNumberStream : IEnumerable<int> {
    Random _r = new Random();

    public IEnumerator<int> GetEnumerator() {
        return new Generator(_r);
    }

    IEnumerator IEnumerable.GetEnumerator() {
        return this.GetEnumerator();
    }
}

Now we have very similar code to the previous array-based enumerable, but with a second iteration over numbers:

IEnumerable<int> numbers = new RandomNumberStream();

foreach (var n in numbers) {
    Console.WriteLine(n);
}
foreach (var n in numbers) {
    Console.WriteLine(n);
}

The second time we iterate over numbers we will get a different sequence of numbers, so the enumerable isn't reusable in the same sense. Or, we could have written the RandomNumberStream to throw an exception if you try to iterate over it multiple times, making the enumerable actually unusable (like a Java Stream).

Also, what does your enumerable-based quick sort mean when applied to a RandomNumberStream?


Conclusion

So, the biggest difference is that .NET allows you to reuse an IEnumerable by implicitly creating a new IEnumerator in the background whenever it would need to access elements in the sequence.

This implicit behavior is often useful (and 'powerful' as you state), because we can repeatedly iterate over a collection.

But sometimes, this implicit behavior can actually cause problems. If your data source is not static, or is costly to access (like a database or web site), then a lot of assumptions about IEnumerable have to be discarded; reuse is not that straightforward.

Solution 5 - Java

It is possible to bypass some of the "run once" protections in the Stream API; for example we can avoid java.lang.IllegalStateException exceptions (with message "stream has already been operated upon or closed") by referencing and reusing the Spliterator (rather than the Stream directly).

For example, this code will run without throwing an exception:

Spliterator<String> split = Stream.of("hello", "world")
                                  .map(s -> "prefix-" + s)
                                  .spliterator();

Stream<String> replayable1 = StreamSupport.stream(split, false);
Stream<String> replayable2 = StreamSupport.stream(split, false);

replayable1.forEach(System.out::println);
replayable2.forEach(System.out::println);

However the output will be limited to

prefix-hello
prefix-world

rather than repeating the output twice. This is because the ArraySpliterator used as the Stream source is stateful and stores its current position. When we 'replay' this Stream, we start again at the end of the source, so the second traversal produces nothing.

We have a number of options to solve this challenge:

  1. We could make use of a stateless Stream creation method such as Stream#generate(). We would have to manage state externally in our own code and reset between Stream "replays":

     Spliterator<String> split = Stream.generate(this::nextValue)
                                       .map(s -> "prefix-" + s)
                                       .spliterator();

     Stream<String> replayable1 = StreamSupport.stream(split, false);
     Stream<String> replayable2 = StreamSupport.stream(split, false);

     replayable1.forEach(System.out::println);
     this.resetCounter();
     replayable2.forEach(System.out::println);

    
  2. Another (slightly better but not perfect) solution to this is to write our own ArraySpliterator (or similar Stream source) that includes some capacity to reset the current counter. If we were to use it to generate the Stream we could potentially replay them successfully.

     MyArraySpliterator<String> arraySplit = new MyArraySpliterator<>("hello", "world");
     Spliterator<String> split = StreamSupport.stream(arraySplit, false)
                                              .map(s -> "prefix-" + s)
                                              .spliterator();

     Stream<String> replayable1 = StreamSupport.stream(split, false);
     Stream<String> replayable2 = StreamSupport.stream(split, false);

     replayable1.forEach(System.out::println);
     arraySplit.reset();
     replayable2.forEach(System.out::println);

    
  3. The best solution to this problem (in my opinion) is to make a new copy of any stateful Spliterators used in the Stream pipeline when new operators are invoked on the Stream. This is more complex and involved to implement, but if you don't mind using third party libraries, cyclops-react has a Stream implementation that does exactly this. (Disclosure: I am the lead developer for this project.)

     Stream<String> replayableStream = ReactiveSeq.of("hello", "world")
                                                  .map(s -> "prefix-" + s);

     replayableStream.forEach(System.out::println);
     replayableStream.forEach(System.out::println);

    

This will print

prefix-hello
prefix-world
prefix-hello
prefix-world

as expected.

Solution 6 - Java

The reason is that you can create streams from things that can only be used once by definition, such as an Iterator or a BufferedReader. You can think of a Stream as being consumed the same way as having used a BufferedReader to read a text file to its end. Once you reach the end of the file, the BufferedReader doesn't stop existing, but it just becomes useless, as you can't get anything out of it anymore. If you want to read the file again, you have to create a new reader. The same goes for streams. If you want to process the source of the stream twice, you have to create two separate streams.
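The analogy is easy to demonstrate (a minimal sketch, using a StringReader in place of a file):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReaderAnalogy {
    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("a\nb"));
        reader.lines().forEach(System.out::println); // consumes the reader: a, b
        System.out.println(reader.readLine());       // source exhausted: null
    }
}
```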

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Vitaliy | View Question on Stackoverflow
Solution 1 - Java | Stuart Marks | View Answer on Stackoverflow
Solution 2 - Java | rolfl | View Answer on Stackoverflow
Solution 3 - Java | Holger | View Answer on Stackoverflow
Solution 4 - Java | Andrew Vermie | View Answer on Stackoverflow
Solution 5 - Java | John McClean | View Answer on Stackoverflow
Solution 6 - Java | amer_Salah | View Answer on Stackoverflow