When should I use streams?

Java, Java 8, Java Stream

Java Problem Overview


I just came across a question when using a List and its stream() method. While I know how to use them, I'm not quite sure about when to use them.

For example, I have a list, containing various paths to different locations. Now, I'd like to check whether a single, given path contains any of the paths specified in the list. I'd like to return a boolean based on whether or not the condition was met.

This, of course, is not a hard task per se. But I wonder whether I should use streams or a for(-each) loop.

The List

private static final List<String> EXCLUDE_PATHS = Arrays.asList(
    "my/path/one",
    "my/path/two"
);

Example using Stream:

private boolean isExcluded(String path) {
    return EXCLUDE_PATHS.stream()
                        .map(String::toLowerCase)
                        .filter(path::contains)
                        .collect(Collectors.toList())
                        .size() > 0;
}

Example using for-each loop:

private boolean isExcluded(String path){
    for (String excludePath : EXCLUDE_PATHS) {
        if (path.contains(excludePath.toLowerCase())) {
            return true;
        }
    }
    return false;
}

Note that the path parameter is always lowercase.

My first guess is that the for-each approach is faster, because the loop returns immediately once the condition is met, whereas the stream would still loop over all list entries in order to complete the filtering.

Is my assumption correct? If so, why (or rather when) would I use stream() then?

Java Solutions


Solution 1 - Java

Your assumption is correct. Your stream implementation is slower than the for-loop.

This stream usage should be as fast as the for-loop though:

EXCLUDE_PATHS.stream()  
    .map(String::toLowerCase)
    .anyMatch(path::contains);

This iterates through the items, applying String::toLowerCase and the path::contains predicate to the items one by one and terminating at the first item that matches.

Both collect() & anyMatch() are terminal operations. anyMatch() exits at the first found item, though, while collect() requires all items to be processed.
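The difference between the two terminal operations can be made visible by counting how many elements enter the pipeline. The following is only a minimal sketch: the class name ShortCircuitDemo, the three sample entries and the peek() counter are assumptions added for illustration and are not part of the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class ShortCircuitDemo {
    // Hypothetical sample data; the first entry matches the test path used in main.
    static final List<String> EXCLUDE_PATHS =
        Arrays.asList("my/path/one", "my/path/two", "my/path/three");

    // Number of elements that enter the pipeline when everything is collected.
    static int visitedByCollect(String path) {
        AtomicInteger visited = new AtomicInteger();
        boolean ignored = EXCLUDE_PATHS.stream()
            .peek(p -> visited.incrementAndGet())
            .filter(path::contains)
            .collect(Collectors.toList())
            .size() > 0;
        return visited.get();
    }

    // Number of elements that enter the pipeline when anyMatch() short-circuits.
    static int visitedByAnyMatch(String path) {
        AtomicInteger visited = new AtomicInteger();
        boolean ignored = EXCLUDE_PATHS.stream()
            .peek(p -> visited.incrementAndGet())
            .anyMatch(path::contains);
        return visited.get();
    }

    public static void main(String[] args) {
        String path = "/prefix/my/path/one/file.txt";
        System.out.println("collect visited:  " + visitedByCollect(path));  // 3
        System.out.println("anyMatch visited: " + visitedByAnyMatch(path)); // 1
    }
}
```

With a path that matches the first entry, the collect() variant still touches all three elements, while the anyMatch() variant stops after the first.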

Solution 2 - Java

The decision whether to use Streams or not should not be driven by performance consideration, but rather by readability. When it really comes to performance, there are other considerations.

With your .filter(path::contains).collect(Collectors.toList()).size() > 0 approach, you are processing all elements and collecting them into a temporary List before comparing the size. Still, this hardly ever matters for a Stream consisting of two elements.

Using .map(String::toLowerCase).anyMatch(path::contains) can save CPU cycles and memory, if you have a substantially larger number of elements. Still, this converts each String to its lowercase representation, until a match is found. Obviously, there is a point in using

private static final List<String> EXCLUDE_PATHS =
    Stream.of("my/path/one", "my/path/two").map(String::toLowerCase)
          .collect(Collectors.toList());

private boolean isExcluded(String path) {
    return EXCLUDE_PATHS.stream().anyMatch(path::contains);
}

instead, so you don't have to repeat the conversion to lowercase in every invocation of isExcluded. If the number of elements in EXCLUDE_PATHS or the lengths of the strings become really large, you may consider using

private static final List<Predicate<String>> EXCLUDE_PATHS =
    Stream.of("my/path/one", "my/path/two").map(String::toLowerCase)
          .map(s -> Pattern.compile(s, Pattern.LITERAL).asPredicate())
          .collect(Collectors.toList());

private boolean isExcluded(String path){
    return EXCLUDE_PATHS.stream().anyMatch(p -> p.test(path));
}

Compiling a string as a regex pattern with the LITERAL flag makes it behave just like ordinary string operations, but allows the engine to spend some time in preparation, e.g. using the Boyer-Moore algorithm, to be more efficient when it comes to the actual comparison.

Of course, this only pays off if there are enough subsequent tests to compensate for the time spent in preparation. Determining whether this will be the case is one of the actual performance considerations, besides the first question of whether this operation will ever be performance critical at all. The question of whether to use Streams or for loops is not one of them.
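To illustrate that a LITERAL pattern predicate gives the same answers as String.contains, here is a small sketch. The class name LiteralPatternDemo and the second list entry, which contains regex metacharacters on purpose, are assumptions made for this example. It only checks behaviour and makes no claim about speed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class LiteralPatternDemo {
    // Hypothetical entries; "." and "(" would be special in a normal regex.
    static final List<String> RAW = Arrays.asList("my/path/one", "my/path.(two)");

    // Each entry is compiled once; LITERAL switches off all regex metacharacters.
    static final List<Predicate<String>> COMPILED = RAW.stream()
        .map(s -> Pattern.compile(s, Pattern.LITERAL).asPredicate())
        .collect(Collectors.toList());

    static boolean viaContains(String path) {
        return RAW.stream().anyMatch(path::contains);
    }

    // asPredicate() tests with find(), i.e. it looks for the pattern anywhere in the input.
    static boolean viaPattern(String path) {
        return COMPILED.stream().anyMatch(p -> p.test(path));
    }

    public static void main(String[] args) {
        for (String path : Arrays.asList("a/my/path.(two)/b", "a/my/pathX(two)/b", "x/my/path/one")) {
            System.out.println(path + " -> contains=" + viaContains(path)
                + ", pattern=" + viaPattern(path));
        }
    }
}
```

Whether the precompiled form is actually faster depends on the data and the JDK, so it would have to be measured for the real workload.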

By the way, the code examples above keep the logic of your original code, which looks questionable to me. Your isExcluded method returns true if the specified path contains any of the elements in the list, so it returns true for /some/prefix/to/my/path/one as well as my/path/one/and/some/suffix or even /some/prefix/to/my/path/one/and/some/suffix.

Even dummy/path/onerous is considered to fulfil the criteria, as it contains the string my/path/one.
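The false positives described above can be reproduced with a few lines. The sketch below also shows one possible stricter interpretation, namely "the path starts with an excluded path, compared segment by segment", using java.nio.file.Path.startsWith. That interpretation, the class name PathMatchDemo and the method names are assumptions for illustration; the original question does not say which semantics are actually wanted.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class PathMatchDemo {
    static final List<String> EXCLUDE_PATHS = Arrays.asList("my/path/one", "my/path/two");

    // Logic from the question: a plain substring test.
    static boolean isExcludedBySubstring(String path) {
        return EXCLUDE_PATHS.stream().anyMatch(path::contains);
    }

    // Hypothetical stricter variant: the path must begin with an excluded path,
    // compared by whole name elements rather than by characters.
    static boolean isExcludedByPrefix(String path) {
        Path candidate = Paths.get(path);
        return EXCLUDE_PATHS.stream()
            .map(s -> Paths.get(s))
            .anyMatch(p -> candidate.startsWith(p));
    }

    public static void main(String[] args) {
        for (String path : Arrays.asList(
                "my/path/one/and/some/suffix", "dummy/path/onerous", "my/path/onerous")) {
            System.out.println(path + " -> substring=" + isExcludedBySubstring(path)
                + ", prefix=" + isExcludedByPrefix(path));
        }
    }
}
```

Here dummy/path/onerous and my/path/onerous are accepted by the substring test but rejected by the segment-based prefix test.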

Solution 3 - Java

Yes, you are right: your stream approach will have some overhead. But you can use a construction like this:

private boolean isExcluded(String path) {
    return EXCLUDE_PATHS.stream().map(String::toLowerCase).anyMatch(path::contains);
}

The main reason to use streams is that they make your code simpler and easier to read.

Solution 4 - Java

One goal of streams in Java is to simplify writing parallel code; the API is inspired by functional programming. A serial stream mainly serves to make the code cleaner.

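If we want performance, we can use parallelStream, which was designed for it, although for a small list such as this one the overhead of parallelism usually outweighs the gain. The serial one is, in general, slower.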

There is a good article to read about ForLoop, Stream and ParallelStream Performance.

In your code, we can use a short-circuiting terminal operation such as anyMatch to stop the search at the first match.

Solution 5 - Java

Others have already made many good points, so I just want to mention lazy evaluation in streams. When we call map() to create a stream of lowercase paths, the whole stream is not created immediately; it is constructed lazily, which is why the performance should be comparable to the traditional for loop. There is no full scan: map() and anyMatch() are applied element by element, and once anyMatch() finds a match it short-circuits and returns true.
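The interleaving of map() and anyMatch() can be observed by recording each call. This is a minimal sketch; the class name LazyOrderDemo, the upper-case sample entries and the event list are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

class LazyOrderDemo {
    // Records the order in which map() and anyMatch() see each element.
    static List<String> trace(String path) {
        List<String> events = new ArrayList<>();
        boolean matched = Stream.of("MY/PATH/ONE", "MY/PATH/TWO", "MY/PATH/THREE")
            .map(s -> {
                events.add("map " + s);
                return s.toLowerCase(Locale.ROOT);
            })
            .anyMatch(s -> {
                events.add("test " + s);
                return path.contains(s);
            });
        events.add("result " + matched);
        return events;
    }

    public static void main(String[] args) {
        // Prints:
        // map MY/PATH/ONE
        // test my/path/one
        // map MY/PATH/TWO
        // test my/path/two
        // result true
        trace("x/my/path/two/y").forEach(System.out::println);
    }
}
```

The third entry is never mapped or tested, because the match on the second entry ends the pipeline.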

Solution 6 - Java

Radical answer:

Never. Ever. Ever.

I almost never iterate a list for anything, especially to find something, yet Stream-heavy systems seem filled with that way of coding.

Such code is difficult to refactor and organize, so redundancy and over-iteration are everywhere. In the same method you might see it five times: the same list, searched for different things.

It is not really shorter either; it rarely is. It is definitely not more readable, though that is a subjective opinion. Some people will say it is; I don't. People might like it because of autocompletion, but in my editor, IntelliJ, I can just type iter or itar and have the for loop created for me, with types and everything.

Streams are misused and overused, and it is better to avoid them completely. Java is not a true functional language, and Java generics suck: they are not expressive enough and are certainly more difficult to read, parse and refactor.

Stream code is not easily extractable or refactorable unless you start adding weird methods that return Optionals, Predicates, Consumers and whatnot. You end up with methods returning and taking all kinds of weird generic constraints, with orders and meanings that only God knows. There is too much inference, and you need to visit methods to figure out the types of various things.

Trying to make Java behave like a functional language such as Haskell or Lisp is a fool's errand. A heavily Streams-based Java system is always going to be more complex than one without them, far less performant, and harder to refactor and maintain. It is therefore also more buggy and filled with patchwork. Glue everywhere.

When OpenJDK got involved they started adding things to the language without really thinking them through. It is not just Java Streams. Such systems are inherently more complex because they require more base knowledge. You might have it, but your colleagues don't. They sure as hell know what a for loop is and what an if block is.

Furthermore, since a lambda cannot assign to a non-final local variable, you can rarely do two things at the same time while looping, so you end up iterating twice, or thrice.
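The point about non-final variables can be illustrated with a small sketch. The class name TwoThingsDemo, the two computed values (a count and a maximum length) and the sample prefix "my/" are assumptions chosen for this example. The stream version shown is only the straightforward one; other stream-based formulations exist and are not shown here.

```java
import java.util.Arrays;
import java.util.List;

class TwoThingsDemo {
    // One pass: a for loop can update several mutable locals at once.
    static int[] withLoop(List<String> items) {
        int count = 0;
        int longest = 0;
        for (String s : items) {
            if (s.startsWith("my/")) {
                count++;
            }
            longest = Math.max(longest, s.length());
        }
        return new int[] {count, longest};
    }

    // Two passes: a lambda may only capture effectively final locals, so
    // "int count = 0; items.forEach(s -> count++);" does not compile.
    static int[] withStreams(List<String> items) {
        int count = (int) items.stream().filter(s -> s.startsWith("my/")).count();
        int longest = items.stream().mapToInt(String::length).max().orElse(0);
        return new int[] {count, longest};
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("my/path/one", "other/path", "my/p");
        System.out.println(Arrays.toString(withLoop(items)));    // [2, 11]
        System.out.println(Arrays.toString(withStreams(items))); // [2, 11]
    }
}
```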

Most of those who prefer the Streams approach over a for loop are people who started learning Java after Java 8. Those who learned it before hate it. The thing is that Streams are far more complex to use and refactor, and more difficult to use the right way.

And when I say it performs worse, I do not mean in comparison to a for loop, although that is also a very real thing, but rather the tendency of such code to over-iterate a wide range of things.

It is deemed so easy to iterate a list to find an item that it tends to be done over and over again.

I've not seen a single system that has benefitted from it. All of the systems I have seen are horribly implemented, mostly because of it.

Stream code is definitely not more readable than a for loop, and a for loop is definitely more flexible and refactorable. The reason we see so many complex crap systems and bugs everywhere today is, I promise you, the heavy reliance on Streams to filter, not to mention the overuse of Lombok and Jackson. Those three are the hallmark of a badly implemented system. The keyword is overuse. A patchwork approach.

Again, I consider it really bad to iterate a list to find anything. Yet with Stream-based systems, this is what people do all the time. It is also not rare for such an iteration to be O(N^2), which is difficult to detect in stream code, while with a for loop you would see it immediately.

While it is customary to ask the database to filter things for you, it is not rare for a base query to return a big list instead, which is then expanded with all kinds of iterative methods that filter out undesirables to cover use cases, and of course they use Streams to do that. All kinds of methods arise around that big list, each filtering various things.

Over and over again. Of course, I do not mean you. Your colleagues. Right?

I almost never iterate anything. I use the right datasets and rely on the database to filter it for me. Once. However in a Streams heavy system you will see this everywhere. In the deepest method, in the caller, caller of caller, caller of the caller of the caller. Streams everywhere. It is ugly. And good luck refactoring that code that lives in tiny lambdas.

And good luck reusing them. Nobody will look to reuse your nice Predicates. And if they want to use them, guess what: they need to use more Streams. You just got yourself addicted and cornered yourself further. Now, are you proposing I start splitting all of my code into tiny Predicates, Consumers, Functions and BiFunctions, just so I can reuse that logic for Streams?

Of course I hate it just as much in JavaScript, where over-iteration is everywhere.

You might say the cost of iterating a list is nothing, but the system complexity grows and redundancy increases, and therefore so do maintenance costs and the number of bugs. It becomes a patch-and-glue approach to various things. Just add another filter and remove this one, rather than code things the right way.

Furthermore, where you need three servers to host all of your users, I can manage with just one. So a Streams-heavy system will need to scale out far earlier than one that is not. For small projects that is a very important metric. Where you can have, say, 5000 concurrent users, my system can handle twice or thrice that.

I have no need for it in my code, and when I am in charge of new projects, the first rule is that Streams are totally forbidden.

That is not to say there are no use cases for them, or that they cannot be useful at times, but the risks associated with allowing them far outweigh the benefits.

When you start using Streams you are essentially adopting a whole new programming paradigm. The entire programming style of the system will change and that is what I am concerned about.

You do not want that style. It is not superior to the old style, especially in Java.

Take the Futures API. Sure, you could start coding everything to return a Promise or a Future, but do you really want to? Is that going to resolve anything? Can your entire system really follow up on being that, everywhere? Will it be better for you, or are you just experimenting and hoping you will benefit at some point?

There are people who overdo RxJava and overdo promises in JavaScript as well. There are really very few cases where you want things to be futures-based, and you will hit many corner cases where you find that those APIs have certain limitations and you just got made.

You can build really really complex and far far more maintainable systems without all that crap. This is what it is about. It is not about your hobby project expanding and becoming a horrible code base.

It is about the best approach to building large and complex enterprise systems and ensuring they remain coherent, consistent, refactorable and easily maintainable.

Furthermore, you are rarely working on such systems on your own. You are very likely working with at least 10 people, all experimenting with and overdoing Streams. So while you might know how to use them properly, you can rest assured the other 9 don't.

I will leave you with these wonderful examples of real code, with thousands more like them:

[Four screenshots of stream-heavy code; the images are not preserved in this copy.]

Try refactoring the above. I challenge you. Give it a try. Everything is a Stream, everywhere. This is what Stream developers do, they overdo it, and there is no easy way to grasp what the code is actually doing. What is this method returning, what is this transformation doing, what do I end up with. Everything is inferred. Much more difficult to read for sure.

If you understand this, then you must be an Einstein, but you should know that not everyone is like you, and this could be your system in the very near future.

Do note that this is not isolated to one project; I've seen many with structures very similar to these.

One thing is for sure, horrible coders love streams.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | mcuenez | View Question on Stackoverflow
Solution 1 - Java | Stefan Pries | View Answer on Stackoverflow
Solution 2 - Java | Holger | View Answer on Stackoverflow
Solution 3 - Java | rvit34 | View Answer on Stackoverflow
Solution 4 - Java | Paulo Ricardo Almeida | View Answer on Stackoverflow
Solution 5 - Java | LycheeSojuYYDS | View Answer on Stackoverflow
Solution 6 - Java | mjs | View Answer on Stackoverflow