Java 8 Distinct by property


Java Problem Overview


In Java 8 how can I filter a collection using the Stream API by checking the distinctness of a property of each object?

For example, I have a list of Person objects and I want to remove people with the same name,

persons.stream().distinct();

Will use the default equality check for a Person object, so I need something like,

persons.stream().distinct(p -> p.getName());

Unfortunately the distinct() method has no such overload. Without modifying the equality check inside the Person class is it possible to do this succinctly?

Java Solutions


Solution 1 - Java

Consider distinct to be a stateful filter. Here is a function that returns a predicate that maintains state about what it's seen previously, and that returns whether the given element was seen for the first time:

public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> seen.add(keyExtractor.apply(t));
}

Then you can write:

persons.stream().filter(distinctByKey(Person::getName))

Note that if the stream is ordered and is run in parallel, this will preserve an arbitrary element from among the duplicates, instead of the first one, as distinct() does.
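For reference, here is a self-contained sketch of how this predicate might be used on a sequential stream. The Person class, the sample data, and the distinctByName helper are hypothetical and exist only for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByKeyDemo {

    // Hypothetical minimal Person class, used only for this illustration.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Same predicate factory as above: add() returns true only for unseen keys.
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Hypothetical helper: a fresh predicate (and "seen" set) is created per call.
    static List<Person> distinctByName(List<Person> persons) {
        return persons.stream()
                .filter(distinctByKey(Person::getName))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Ann", 30), new Person("Bob", 25), new Person("Ann", 41));
        for (Person p : distinctByName(persons)) {
            System.out.println(p.getName() + " " + p.getAge());
        }
    }
}
```

Because the predicate carries state, a new one should be created for each stream pipeline rather than stored and reused.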

(This is essentially the same as my answer to this question: https://stackoverflow.com/questions/27870136/java-lambda-stream-distinct-on-arbitrary-key)

Solution 2 - Java

An alternative would be to place the persons in a map using the name as a key:

persons.stream().collect(Collectors.toMap(Person::getName, p -> p, (p, q) -> p)).values();

Note that the Person that is kept, in case of a duplicate name, will be the first encountered.
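Keep in mind that the three-argument toMap makes no guarantee about the type of Map it returns, so the iteration order of values() should not be relied upon. If the original encounter order matters, a minimal sketch using the four-argument overload with a LinkedHashMap might look like this (the Person class and the distinctByName helper are assumptions for illustration):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapDistinctDemo {

    // Hypothetical minimal Person class, used only for this illustration.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Hypothetical helper: keeps the first Person per name, in encounter order.
    static List<Person> distinctByName(List<Person> persons) {
        Map<String, Person> byName = persons.stream()
                .collect(Collectors.toMap(
                        Person::getName,           // key: the property to be distinct by
                        Function.identity(),       // value: the Person itself
                        (first, second) -> first,  // on a duplicate key, keep the first
                        LinkedHashMap::new));      // preserves insertion order
        return new ArrayList<>(byName.values());
    }
}
```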

Solution 3 - Java

You can wrap the person objects into another class, that only compares the names of the persons. Afterward, you unwrap the wrapped objects to get a person stream again. The stream operations might look as follows:

persons.stream()
    .map(Wrapper::new)
    .distinct()
    .map(Wrapper::unwrap)
    ...;

The class Wrapper might look as follows:

class Wrapper {
    private final Person person;
    public Wrapper(Person person) {
        this.person = person;
    }
    public Person unwrap() {
        return person;
    }
    public boolean equals(Object other) {
        if (other instanceof Wrapper) {
            return ((Wrapper) other).person.getName().equals(person.getName());
        } else {
            return false;
        }
    }
    public int hashCode() {
        return person.getName().hashCode();
    }
}

Solution 4 - Java

Another solution, using a Set. It may not be the ideal solution, but it works:

Set<String> set = new HashSet<>(persons.size());
persons.stream().filter(p -> set.add(p.getName())).collect(Collectors.toList());

Or, if you can modify the original list, you can use the removeIf method:

persons.removeIf(p -> !set.add(p.getName()));
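One thing to watch for with the removeIf variant: the Set must start out empty, so if the same set was already filled by the stream snippet, every element would be removed. A minimal sketch with a fresh set, assuming a modifiable list and a hypothetical Person class:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RemoveIfDemo {

    // Hypothetical minimal Person class, used only for this illustration.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Hypothetical helper: removes later duplicates in place.
    // The list must be modifiable (for example an ArrayList).
    static void removeDuplicateNames(List<Person> persons) {
        Set<String> seen = new HashSet<>(); // fresh, empty set for this pass
        // add() returns false for a name that was already seen, so that element is removed
        persons.removeIf(p -> !seen.add(p.getName()));
    }
}
```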

Solution 5 - Java

There's a simpler approach using a TreeSet with a custom comparator.

persons.stream()
    .collect(Collectors.toCollection(
      () -> new TreeSet<Person>((p1, p2) -> p1.getName().compareTo(p2.getName())) 
));
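One consequence of this approach, sketched below with a hypothetical Person class, is that the result comes back sorted by name rather than in the original order. The sketch also assumes that no name is null, since the comparator would fail on one:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class TreeSetDistinctDemo {

    // Hypothetical minimal Person class, used only for this illustration.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Hypothetical helper: the TreeSet treats two persons as equal when the
    // comparator returns 0, so only the first person per name is retained.
    static Set<Person> distinctByName(List<Person> persons) {
        return persons.stream()
                .collect(Collectors.toCollection(
                        () -> new TreeSet<Person>(Comparator.comparing(Person::getName))));
    }
}
```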

Solution 6 - Java

We can also use RxJava (very powerful reactive extension library)

Observable.from(persons).distinct(Person::getName)

or

Observable.from(persons).distinct(p -> p.getName())

Solution 7 - Java

You can use groupingBy collector:

persons.stream().collect(Collectors.groupingBy(p -> p.getName())).values().forEach(t -> System.out.println(t.get(0).getId()));

If you want to have another stream you can use this:

persons.stream().collect(Collectors.groupingBy(p -> p.getName())).values().stream().map(l -> (l.get(0)));

Solution 8 - Java

You can use the distinct(HashingStrategy) method in Eclipse Collections.

List<Person> persons = ...;
MutableList<Person> distinct =
    ListIterate.distinct(persons, HashingStrategies.fromFunction(Person::getName));

If you can refactor persons to implement an Eclipse Collections interface, you can call the method directly on the list.

MutableList<Person> persons = ...;
MutableList<Person> distinct =
    persons.distinct(HashingStrategies.fromFunction(Person::getName));

HashingStrategy is simply a strategy interface that allows you to define custom implementations of equals and hashcode.

public interface HashingStrategy<E>
{
    int computeHashCode(E object);
    boolean equals(E object1, E object2);
}

Note: I am a committer for Eclipse Collections.

Solution 9 - Java

A similar approach to the one Saeed Zarinfam used, but in a more Java 8 style:

persons.stream().collect(Collectors.groupingBy(p -> p.getName())).values().stream()
 .map(plans -> plans.stream().findFirst().get())
 .collect(toList());

Solution 10 - Java

I recommend using Vavr, if you can. With this library you can do the following:

io.vavr.collection.List.ofAll(persons)
                       .distinctBy(Person::getName)
                       .toJavaSet() // or any another Java 8 Collection

Solution 11 - Java

You can use StreamEx library:

StreamEx.of(persons)
        .distinct(Person::getName)
        .toList()

Solution 12 - Java

Extending Stuart Marks's answer, this can be done in a shorter way and without a concurrent map (if you don't need parallel streams):

public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    final Set<Object> seen = new HashSet<>();
    return t -> seen.add(keyExtractor.apply(t));
}

Then call:

persons.stream().filter(distinctByKey(p -> p.getName()));

Solution 13 - Java

I made a generic version:

private <T, R> Collector<T, ?, Stream<T>> distinctByKey(Function<T, R> keyExtractor) {
    return Collectors.collectingAndThen(
            toMap(
                    keyExtractor,
                    t -> t,
                    (t1, t2) -> t1
            ),
            (Map<R, T> map) -> map.values().stream()
    );
}

An example:

Stream.of(new Person("Jean"), 
          new Person("Jean"),
          new Person("Paul")
)
    .filter(...)
    .collect(distinctByKey(Person::getName)) // returns a stream of Person with 2 elements, Jean and Paul
    .map(...)
    .collect(toList())
    

Solution 14 - Java

Distinct objects list can be found using:

List<Person> distinctPersons = persons.stream()
        .collect(Collectors.collectingAndThen(
                Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(Person::getName))),
                ArrayList::new));

Solution 15 - Java

Another library that supports this is jOOλ, and its Seq.distinct(Function<T,U>) method:

Seq.seq(persons).distinct(Person::getName).toList();

Under the hood, it does practically the same thing as the accepted answer, though.

Solution 16 - Java

Set<YourPropertyType> set = new HashSet<>();
list
        .stream()
        .filter(it -> set.add(it.getYourProperty()))
        .forEach(it -> ...);

Solution 17 - Java

My approach to this is to group all the objects with same property together, then cut short the groups to size of 1 and then finally collect them as a List.

  List<YourPersonClass> listWithDistinctPersons =   persons.stream()
            //operators to remove duplicates based on person name
            .collect(Collectors.groupingBy(p -> p.getName()))
            .values()
            .stream()
            //cut short the groups to size of 1
            .flatMap(group -> group.stream().limit(1))
            //collect distinct users as list
            .collect(Collectors.toList());

Solution 18 - Java

While the highest-upvoted answer is arguably the best answer with respect to Java 8 style, it is at the same time the worst in terms of performance. If you really want a poorly performing application, then go ahead and use it. The simple requirement of extracting a unique set of person names can be met with a mere for-each loop and a Set. Things get even worse if the list has more than 10 elements.

Consider you have a collection of 20 Objects, like this:

public static final List<SimpleEvent> testList = Arrays.asList(
            new SimpleEvent("Tom"), new SimpleEvent("Dick"),new SimpleEvent("Harry"),new SimpleEvent("Tom"),
            new SimpleEvent("Dick"),new SimpleEvent("Huckle"),new SimpleEvent("Berry"),new SimpleEvent("Tom"),
            new SimpleEvent("Dick"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("Cherry"),
            new SimpleEvent("Roses"),new SimpleEvent("Moses"),new SimpleEvent("Chiku"),new SimpleEvent("gotya"),
            new SimpleEvent("Gotye"),new SimpleEvent("Nibble"),new SimpleEvent("Berry"),new SimpleEvent("Jibble"));

Where your SimpleEvent object looks like this:

public class SimpleEvent {

private String name;
private String type;

public SimpleEvent(String name) {
    this.name = name;
    this.type = "type_"+name;
}

public String getName() {
    return name;
}

public void setName(String name) {
    this.name = name;
}

public String getType() {
    return type;
}

public void setType(String type) {
    this.type = type;
}
}

And to test, you have JMH code like this (please note, I'm using the same distinctByKey Predicate mentioned in the accepted answer):

@Benchmark
@OutputTimeUnit(TimeUnit.SECONDS)
public void aStreamBasedUniqueSet(Blackhole blackhole) throws Exception{

    Set<String> uniqueNames = testList
            .stream()
            .filter(distinctByKey(SimpleEvent::getName))
            .map(SimpleEvent::getName)
            .collect(Collectors.toSet());
    blackhole.consume(uniqueNames);
}

@Benchmark
@OutputTimeUnit(TimeUnit.SECONDS)
public void aForEachBasedUniqueSet(Blackhole blackhole) throws Exception{
    Set<String> uniqueNames = new HashSet<>();

    for (SimpleEvent event : testList) {
        uniqueNames.add(event.getName());
    }
    blackhole.consume(uniqueNames);
}

public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder()
            .include(MyBenchmark.class.getSimpleName())
            .forks(1)
            .mode(Mode.Throughput)
            .warmupBatchSize(3)
            .warmupIterations(3)
            .measurementIterations(3)
            .build();

    new Runner(opt).run();
}

Then you'll have Benchmark results like this:

Benchmark                                  Mode  Samples        Score  Score error  Units
c.s.MyBenchmark.aForEachBasedUniqueSet    thrpt        3  2635199.952  1663320.718  ops/s
c.s.MyBenchmark.aStreamBasedUniqueSet     thrpt        3   729134.695   895825.697  ops/s

As you can see, a simple for-each loop has roughly 3.6 times the throughput of the Java 8 Stream version in this run, and a smaller error relative to its score.

The higher the throughput, the better the performance.

Solution 19 - Java

This works like a charm:

  1. Group the data by the unique key to form a map.
  2. Return the first object from every value of the map (there could be multiple persons with the same name).

persons.stream()
    .collect(groupingBy(Person::getName))
	.values()
	.stream()
	.flatMap(values -> values.stream().limit(1))
	.collect(toList());

Solution 20 - Java

The easiest way to implement this is to use the sort feature, as it already accepts an optional Comparator which can be created using an element's property. Then you have to filter out the duplicates, which can be done using a stateful Predicate that relies on the fact that in a sorted stream all equal elements are adjacent:

Comparator<Person> c=Comparator.comparing(Person::getName);
stream.sorted(c).filter(new Predicate<Person>() {
    Person previous;
    public boolean test(Person p) {
      if(previous!=null && c.compare(previous, p)==0)
        return false;
      previous=p;
      return true;
    }
})./* more stream operations here */;

Of course, a stateful Predicate is not thread-safe. However, if that's what you need, you can move this logic into a Collector and let the stream take care of the thread-safety when using your Collector. This depends on what you want to do with the stream of distinct elements, which you didn't tell us in your question.
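As one possible way to avoid shared mutable state inside a lambda, the adjacent-duplicate check can also be done in a plain loop after sorting. This is only a sketch of the sort-then-compare idea, not the Collector mentioned above, and the distinctSorted helper name is made up for this example:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortedDistinctDemo {

    // Hypothetical helper: sorts by the comparator, then drops every element
    // that compares equal to the last one kept.
    static <T> List<T> distinctSorted(Collection<T> items, Comparator<? super T> comparator) {
        // sorted() is stable for ordered streams, so among equal elements
        // the one that came first in the input stays first.
        List<T> sorted = items.stream().sorted(comparator).collect(Collectors.toList());

        List<T> result = new ArrayList<>();
        for (T item : sorted) {
            // Equal elements are adjacent after sorting, so comparing with the
            // last kept element is enough to detect a duplicate.
            if (result.isEmpty()
                    || comparator.compare(result.get(result.size() - 1), item) != 0) {
                result.add(item);
            }
        }
        return result;
    }
}
```

With a list of persons this could be called as distinctSorted(persons, Comparator.comparing(Person::getName)).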

Solution 21 - Java

I would like to improve on Stuart Marks's answer. If the key is null, it will throw a NullPointerException. Here I ignore null keys by adding one more check, keyExtractor.apply(t) != null.

public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> keyExtractor.apply(t) != null && seen.add(keyExtractor.apply(t));
}
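The check above calls keyExtractor twice per element. A possible variant, shown here as a sketch (the names distinctByNonNullKey and firstOfEachLength are made up for this example), extracts the key once and skips elements whose key is null:

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class NullSafeDistinctDemo {

    // Hypothetical variant: the key is computed once per element.
    static <T> Predicate<T> distinctByNonNullKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> {
            Object key = keyExtractor.apply(t);
            // Elements with a null key are dropped; otherwise add() is true
            // only the first time the key is seen.
            return key != null && seen.add(key);
        };
    }

    // Hypothetical usage: keeps the first word of each length and drops
    // empty strings (their key is mapped to null here).
    static List<String> firstOfEachLength(List<String> words) {
        return words.stream()
                .filter(distinctByNonNullKey(w -> w.isEmpty() ? null : w.length()))
                .collect(Collectors.toList());
    }
}
```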

Solution 22 - Java

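Here is an example:

public class PayRoll {

	private int payRollId;
	private int id;
	private String name;
	private String dept;
	private int salary;

	public PayRoll(int payRollId, int id, String name, String dept, int salary) {
		super();
		this.payRollId = payRollId;
		this.id = id;
		this.name = name;
		this.dept = dept;
		this.salary = salary;
	}

	// Getters used by the grouping and comparison below
	public int getPayRollId() { return payRollId; }
	public int getId() { return id; }
	public String getDept() { return dept; }

	// Produces the format shown in the output
	@Override
	public String toString() {
		return "PayRoll [payRollId=" + payRollId + ", id=" + id + ", name=" + name
				+ ", dept=" + dept + ", salary=" + salary + "]";
	}
}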

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class Prac {
	public static void main(String[] args) {

		int salary=70000;
		PayRoll payRoll=new PayRoll(1311, 1, "A", "HR", salary);
		PayRoll payRoll2=new PayRoll(1411, 2	, "B", "Technical", salary);
		PayRoll payRoll3=new PayRoll(1511, 1, "C", "HR", salary);
		PayRoll payRoll4=new PayRoll(1611, 1, "D", "Technical", salary);
		PayRoll payRoll5=new PayRoll(711, 3,"E", "Technical", salary);
		PayRoll payRoll6=new PayRoll(1811, 3, "F", "Technical", salary);
		List<PayRoll>list=new ArrayList<PayRoll>();
		list.add(payRoll);
		list.add(payRoll2);
		list.add(payRoll3);
		list.add(payRoll4);
		list.add(payRoll5);
		list.add(payRoll6);


		Map<Object, Optional<PayRoll>> k = list.stream().collect(Collectors.groupingBy(p->p.getId()+"|"+p.getDept(),Collectors.maxBy(Comparator.comparingInt(PayRoll::getPayRollId))));


		k.entrySet().forEach(p->
		{
			if(p.getValue().isPresent())
			{
				System.out.println(p.getValue().get());
			}
		});


		
	}
}

Output:

PayRoll [payRollId=1611, id=1, name=D, dept=Technical, salary=70000]
PayRoll [payRollId=1811, id=3, name=F, dept=Technical, salary=70000]
PayRoll [payRollId=1411, id=2, name=B, dept=Technical, salary=70000]
PayRoll [payRollId=1511, id=1, name=C, dept=HR, salary=70000]

Solution 23 - Java

Late to the party but I sometimes use this one-liner as an equivalent:

((Function<Value, Key>) Value::getKey).andThen(new HashSet<>()::add)::apply

The expression is a Predicate<Value> but since the map is inline, it works as a filter. This is of course less readable but sometimes it can be helpful to avoid the method.

Solution 24 - Java

Building on @josketres's answer, I created a generic utility method:

You could make this more Java 8-friendly by creating a Collector.

public static <T> Set<T> removeDuplicates(Collection<T> input, Comparator<T> comparer) {
    return input.stream()
            .collect(toCollection(() -> new TreeSet<>(comparer)));
}


@Test
public void removeDuplicatesWithDuplicates() {
    ArrayList<C> input = new ArrayList<>();
    Collections.addAll(input, new C(7), new C(42), new C(42));
    Collection<C> result = removeDuplicates(input, (c1, c2) -> Integer.compare(c1.value, c2.value));
    assertEquals(2, result.size());
    assertTrue(result.stream().anyMatch(c -> c.value == 7));
    assertTrue(result.stream().anyMatch(c -> c.value == 42));
}

@Test
public void removeDuplicatesWithoutDuplicates() {
    ArrayList<C> input = new ArrayList<>();
    Collections.addAll(input, new C(1), new C(2), new C(3));
    Collection<C> result = removeDuplicates(input, (t1, t2) -> Integer.compare(t1.value, t2.value));
    assertEquals(3, result.size());
    assertTrue(result.stream().anyMatch(c -> c.value == 1));
    assertTrue(result.stream().anyMatch(c -> c.value == 2));
    assertTrue(result.stream().anyMatch(c -> c.value == 3));
}

private class C {
    public final int value;

    private C(int value) {
        this.value = value;
    }
}

Solution 25 - Java

This may be useful for somebody. I had a slightly different requirement: given a list of objects A from a third party, remove all those that have the same A.b field for the same A.id (there are multiple A objects with the same A.id in the list). The [Stream partition][1] answer by [Tagir Valeev][2] inspired me to use a custom Collector which returns Map<A.id, List<A>>. A simple flatMap will do the rest.

public static <T, K, K2> Collector<T, ?, Map<K, List<T>>> groupingDistinctBy(
        Function<T, K> keyFunction, Function<T, K2> distinctFunction) {
    return groupingBy(keyFunction, Collector.of(
            (Supplier<Map<K2, T>>) HashMap::new,
            (map, element) -> map.putIfAbsent(distinctFunction.apply(element), element),
            (left, right) -> {
                left.putAll(right);
                return left;
            },
            map -> new ArrayList<>(map.values()),
            Collector.Characteristics.UNORDERED));
}

[1]: https://stackoverflow.com/a/32435407/4899609 "Stream partition"
[2]: https://stackoverflow.com/users/4856258/tagir-valeev "Tagir Valeev"

Solution 26 - Java

I had a situation where I was supposed to get distinct elements from a list based on two keys. If you want distinct elements based on two keys, or a composite key, try this:

class Person{
	int rollno;
	String name;

	int getRollno() { return rollno; }
	String getName() { return name; }
}
List<Person> personList;


Function<Person, List<Object>> compositeKey = person ->
		Arrays.<Object>asList(person.getName(), person.getRollno());

Map<Object, List<Person>> map = personList.stream().collect(Collectors.groupingBy(compositeKey, Collectors.toList()));

List<Object> duplicateEntrys = map.entrySet().stream()
		.filter(settingMap ->
				settingMap.getValue().size() > 1)
		.collect(Collectors.toList());
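Since a List compares element by element in equals and hashCode, the same kind of composite key can also be passed to a stateful filter to keep one element per key. A minimal sketch, assuming the distinctByKey predicate from Solution 1 and a hypothetical Person class with getters:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class CompositeKeyDemo {

    // Hypothetical minimal Person class, used only for this illustration.
    static final class Person {
        private final int rollno;
        private final String name;

        Person(int rollno, String name) {
            this.rollno = rollno;
            this.name = name;
        }

        int getRollno() { return rollno; }
        String getName() { return name; }
    }

    // Predicate factory as in Solution 1.
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Hypothetical helper: two persons count as duplicates only when both
    // the name and the roll number match.
    static List<Person> distinctByNameAndRollno(List<Person> persons) {
        return persons.stream()
                .filter(distinctByKey(p -> Arrays.asList(p.getName(), p.getRollno())))
                .collect(Collectors.toList());
    }
}
```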

Solution 27 - Java

A variation of the top answer that handles null:

    public static <T, K> Predicate<T> distinctBy(final Function<? super T, K> getKey) {
        val seen = ConcurrentHashMap.<Optional<K>>newKeySet();
        return obj -> seen.add(Optional.ofNullable(getKey.apply(obj)));
    }

In my tests:

        assertEquals(
                asList("a", "bb"),
                Stream.of("a", "b", "bb", "aa").filter(distinctBy(String::length)).collect(toList()));

        assertEquals(
                asList(5, null, 2, 3),
                Stream.of(5, null, 2, null, 3, 3, 2).filter(distinctBy(x -> x)).collect(toList()));

        val maps = asList(
                hashMapWith(0, 2),
                hashMapWith(1, 2),
                hashMapWith(2, null),
                hashMapWith(3, 1),
                hashMapWith(4, null),
                hashMapWith(5, 2));

        assertEquals(
                asList(0, 2, 3),
                maps.stream()
                        .filter(distinctBy(m -> m.get("val")))
                        .map(m -> m.get("i"))
                        .collect(toList()));

Solution 28 - Java

There are a lot of approaches; this one will also help:

    List<Employee> employees = new ArrayList<>();

    employees.add(new Employee(11, "Ravi"));
    employees.add(new Employee(12, "Stalin"));
    employees.add(new Employee(23, "Anbu"));
    employees.add(new Employee(24, "Yuvaraj"));
    employees.add(new Employee(35, "Sena"));
    employees.add(new Employee(36, "Antony"));
    employees.add(new Employee(47, "Sena"));
    employees.add(new Employee(48, "Ravi"));

    List<Employee> empList = new ArrayList<>(employees.stream().collect(
                    Collectors.toMap(Employee::getName, obj -> obj,
                    (existingValue, newValue) -> existingValue))
                   .values());

    empList.forEach(System.out::println);


    //  Collectors.toMap(
    //  Employee::getName, - key (the value by which you want to eliminate duplicate)
    //  obj -> obj,  - value (entire employee object)
    //  (existingValue, newValue) -> existingValue) - to avoid illegalstateexception: duplicate key

Output (with toString() overridden):

Employee{id=35, name='Sena'}
Employee{id=12, name='Stalin'}
Employee{id=11, name='Ravi'}
Employee{id=24, name='Yuvaraj'}
Employee{id=36, name='Antony'}
Employee{id=23, name='Anbu'}

Solution 29 - Java

In my case I needed to compare against the previous element. I created a stateful Predicate that checks whether the previous element is different from the current element, and keeps the current element in that case.

public List<Log> fetchLogById(Long id) {
    return this.findLogById(id).stream()
        .filter(new LogPredicate())
        .collect(Collectors.toList());
}

public class LogPredicate implements Predicate<Log> {

    private Log previous;

    public boolean test(Log current) {
        boolean isDifferent = previous == null || verifyIfDifferentLog(current, previous);

        if (isDifferent) {
            previous = current;
        }
        return isDifferent;
    }

    private boolean verifyIfDifferentLog(Log current, Log previous) {
        return !current.getId().equals(previous.getId());
    }

}

Solution 30 - Java

My solution in this listing:

List<HolderEntry> result ....

List<HolderEntry> dto3s = new ArrayList<>(result.stream().collect(toMap(
            HolderEntry::getId,
            holder -> holder,  //or Function.identity() if you want
            (holder1, holder2) -> holder1 
    )).values());

In my situation I wanted to find the distinct values and put them in a List.

Solution 31 - Java

As everyone is sharing their own ideas and implementations, I also have one. It's not an efficient one, but it works:

Set<String> personNameList = personList.stream()
        .map(tempPerson -> tempPerson.getName())
        .collect(Collectors.toCollection(HashSet::new));

List<Person> distinctPersons = personList.stream()
        .collect(() -> new ArrayList<Person>(),
                 (l1, p) -> {
                     // remove() returns true only the first time a name is encountered
                     if (personNameList.remove(p.getName())) {
                         l1.add(p);
                     }
                 },
                 ArrayList::addAll);

Solution 32 - Java

If you want a list of Persons, the following is a simple way:

Set<String> set = new HashSet<>(persons.size());
persons.stream().filter(p -> set.add(p.getName())).collect(Collectors.toList());

Additionally, if you want a distinct list of names rather than Persons, you can use either of the following two methods.

Method 1: using distinct

persons.stream().map(x->x.getName()).distinct().collect(Collectors.toList());

Method 2: using HashSet

Set<String> set = new HashSet<>();
set.addAll(persons.stream().map(x->x.getName()).collect(Collectors.toList()));

Solution 33 - Java

What about this solution?

It only works if your key implements equals (and hashCode), which most basic types do, but it's a little bit simpler. Note that it yields the distinct names rather than the Person objects.

persons.stream().map(p -> p.getName()).distinct()

Solution 34 - Java

The simplest code you can write:

    persons.stream().map(x-> x.getName()).distinct().collect(Collectors.toList());

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | RichK | View Question on Stackoverflow
Solution 1 - Java | Stuart Marks | View Answer on Stackoverflow
Solution 2 - Java | wha'eve' | View Answer on Stackoverflow
Solution 3 - Java | nosid | View Answer on Stackoverflow
Solution 4 - Java | Santhosh | View Answer on Stackoverflow
Solution 5 - Java | josketres | View Answer on Stackoverflow
Solution 6 - Java | frhack | View Answer on Stackoverflow
Solution 7 - Java | Saeed Zarinfam | View Answer on Stackoverflow
Solution 8 - Java | Craig P. Motlin | View Answer on Stackoverflow
Solution 9 - Java | Alex | View Answer on Stackoverflow
Solution 10 - Java | Mateusz Rasiński | View Answer on Stackoverflow
Solution 11 - Java | Enginer | View Answer on Stackoverflow
Solution 12 - Java | Wojciech Górski | View Answer on Stackoverflow
Solution 13 - Java | Guillaume Cornet | View Answer on Stackoverflow
Solution 14 - Java | Naveen Dhalaria | View Answer on Stackoverflow
Solution 15 - Java | Tomasz Linkowski | View Answer on Stackoverflow
Solution 16 - Java | Andrew Novitskyi | View Answer on Stackoverflow
Solution 17 - Java | uneq95 | View Answer on Stackoverflow
Solution 18 - Java | Abhinav Ganguly | View Answer on Stackoverflow
Solution 19 - Java | saran3h | View Answer on Stackoverflow
Solution 20 - Java | Holger | View Answer on Stackoverflow
Solution 21 - Java | zerre | View Answer on Stackoverflow
Solution 22 - Java | Sourav Sharma | View Answer on Stackoverflow
Solution 23 - Java | Rafael Winterhalter | View Answer on Stackoverflow
Solution 24 - Java | Garrett Smith | View Answer on Stackoverflow
Solution 25 - Java | Aliaksei Yatsau | View Answer on Stackoverflow
Solution 26 - Java | Akanksha gore | View Answer on Stackoverflow
Solution 27 - Java | Kache | View Answer on Stackoverflow
Solution 28 - Java | Ravikumar | View Answer on Stackoverflow
Solution 29 - Java | Flavio Oliva | View Answer on Stackoverflow
Solution 30 - Java | Евгений Трахимович | View Answer on Stackoverflow
Solution 31 - Java | anshul dubey | View Answer on Stackoverflow
Solution 32 - Java | Abdur Rahman | View Answer on Stackoverflow
Solution 33 - Java | Franzi | View Answer on Stackoverflow
Solution 34 - Java | 2Big2BeSmall | View Answer on Stackoverflow