Remove duplicates from a list of objects based on property in Java 8

Java, List, Java 8

Java Problem Overview


I am trying to remove duplicates from a List of objects based on some property.

Can we do it in a simple way using Java 8?

List<Employee> employee

Can we remove duplicates from it based on the id property of Employee? I have seen posts about removing duplicate strings from an ArrayList of strings.

Java Solutions


Solution 1 - Java

You can get a stream from the List and put it in a TreeSet, to which you provide a custom comparator that compares by id, so that each id is kept only once.

Then, if you really need a list, you can put this collection back into an ArrayList.

import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.toCollection;

...
List<Employee> unique = employee.stream()
                                .collect(collectingAndThen(toCollection(() -> new TreeSet<>(comparingInt(Employee::getId))),
                                                           ArrayList::new));

Given the example:

List<Employee> employee = Arrays.asList(new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));

It will output:

[Employee{id=1, name='John'}, Employee{id=2, name='Alice'}]
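
For a version that can be compiled and run as-is, here is a minimal sketch. The Employee class (with getId() and getName()) and the names TreeSetDedup and uniqueById are assumptions made for illustration, since the question does not show them. The input is deliberately not sorted by id, to show one consequence of going through a TreeSet: the result comes back ordered by id, not in the original list order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.toCollection;

class TreeSetDedup {

    // Hypothetical Employee; the original question does not show this class.
    static final class Employee {
        private final int id;
        private final String name;

        Employee(int id, String name) {
            this.id = id;
            this.name = name;
        }

        int getId() { return id; }
        String getName() { return name; }

        @Override
        public String toString() {
            return "Employee{id=" + id + ", name='" + name + "'}";
        }
    }

    // Keeps the first employee seen for each id; the result is sorted by id.
    static List<Employee> uniqueById(List<Employee> employees) {
        return employees.stream()
                .collect(collectingAndThen(
                        toCollection(() -> new TreeSet<>(comparingInt(Employee::getId))),
                        ArrayList::new));
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(2, "Alice"), new Employee(1, "John"), new Employee(1, "Bob"));
        // prints [Employee{id=1, name='John'}, Employee{id=2, name='Alice'}]
        System.out.println(uniqueById(employees));
    }
}
```

If the original order has to be preserved, one of the Set-based or Map-based approaches in the other solutions is likely a better fit.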


Another idea could be to use a wrapper that wraps an employee and bases its equals and hashCode methods on the employee's id:

class WrapperEmployee {
    private Employee e;

    public WrapperEmployee(Employee e) {
        this.e = e;
    }

    public Employee unwrap() {
        return this.e;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        WrapperEmployee that = (WrapperEmployee) o;
        return Objects.equals(e.getId(), that.e.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(e.getId());
    }
}

Then you wrap each instance, call distinct(), unwrap them and collect the result in a list.

List<Employee> unique = employee.stream()
                                .map(WrapperEmployee::new)
                                .distinct()
                                .map(WrapperEmployee::unwrap)
                                .collect(Collectors.toList());


In fact, I think you can make this wrapper generic by providing a function that extracts the value used for the comparison:

public class Wrapper<T, U> {
    private T t;
    private Function<T, U> equalityFunction;

    public Wrapper(T t, Function<T, U> equalityFunction) {
        this.t = t;
        this.equalityFunction = equalityFunction;
    }

    public T unwrap() {
        return this.t;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        @SuppressWarnings("unchecked")
        Wrapper<T, U> that = (Wrapper<T, U>) o;
        return Objects.equals(equalityFunction.apply(this.t), that.equalityFunction.apply(that.t));
    }

    @Override
    public int hashCode() {
        return Objects.hash(equalityFunction.apply(this.t));
    }
}

and the mapping will be:

.map(e -> new Wrapper<>(e, Employee::getId))
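
Put together end to end, the generic approach might look like the sketch below. It uses a trimmed-down wrapper that computes the key once in its constructor; the names KeyWrapper, WrapperDedup and distinctBy, as well as the Employee class, are assumptions chosen for this example and do not come from the answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

class WrapperDedup {

    // Hypothetical Employee; the original question does not show this class.
    static final class Employee {
        private final int id;
        private final String name;

        Employee(int id, String name) {
            this.id = id;
            this.name = name;
        }

        int getId() { return id; }

        @Override
        public String toString() {
            return "Employee{id=" + id + ", name='" + name + "'}";
        }
    }

    // Compact stand-in for the generic wrapper: equality is delegated to the extracted key.
    static final class KeyWrapper<T, K> {
        private final T value;
        private final K key;

        KeyWrapper(T value, Function<? super T, ? extends K> keyExtractor) {
            this.value = value;
            this.key = keyExtractor.apply(value);
        }

        T unwrap() { return value; }

        @Override
        public boolean equals(Object o) {
            return o instanceof KeyWrapper && Objects.equals(key, ((KeyWrapper<?, ?>) o).key);
        }

        @Override
        public int hashCode() { return Objects.hashCode(key); }
    }

    // Wrap, let distinct() compare the keys, then unwrap again.
    static <T, K> List<T> distinctBy(List<T> items, Function<? super T, ? extends K> keyExtractor) {
        return items.stream()
                .map(item -> new KeyWrapper<T, K>(item, keyExtractor))
                .distinct()
                .map(KeyWrapper::unwrap)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));
        // prints [Employee{id=1, name='John'}, Employee{id=2, name='Alice'}]
        System.out.println(distinctBy(employees, Employee::getId));
    }
}
```

On an ordered sequential stream, distinct() keeps the first element of each group of duplicates, so the list order is preserved here.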

Solution 2 - Java

The easiest way to do it directly in the list is:

HashSet<Object> seen = new HashSet<>();
employee.removeIf(e -> !seen.add(e.getId()));
  • removeIf will remove an element if it meets the specified criteria
  • Set.add will return false if it did not modify the Set, i.e. it already contains the value
  • combining these two, it will remove all elements (employees) whose id has been encountered before

Of course, it only works if the list supports removal of elements.
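
The in-place approach above, including the caveat about lists that do not support removal, can be sketched as follows. The Employee class and the names RemoveIfDedup and removeDuplicateIds are assumptions made for this example. Arrays.asList returns a fixed-size list, so the sketch copies it into an ArrayList before removing anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RemoveIfDedup {

    // Hypothetical Employee; the original question does not show this class.
    static final class Employee {
        private final int id;
        private final String name;

        Employee(int id, String name) {
            this.id = id;
            this.name = name;
        }

        int getId() { return id; }

        @Override
        public String toString() {
            return "Employee{id=" + id + ", name='" + name + "'}";
        }
    }

    // Removes every employee whose id was already seen earlier in the list.
    // The list is modified in place, so it must support element removal.
    static void removeDuplicateIds(List<Employee> employees) {
        Set<Integer> seen = new HashSet<>();
        employees.removeIf(e -> !seen.add(e.getId()));
    }

    public static void main(String[] args) {
        List<Employee> fixedSize = Arrays.asList(
                new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));

        // Copy into an ArrayList first: calling removeIf on fixedSize itself
        // would throw UnsupportedOperationException when "Bob" has to go.
        List<Employee> employees = new ArrayList<>(fixedSize);
        removeDuplicateIds(employees);

        // prints [Employee{id=1, name='John'}, Employee{id=2, name='Alice'}]
        System.out.println(employees);
    }
}
```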

Solution 3 - Java

If you can make use of equals, then filter the list by using distinct within a stream (see the other answers). If you cannot or don't want to override the equals method, you can filter the stream in the following way for any property, e.g. for the property Name (the same works for the property Id etc.):

Set<String> nameSet = new HashSet<>();
List<Employee> employeesDistinctByName = employees.stream()
            .filter(e -> nameSet.add(e.getName()))
            .collect(Collectors.toList());

Solution 4 - Java

Another solution is to use a Predicate, which you can then use in any filter:

public static <T> Predicate<T> distinctBy(Function<? super T, ?> f) {
  // ConcurrentHashMap.newKeySet() returns a thread-safe Set; the JDK has no ConcurrentHashSet class
  Set<Object> objects = ConcurrentHashMap.newKeySet();
  return t -> objects.add(f.apply(t));
}

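Then simply reuse the predicate anywhere:

employees.stream().filter(distinctBy(Employee::getId)).collect(Collectors.toList());

Note: the Javadoc of filter says it takes a stateless predicate, and this one is stateful. In practice it still works when the stream is parallel, because the backing set is thread-safe, although which of the duplicate elements is kept is then not deterministic.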
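
A self-contained sketch of this predicate approach is shown below. It backs the predicate with ConcurrentHashMap.newKeySet(), a thread-safe Set from the JDK. The Employee class and the names DistinctByPredicate and uniqueById are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByPredicate {

    // Hypothetical Employee; the original question does not show this class.
    static final class Employee {
        private final int id;
        private final String name;

        Employee(int id, String name) {
            this.id = id;
            this.name = name;
        }

        int getId() { return id; }

        @Override
        public String toString() {
            return "Employee{id=" + id + ", name='" + name + "'}";
        }
    }

    // Returns a stateful predicate that accepts each key only the first time it is seen.
    // Keys must be non-null, because ConcurrentHashMap rejects null keys.
    static <T> Predicate<T> distinctBy(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Each call creates a fresh predicate, so state never leaks between pipelines.
    static List<Employee> uniqueById(List<Employee> employees) {
        return employees.stream()
                .filter(distinctBy(Employee::getId))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));
        // Sequential stream: prints [Employee{id=1, name='John'}, Employee{id=2, name='Alice'}]
        System.out.println(uniqueById(employees));
    }
}
```

On a parallel stream the same predicate still lets exactly one employee per id through, but it may be "Bob" rather than "John".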


About other solutions:

  1. Using .collect(Collectors.toConcurrentMap(..)).values() is a good solution, but it is inconvenient if you need to keep the original order or sort the result.

  2. list.removeIf(e -> !seen.add(e.getId())); is another very good solution, but the collection must support removal: for example, a list built with Arrays.asList(..) throws an UnsupportedOperationException as soon as an element has to be removed.

Solution 5 - Java

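Collect the employees into a map keyed by id and take its values. Because put overwrites existing entries, the last employee with a given id is the one that is kept, and HashMap does not preserve the original order: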

Collection<Employee> nonDuplicatedEmployees = employees.stream()
   .<Map<Integer, Employee>> collect(HashMap::new,(m,e)->m.put(e.getId(), e), Map::putAll)
   .values();
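
To make the behaviour of this collect call concrete, here is a small sketch. The Employee class and the names MapCollectDedup, lastPerId and firstPerId are assumptions made for this example. The first method mirrors the snippet above, where put overwrites and the last employee per id survives. The second is a variant, not part of the answer, that uses putIfAbsent so that the first employee per id survives on a sequential stream.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapCollectDedup {

    // Hypothetical Employee; the original question does not show this class.
    static final class Employee {
        private final int id;
        private final String name;

        Employee(int id, String name) {
            this.id = id;
            this.name = name;
        }

        int getId() { return id; }
        String getName() { return name; }

        @Override
        public String toString() {
            return "Employee{id=" + id + ", name='" + name + "'}";
        }
    }

    // put() overwrites an existing entry, so the LAST employee per id is kept.
    static Collection<Employee> lastPerId(List<Employee> employees) {
        return employees.stream()
                .<Map<Integer, Employee>>collect(HashMap::new, (m, e) -> m.put(e.getId(), e), Map::putAll)
                .values();
    }

    // Variant: putIfAbsent() leaves an existing entry alone, so on a sequential
    // stream the FIRST employee per id is kept.
    static Collection<Employee> firstPerId(List<Employee> employees) {
        return employees.stream()
                .<Map<Integer, Employee>>collect(HashMap::new, (m, e) -> m.putIfAbsent(e.getId(), e), Map::putAll)
                .values();
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));
        System.out.println(lastPerId(employees));   // contains Bob and Alice
        System.out.println(firstPerId(employees));  // contains John and Alice
    }
}
```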

Solution 6 - Java

This worked for me:

list.stream().distinct().collect(Collectors.toList());

You need to implement equals (and hashCode consistently with it), of course.
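
As an illustration of what that implementation could look like, the sketch below defines a hypothetical Employee whose equality is based on its id alone; the class and the names DistinctWithEquals and unique are assumptions made for this example. Whether id-only equality is appropriate depends on how Employee is used elsewhere in the application.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DistinctWithEquals {

    // Hypothetical Employee: two employees are equal when their ids match.
    static final class Employee {
        private final int id;
        private final String name;

        Employee(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Employee)) return false;
            return id == ((Employee) o).id;
        }

        // Must agree with equals: equal employees need equal hash codes.
        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }

        @Override
        public String toString() {
            return "Employee{id=" + id + ", name='" + name + "'}";
        }
    }

    // distinct() relies on equals/hashCode and keeps the first of each group of duplicates.
    static List<Employee> unique(List<Employee> employees) {
        return employees.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));
        // prints [Employee{id=1, name='John'}, Employee{id=2, name='Alice'}]
        System.out.println(unique(employees));
    }
}
```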

Solution 7 - Java

If order does not matter and running in parallel is more performant, collect to a Map and then get the values:

employee.stream().collect(Collectors.toConcurrentMap(Employee::getId, Function.identity(), (p, q) -> p)).values()
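
When order does matter, a sequential counterpart is Collectors.toMap with a LinkedHashMap supplier. This is a different collector from the one used in the answer, shown here as a sketch; the Employee class and the names OrderedMapDedup and uniqueByIdInOrder are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class OrderedMapDedup {

    // Hypothetical Employee; the original question does not show this class.
    static final class Employee {
        private final int id;
        private final String name;

        Employee(int id, String name) {
            this.id = id;
            this.name = name;
        }

        int getId() { return id; }

        @Override
        public String toString() {
            return "Employee{id=" + id + ", name='" + name + "'}";
        }
    }

    // The merge function keeps the first employee per id; the LinkedHashMap
    // remembers the order in which ids were first encountered.
    static List<Employee> uniqueByIdInOrder(List<Employee> employees) {
        Map<Integer, Employee> byId = employees.stream()
                .collect(Collectors.toMap(
                        Employee::getId,
                        Function.identity(),
                        (first, second) -> first,
                        LinkedHashMap::new));
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(2, "Alice"), new Employee(1, "John"), new Employee(1, "Bob"));
        // prints [Employee{id=2, name='Alice'}, Employee{id=1, name='John'}]
        System.out.println(uniqueByIdInOrder(employees));
    }
}
```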

Solution 8 - Java

There are a lot of good answers here, but I didn't find one that uses the reduce method. For your case, you can apply it in the following way (on a sequential stream only, because the accumulator mutates the identity list):

 List<Employee> employeeList = employees.stream()
      .reduce(new ArrayList<>(), (List<Employee> accumulator, Employee employee) ->
      {
        if (accumulator.stream().noneMatch(emp -> emp.getId().equals(employee.getId())))
        {
          accumulator.add(employee);
        }
        return accumulator;
      }, (acc1, acc2) ->
      {
        acc1.addAll(acc2);
        return acc1;
      });

Solution 9 - Java

Another simple version:

BiFunction<TreeSet<Employee>, List<Employee>, TreeSet<Employee>> appendTree = (set, list) -> { set.addAll(list); return set; };

TreeSet<Employee> uniqueEmployees = appendTree.apply(new TreeSet<Employee>(Comparator.comparing(Employee::getId)), employees);

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Patan | View Question on Stackoverflow
Solution 1 - Java | Alexis C. | View Answer on Stackoverflow
Solution 2 - Java | Holger | View Answer on Stackoverflow
Solution 3 - Java | Rolch2015 | View Answer on Stackoverflow
Solution 4 - Java | navins | View Answer on Stackoverflow
Solution 5 - Java | Tho | View Answer on Stackoverflow
Solution 6 - Java | Sebastian D'Agostino | View Answer on Stackoverflow
Solution 7 - Java | Xiao Liu | View Answer on Stackoverflow
Solution 8 - Java | Alex | View Answer on Stackoverflow
Solution 9 - Java | zawhtut | View Answer on Stackoverflow