Remove objects with a duplicate property from List


C# Problem Overview


I have a List of objects in C#. All of the objects contain a property ID. There are several objects that have the same ID property.

How can I trim the List (or make a new List) where there is only one object per ID property?

[Any additional duplicates should be dropped from the List.]

C# Solutions


Solution 1 - C#

If you want to avoid using a third-party library, you could do something like:

var bar = fooArray.GroupBy(x => x.Id).Select(x => x.First()).ToList();

That will group the array by the Id property, then select the first entry in each grouping.
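Below is a self-contained sketch of the same GroupBy/First approach. The (Id, Name) tuple list is hypothetical sample data standing in for the question's objects. Because GroupBy yields groups in the order their keys first appear, and keeps source order inside each group, First() returns the earliest object for each Id.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: (Id, Name) tuples stand in for the question's objects.
var items = new List<(int Id, string Name)>
{
    (1, "first 1"),
    (2, "first 2"),
    (1, "second 1"),
    (3, "first 3"),
    (2, "second 2"),
};

// Group by Id, then keep the first element of each group.
var unique = items
    .GroupBy(x => x.Id)
    .Select(g => g.First())
    .ToList();

foreach (var item in unique)
    Console.WriteLine($"{item.Id}: {item.Name}");
```

Note that GroupBy buffers the whole source before yielding anything, so this approach is not streaming.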

Solution 2 - C#

MoreLINQ's DistinctBy() will do the job; it lets you use an object property to determine distinctness. Unfortunately, the built-in LINQ Distinct() is not flexible enough.

var uniqueItems = allItems.DistinctBy(i => i.Id);

DistinctBy()

> Returns all distinct elements of the given source, where "distinctness" is determined via a projection and the default equality comparer for the projected type.

PS: Credit to Jon Skeet for sharing this library with the community.
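MoreLINQ is a NuGet package, so the sketch below does not call it. Instead, it shows a hypothetical DistinctByKey helper (not MoreLINQ's actual source) that mimics the documented behavior: distinctness is decided by a projected key compared with the key type's default equality comparer. This sketch keeps the first element seen for each key; the tuple data is also hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for the question's objects.
var allItems = new List<(int Id, string Name)>
{
    (7, "a"), (8, "b"), (7, "c"), (9, "d"), (8, "e"),
};

var uniqueItems = DistinctByKey(allItems, i => i.Id).ToList();

foreach (var item in uniqueItems)
    Console.WriteLine($"{item.Id}: {item.Name}");

// Hypothetical helper: lazily yields each element whose projected key has not
// been seen yet, using the default equality comparer for TKey.
static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
    IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    var seenKeys = new HashSet<TKey>();
    foreach (var element in source)
    {
        // HashSet.Add returns false when the key is already present.
        if (seenKeys.Add(keySelector(element)))
            yield return element;
    }
}
```

Unlike the GroupBy approach, this streams: each element is yielded as soon as its key is found to be new.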

Solution 3 - C#

var list = GetListFromSomeWhere();
var list2 = GetListFromSomeWhere();
list.AddRange(list2);

// ....
// ...
var distinctedList = list.DistinctBy(x => x.ID).ToList();

MoreLINQ on GitHub

Or, if you don't want to use external DLLs for some reason, you can use this Distinct overload:

public static IEnumerable<TSource> Distinct<TSource>(
    this IEnumerable<TSource> source, IEqualityComparer<TSource> comparer)

Usage:

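using System;
using System.Collections.Generic;

public class FooComparer : IEqualityComparer<Foo>
{
    // Two Foo objects are considered equal if their IDs are equal.
    public bool Equals(Foo x, Foo y)
    {
        // Check whether the compared objects reference the same data.
        if (Object.ReferenceEquals(x, y)) return true;

        // Check whether any of the compared objects is null.
        if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
            return false;

        return x.ID == y.ID;
    }

    // Equal objects must return equal hash codes, so hash only the ID.
    public int GetHashCode(Foo obj)
    {
        return Object.ReferenceEquals(obj, null) ? 0 : obj.ID.GetHashCode();
    }
}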



var distinctList = list.Distinct(new FooComparer()).ToList();

Solution 4 - C#

Starting from .NET 6, a new DistinctBy LINQ operator is available:

public static IEnumerable<TSource> DistinctBy<TSource,TKey> (
    this IEnumerable<TSource> source,
    Func<TSource,TKey> keySelector);

> Returns distinct elements from a sequence according to a specified key selector function.

Usage example:

List<Item> distinctList = listWithDuplicates
    .DistinctBy(i => i.Id)
    .ToList();

There is also an overload that has an IEqualityComparer<TKey> parameter.
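A small sketch (requires .NET 6 or later) contrasting the default key comparison with the IEqualityComparer<TKey> overload. The (Code, Qty) tuple data is hypothetical; StringComparer.OrdinalIgnoreCase is used as the key comparer so that keys differing only in case count as duplicates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: the key is a string code whose casing varies.
var listWithDuplicates = new List<(string Code, int Qty)>
{
    ("ABC", 1), ("abc", 2), ("Xyz", 3), ("XYZ", 4), ("def", 5),
};

// Default comparer for string keys: "ABC" and "abc" are different keys.
var caseSensitive = listWithDuplicates.DistinctBy(i => i.Code).ToList();

// Comparer overload: keys are compared case-insensitively.
var caseInsensitive = listWithDuplicates
    .DistinctBy(i => i.Code, StringComparer.OrdinalIgnoreCase)
    .ToList();

Console.WriteLine(caseSensitive.Count);   // 5
Console.WriteLine(caseInsensitive.Count); // 3
```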


Alternative: if creating a new List<T> is not desirable, here is a RemoveDuplicates extension method for the List<T> class:

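using System;
using System.Collections.Generic;

// Extension methods must live in a non-generic static class (the name is arbitrary).
public static class ListExtensions
{
    /// <summary>
    /// Removes all the elements that are duplicates of previous elements,
    /// according to a specified key selector function.
    /// </summary>
    /// <returns>
    /// The number of elements removed.
    /// </returns>
    public static int RemoveDuplicates<TSource, TKey>(
        this List<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> keyComparer = null)
    {
        // HashSet.Add returns false for a key that was already seen,
        // so RemoveAll drops every later element with a repeated key.
        var hashSet = new HashSet<TKey>(keyComparer);
        return source.RemoveAll(item => !hashSet.Add(keySelector(item)));
    }
}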

This method is efficient (O(n)) but a bit dangerous, because it can corrupt the contents of the List<T> if the keySelector lambda throws an exception for some item. The same problem exists with the built-in RemoveAll method¹. So if the keySelector lambda is not fail-proof, the RemoveDuplicates method should be invoked in a try block whose catch block discards the potentially corrupted list.

¹ The List<T> class is backed by an internal _items array. The RemoveAll method invokes the Predicate<T> match for each item in the list, moving values stored in _items along the way (source code). In case of an exception, RemoveAll exits immediately, leaving _items in a corrupted state. I've posted [an issue][4] on GitHub about the corruptive behavior of this method, and the feedback I got was that the implementation should not be fixed, and the behavior should not be documented either.

[4]: https://github.com/dotnet/runtime/issues/66255 "Not documented that the List.RemoveAll method can corrupt the list"
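A sketch of the try/catch usage recommended for RemoveDuplicates. To keep it in a single file, it inlines the same RemoveAll/HashSet technique rather than calling the extension method. The tuple data and the deliberately throwing key selector (it rejects a null Name) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data: one entry has a null Name.
var list = new List<(int Id, string? Name)>
{
    (1, "a"), (2, "A"), (3, "b"), (4, null), (5, "B"),
};

// A key selector that is NOT fail-proof: it throws when Name is null.
Func<(int Id, string? Name), string> keySelector =
    item => item.Name?.ToUpperInvariant() ?? throw new ArgumentException("Name is null");

var seen = new HashSet<string>();
List<(int Id, string? Name)>? result;
try
{
    // Same technique as RemoveDuplicates: remove every item whose key was already seen.
    list.RemoveAll(item => !seen.Add(keySelector(item)));
    result = list;
}
catch (ArgumentException)
{
    // RemoveAll may have left the list partially compacted, so discard it.
    result = null;
}

Console.WriteLine(result is null ? "list discarded" : $"{result.Count} items kept");
```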

Solution 5 - C#

Not sure if anyone is still looking for additional ways to do this, but I've used this code to remove duplicates from a list of User objects based on matching ID numbers.

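// Requires: using System.Collections; (for ArrayList)
private ArrayList RemoveSearchDuplicates(ArrayList SearchResults)
{
    ArrayList TempList = new ArrayList();

    foreach (User u1 in SearchResults)
    {
        bool duplicatefound = false;
        foreach (User u2 in TempList)
        {
            if (u1.ID == u2.ID)
            {
                duplicatefound = true;
                break; // one match is enough; stop scanning TempList
            }
        }

        // Keep only the first User seen for each ID.
        if (!duplicatefound)
            TempList.Add(u1);
    }
    return TempList;
}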

Call: SearchResults = RemoveSearchDuplicates(SearchResults);
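The nested loop above costs O(n²) comparisons. Below is a variation (not the answer's original code) that keeps the same "first occurrence wins" rule but uses a generic List and a HashSet of seen IDs, so each lookup is O(1) on average. The (ID, Name) tuples are a hypothetical stand-in for the User class.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the answer's User objects.
var searchResults = new List<(int ID, string Name)>
{
    (10, "Ann"), (11, "Bob"), (10, "Ann (dup)"), (12, "Cid"), (11, "Bob (dup)"),
};

searchResults = RemoveSearchDuplicates(searchResults);

foreach (var u in searchResults)
    Console.WriteLine($"{u.ID}: {u.Name}");

// HashSet.Add returns false for an ID that was already seen,
// which replaces the inner scan over the result list.
static List<(int ID, string Name)> RemoveSearchDuplicates(List<(int ID, string Name)> results)
{
    var seenIds = new HashSet<int>();
    var tempList = new List<(int ID, string Name)>();
    foreach (var u in results)
    {
        if (seenIds.Add(u.ID))
            tempList.Add(u);
    }
    return tempList;
}
```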

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Baxter | View Question on Stackoverflow
Solution 1 - C# | Daniel Mann | View Answer on Stackoverflow
Solution 2 - C# | sll | View Answer on Stackoverflow
Solution 3 - C# | gdoron is supporting Monica | View Answer on Stackoverflow
Solution 4 - C# | Theodor Zoulias | View Answer on Stackoverflow
Solution 5 - C# | JScott | View Answer on Stackoverflow