Remove objects with a duplicate property from List
C# Problem Overview
I have a List of objects in C#. All of the objects contain a property ID. There are several objects that have the same ID property.
How can I trim the List (or make a new List) so that there is only one object per ID?
[Any additional duplicates are dropped out of the List]
C# Solutions
Solution 1 - C#
If you want to avoid using a third-party library, you could do something like:
var bar = fooArray.GroupBy(x => x.Id).Select(x => x.First()).ToList();
That will group the array by the Id property, then select the first entry in the grouping.
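As a minimal, self-contained sketch of the same idea: the sample data below is hypothetical (anonymous objects standing in for the question's items), and it assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Hypothetical sample data: anonymous objects standing in for the question's items.
var fooArray = new[]
{
    new { Id = 1, Name = "first" },
    new { Id = 2, Name = "second" },
    new { Id = 1, Name = "duplicate of first" },
};

// Group by Id, then keep the first element of each group.
var bar = fooArray.GroupBy(x => x.Id).Select(g => g.First()).ToList();

foreach (var item in bar)
    Console.WriteLine($"{item.Id}: {item.Name}");
// prints:
// 1: first
// 2: second
```

For in-memory sequences, GroupBy yields groups in the order their keys first appear, so the surviving object for each Id is the one that came first in the source.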
Solution 2 - C#
MoreLINQ's DistinctBy() will do the job; it lets you use an object property to determine distinctness. Unfortunately, the built-in LINQ Distinct() is not flexible enough for this.
var uniqueItems = allItems.DistinctBy(i => i.Id);
DistinctBy()
> Returns all distinct elements of the given source, where "distinctness" is determined via a projection and the default equality comparer for the projected type.
PS: Credits to Jon Skeet for sharing this library with the community.
Solution 3 - C#
var list = GetListFromSomeWhere();
var list2 = GetListFromSomeWhere();
list.AddRange(list2);
....
...
var distinctedList = list.DistinctBy(x => x.ID).ToList();
MoreLINQ at GitHub
Or, if you don't want to use external DLLs for some reason, you can use this Distinct overload:
public static IEnumerable<TSource> Distinct<TSource>(
this IEnumerable<TSource> source, IEqualityComparer<TSource> comparer)
Usage:
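public class FooComparer : IEqualityComparer<Foo>
{
// Two Foo objects are considered equal if their IDs are equal.
public bool Equals(Foo x, Foo y)
{
// Check whether the compared objects reference the same data.
if (Object.ReferenceEquals(x, y)) return true;
// Check whether any of the compared objects is null.
if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
return false;
return x.ID == y.ID;
}
// Required by IEqualityComparer<Foo>; it must agree with Equals, so only the ID is hashed.
// Assumes ID itself is never null (for example, an int).
public int GetHashCode(Foo obj)
{
return Object.ReferenceEquals(obj, null) ? 0 : obj.ID.GetHashCode();
}
}
var distinctList = list.Distinct(new FooComparer()).ToList();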
Solution 4 - C#
Starting from .NET 6, a new DistinctBy
LINQ operator is available:
public static IEnumerable<TSource> DistinctBy<TSource,TKey> (
this IEnumerable<TSource> source,
Func<TSource,TKey> keySelector);
> Returns distinct elements from a sequence according to a specified key selector function.
Usage example:
List<Item> distinctList = listWithDuplicates
.DistinctBy(i => i.Id)
.ToList();
There is also an overload that has an IEqualityComparer<TKey>
parameter.
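A short sketch of that comparer overload, assuming .NET 6 or later and hypothetical sample data with string Ids that should be matched case-insensitively:

```csharp
using System;
using System.Linq;

// Hypothetical sample data: "abc" and "ABC" should count as the same Id.
var items = new[]
{
    new { Id = "abc", Value = 1 },
    new { Id = "ABC", Value = 2 },
    new { Id = "xyz", Value = 3 },
};

// The second argument is an IEqualityComparer<TKey>; here TKey is string.
var distinct = items
    .DistinctBy(i => i.Id, StringComparer.OrdinalIgnoreCase)
    .ToList();

Console.WriteLine(distinct.Count); // 2
```

Any IEqualityComparer<TKey> can be passed in the same way, which is useful when the key type's default equality is not the one you want.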
Alternative: In case creating a new List<T>
is not desirable, here is a RemoveDuplicates
extension method for the List<T>
class:
/// <summary>
/// Removes all the elements that are duplicates of previous elements,
/// according to a specified key selector function.
/// </summary>
/// <returns>
/// The number of elements removed.
/// </returns>
public static int RemoveDuplicates<TSource, TKey>(
this List<TSource> source,
Func<TSource, TKey> keySelector,
IEqualityComparer<TKey> keyComparer = null)
{
var hashSet = new HashSet<TKey>(keyComparer);
return source.RemoveAll(item => !hashSet.Add(keySelector(item)));
}
This method is efficient (O(n)) but a bit dangerous, because it has the potential to corrupt the contents of the List<T>
in case the keySelector
lambda fails for some item. The same problem exists with the built-in RemoveAll
method¹. So in case the keySelector
lambda is not fail-proof, the RemoveDuplicates
method should be invoked in a try
block that has a catch
block where the potentially corrupted list is discarded.
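The try/catch pattern described above might look roughly like the following sketch. Everything here is illustrative: the local RemoveDuplicates function is a stand-in with the same body as the extension method (written as a local function so the example fits in one file of top-level statements), and the sample data and the int.Parse key selector are hypothetical, chosen only because parsing can throw.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in for the RemoveDuplicates extension method (same body, default comparer).
int RemoveDuplicates<TSource, TKey>(List<TSource> source, Func<TSource, TKey> keySelector)
{
    var hashSet = new HashSet<TKey>();
    return source.RemoveAll(item => !hashSet.Add(keySelector(item)));
}

// Hypothetical input: "oops" makes the key selector throw a FormatException.
var values = new List<string> { "1", "2", "1", "oops", "2" };
bool failed = false;

try
{
    RemoveDuplicates(values, s => int.Parse(s));
}
catch (FormatException)
{
    // The list may have been partially modified, so stop trusting it:
    // discard it here and rebuild it from the original source if needed.
    failed = true;
    values = new List<string>();
}

Console.WriteLine(failed); // True
```

If discarding the list is not acceptable, one option is to run the key selector over all items first (so any exception happens before the list is touched) and only then call RemoveDuplicates.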
¹ The List<T>
class is backed by an internal _items
array. The RemoveAll
method invokes the Predicate<T> match
for each item in the list, moving values stored in the _items
along the way (source code). In case of an exception the RemoveAll
just exits immediately, leaving the _items
in a corrupted state. I've posted an issue on GitHub (https://github.com/dotnet/runtime/issues/66255) regarding the corruptive behavior of this method, and the feedback I got was that the implementation would not be fixed and the behavior would not be documented.
Solution 5 - C#
Not sure if anyone is still looking for any additional ways to do this. But I've used this code to remove duplicates from a list of User objects based on matching ID numbers.
private ArrayList RemoveSearchDuplicates(ArrayList SearchResults)
{
ArrayList TempList = new ArrayList();
foreach (User u1 in SearchResults)
{
bool duplicatefound = false;
foreach (User u2 in TempList)
if (u1.ID == u2.ID)
duplicatefound = true;
if (!duplicatefound)
TempList.Add(u1);
}
return TempList;
}
Call: SearchResults = RemoveSearchDuplicates(SearchResults);
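For comparison, here is a sketch of the same first-occurrence-wins logic using a generic List<T> and a HashSet<int> instead of the nested loop, so each item is checked once. This is not the code from the answer above: the RemoveDuplicatesById name, the getId parameter, and the sample data are all hypothetical, and it assumes the IDs are ints.

```csharp
using System;
using System.Collections.Generic;

// Keeps the first item seen for each ID and drops later ones.
List<T> RemoveDuplicatesById<T>(IEnumerable<T> items, Func<T, int> getId)
{
    var seen = new HashSet<int>();
    var result = new List<T>();
    foreach (var item in items)
    {
        // HashSet.Add returns false when the ID has been seen before.
        if (seen.Add(getId(item)))
            result.Add(item);
    }
    return result;
}

// Hypothetical sample data standing in for the User objects.
var users = new[]
{
    new { ID = 7, Name = "Ann" },
    new { ID = 9, Name = "Bob" },
    new { ID = 7, Name = "Ann (again)" },
};

var unique = RemoveDuplicatesById(users, u => u.ID);
Console.WriteLine(unique.Count); // 2
```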