Distinct by property of class with LINQ

C#LinqDistinct

C# Problem Overview


I have a collection:

List<Car> cars = new List<Car>();

Cars are uniquely identified by their property CarCode.

I have three cars in the collection, and two with identical CarCodes.

How can I use LINQ to convert this collection to Cars with unique CarCodes?

C# Solutions


Solution 1 - C#

You can use grouping, and get the first car from each group:

List<Car> distinct =
  cars
  .GroupBy(car => car.CarCode)
  .Select(g => g.First())
  .ToList();

Solution 2 - C#

Use MoreLINQ, which has a DistinctBy method :)

IEnumerable<Car> distinctCars = cars.DistinctBy(car => car.CarCode);

(This is only for LINQ to Objects, mind you.)

Solution 3 - C#

Same approach as Guffa but as an extension method:

public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> items, Func<T, TKey> property)
{
    return items.GroupBy(property).Select(x => x.First());
}

Used as:

var uniqueCars = cars.DistinctBy(x => x.CarCode);

Solution 4 - C#

You can implement an IEqualityComparer and use that in your Distinct extension.

class CarEqualityComparer : IEqualityComparer<Car>
{
    #region IEqualityComparer<Car> Members

    public bool Equals(Car x, Car y)
    {
        return x.CarCode.Equals(y.CarCode);
    }

    public int GetHashCode(Car obj)
    {
        return obj.CarCode.GetHashCode();
    }

    #endregion
}

And then

var uniqueCars = cars.Distinct(new CarEqualityComparer());

Solution 5 - C#

Another extension method for Linq-to-Objects, without using GroupBy:

    /// <summary>
    /// Returns the set of items, made distinct by the selected value.
    /// </summary>
    /// <typeparam name="TSource">The type of the source.</typeparam>
    /// <typeparam name="TResult">The type of the result.</typeparam>
    /// <param name="source">The source collection.</param>
    /// <param name="selector">A function that selects a value to determine unique results.</param>
    /// <returns>IEnumerable&lt;TSource&gt;.</returns>
    public static IEnumerable<TSource> Distinct<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        HashSet<TResult> set = new HashSet<TResult>();

        foreach(var item in source)
        {
            var selectedValue = selector(item);

            if (set.Add(selectedValue))
                yield return item;
        }
    }

Solution 6 - C#

I think the best option in Terms of performance (or in any terms) is to Distinct using the The IEqualityComparer interface.

Although implementing each time a new comparer for each class is cumbersome and produces boilerplate code.

So here is an extension method which produces a new IEqualityComparer on the fly for any class using reflection.

Usage:

var filtered = taskList.DistinctBy(t => t.TaskExternalId).ToArray();

Extension Method Code

public static class LinqExtensions
{
    public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> items, Func<T, TKey> property)
    {
        GeneralPropertyComparer<T, TKey> comparer = new GeneralPropertyComparer<T,TKey>(property);
        return items.Distinct(comparer);
    }   
}
public class GeneralPropertyComparer<T,TKey> : IEqualityComparer<T>
{
    private Func<T, TKey> expr { get; set; }
    public GeneralPropertyComparer (Func<T, TKey> expr)
    {
        this.expr = expr;
    }
    public bool Equals(T left, T right)
    {
        var leftProp = expr.Invoke(left);
        var rightProp = expr.Invoke(right);
        if (leftProp == null && rightProp == null)
            return true;
        else if (leftProp == null ^ rightProp == null)
            return false;
        else
            return leftProp.Equals(rightProp);
    }
    public int GetHashCode(T obj)
    {
        var prop = expr.Invoke(obj);
        return (prop==null)? 0:prop.GetHashCode();
    }
}

Solution 7 - C#

You can't effectively use Distinct on a collection of objects (without additional work). I will explain why.

The documentation says:

> It uses the default equality comparer, Default, to compare values.

For objects that means it uses the default equation method to compare objects (source). That is on their hash code. And since your objects don't implement the GetHashCode() and Equals methods, it will check on the reference of the object, which are not distinct.

Solution 8 - C#

Another way to accomplish the same thing...

List<Car> distinticBy = cars
    .Select(car => car.CarCode)
    .Distinct()
    .Select(code => cars.First(car => car.CarCode == code))
    .ToList();

It's possible to create an extension method to do this in a more generic way. It would be interesting if someone could evalute performance of this 'DistinctBy' against the GroupBy approach.

Solution 9 - C#

You can check out my PowerfulExtensions library. Currently it's in a very young stage, but already you can use methods like Distinct, Union, Intersect, Except on any number of properties;

This is how you use it:

using PowerfulExtensions.Linq;
...
var distinct = myArray.Distinct(x => x.A, x => x.B);

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionuser278618View Question on Stackoverflow
Solution 1 - C#GuffaView Answer on Stackoverflow
Solution 2 - C#Jon SkeetView Answer on Stackoverflow
Solution 3 - C#Sheldor the conquerorView Answer on Stackoverflow
Solution 4 - C#Anthony PegramView Answer on Stackoverflow
Solution 5 - C#Luke PuplettView Answer on Stackoverflow
Solution 6 - C#Anestis KivranoglouView Answer on Stackoverflow
Solution 7 - C#Patrick HofmanView Answer on Stackoverflow
Solution 8 - C#JwJosefyView Answer on Stackoverflow
Solution 9 - C#Andrzej GisView Answer on Stackoverflow