Why is dictionary so much faster than list?

C#.NetPerformance

C# Problem Overview


I am testing the speed of getting data from Dictionary VS list.
I've used this code to test :

    internal class Program
{
    private static void Main(string[] args)
    {
        var stopwatch = new Stopwatch();
        List<Grade> grades = Grade.GetData().ToList();
        List<Student> students = Student.GetStudents().ToList();

        stopwatch.Start();
        foreach (Student student in students)
        {
            student.Grade = grades.Single(x => x.StudentId == student.Id).Value;
        }
        stopwatch.Stop();
        Console.WriteLine("Using list {0}", stopwatch.Elapsed);
        stopwatch.Reset();
        students = Student.GetStudents().ToList();
        stopwatch.Start();
        Dictionary<Guid, string> dic = Grade.GetData().ToDictionary(x => x.StudentId, x => x.Value);
        foreach (Student student in students)
        {
            student.Grade = dic[student.Id];
        }
        stopwatch.Stop();
        Console.WriteLine("Using dictionary {0}", stopwatch.Elapsed);
        Console.ReadKey();
    }
}

public class GuidHelper
{
    public static List<Guid> ListOfIds=new List<Guid>();

    static GuidHelper()
    {
        for (int i = 0; i < 10000; i++)
        {
            ListOfIds.Add(Guid.NewGuid());
        }
    }
}


public class Grade
{
    public Guid StudentId { get; set; }
    public string Value { get; set; }

    public static IEnumerable<Grade> GetData()
    {
        for (int i = 0; i < 10000; i++)
        {
            yield return new Grade
                             {
                                 StudentId = GuidHelper.ListOfIds[i], Value = "Value " + i
                             };
        }
    }
}

public class Student
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public string Grade { get; set; }

    public static IEnumerable<Student> GetStudents()
    {
        for (int i = 0; i < 10000; i++)
        {
            yield return new Student
                             {
                                 Id = GuidHelper.ListOfIds[i],
                                 Name = "Name " + i
                             };
        }
    }
}

There is list of students and grades in memory they have StudentId in common.
In first way I tried to find Grade of a student using LINQ on a list that takes near 7 seconds on my machine and in another way first I converted List into dictionary then finding grades of student from dictionary using key that takes less than a second . enter image description here

C# Solutions


Solution 1 - C#

When you do this:

student.Grade = grades.Single(x => x.StudentId == student.Id).Value;

As written it has to enumerate the entire List until it finds the entry in the List that has the correct studentId (does entry 0 match the lambda? No... Does entry 1 match the lambda? No... etc etc). This is O(n). Since you do it once for every student, it is O(n^2).

However when you do this:

student.Grade = dic[student.Id];

If you want to find a certain element by key in a dictionary, it can instantly jump to where it is in the dictionary - this is O(1). O(n) for doing it for every student. (If you want to know how this is done - Dictionary runs a mathematical operation on the key, which turns it into a value that is a place inside the dictionary, which is the same place it put it when it was inserted)

So, dictionary is faster because you used a better algorithm.

Solution 2 - C#

The reason is because a dictionary is a lookup, while a list is an iteration.

Dictionary uses a hash lookup, while your list requires walking through the list until it finds the result from beginning to the result each time.

to put it another way. The list will be faster than the dictionary on the first item, because there's nothing to look up. it's the first item, boom.. it's done. but the second time the list has to look through the first item, then the second item. The third time through it has to look through the first item, then the second item, then the third item.. etc..

So each iteration the lookup takes more and more time. The larger the list, the longer it takes. While the dictionary is always a more or less fixed lookup time (it also increases as the dictionary gets larger, but at a much slower pace, so by comparison it's almost fixed).

Solution 3 - C#

When using Dictionary you are using a key to retrieve your information, which enables it to find it more efficiently, with List you are using Single Linq expression, which since it is a list, has no other option other than to look in entire list for wanted the item.

Solution 4 - C#

Dictionary uses hashing to search for the data. Each item in the dictionary is stored in buckets of items that contain the same hash. It's a lot quicker.

Try sorting your list, it will be a a bit quicker then.

Solution 5 - C#

A dictionary uses a hash table, it is a great data structure as it maps an input to a corresponding output almost instantaneously, it has a complexity of O(1) as already pointed out which means more or less immediate retrieval.

The cons of it is that for the sake of performance you need lots of space in advance (depending on the implementation be it separate chaining or linear/quadratic probing you may need at least as much as you're planning to store, probably double in the latter case) and you need a good hashing algorithm that maps uniquely your input ("John Smith") to a corresponding output such as a position in an array (hash_array[34521]).

Also listing the entries in a sorted order is a problem. If I may quote Wikipedia:

> Listing all n entries in some specific order generally requires a > separate sorting step, whose cost is proportional to log(n) per entry.

Have a read on linear probing and separate chaining for some gorier details :)

Solution 6 - C#

Dictionary is based on a hash table which is a rather efficient algorithm to look up things. In a list you have to go element by element in order to find something.

It's all a matter of data organization...

Solution 7 - C#

When it comes to lookup of data, a keyed collection is always faster than a non-keyed collection. This is because a non-keyed collection will have to enumerate its elements to find what you are looking for. While in a keyed collection you can just access the element directly via the key.

These are some nice articles for comparing list to dictionary.

Here. And this one.

Solution 8 - C#

From MSDN - Dictionary mentions close to O(1) but I think it depends on the types involved.

The Dictionary(TKey,TValue) generic class provides a mapping from a set of keys to a set of values. Each addition to the dictionary consists of a value and its associated key. Retrieving a value by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table.

Note: The speed of retrieval depends on the quality of the hashing algorithm of the type specified for TKey.

List(TValue) does not implement a hash lookup so it is sequential and the performance is O(n). It also depends on the types involved and boxing/unboxing needs to be considered.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionUnforgivenView Question on Stackoverflow
Solution 1 - C#PatashuView Answer on Stackoverflow
Solution 2 - C#Erik FunkenbuschView Answer on Stackoverflow
Solution 3 - C#Ron.B.IView Answer on Stackoverflow
Solution 4 - C#Quinton BernhardtView Answer on Stackoverflow
Solution 5 - C#NobilisView Answer on Stackoverflow
Solution 6 - C#JeffRSonView Answer on Stackoverflow
Solution 7 - C#aiapatagView Answer on Stackoverflow
Solution 8 - C#howserssView Answer on Stackoverflow