HashSet<T> versus Dictionary<K, V> w.r.t searching time to find if an item exists

.NetPerformanceDictionaryHashset

.Net Problem Overview


HashSet<T> t = new HashSet<T>();
// add 10 million items


Dictionary<K, V> t = new Dictionary<K, V>();
// add 10 million items.

Whose .Contains method will return quicker?

Just to clarify, my requirement is I have 10 million objects (well, strings really) that I need to check if they exist in the data structure. I will NEVER iterate.

.Net Solutions


Solution 1 - .Net

HashSet vs List vs Dictionary performance test, taken from here.

Add 1000000 objects (without checking duplicates)

Contains check for half the objects of a collection of 10000

Remove half the objects of a collection of 10000

Solution 2 - .Net

I assume you mean Dictionary<TKey, TValue> in the second case? HashTable is a non-generic class.

You should choose the right collection for the job based on your actual requirements. Do you actually want to map each key to a value? If so, use Dictionary<,>. If you only care about it as a set, use HashSet<>.

I would expect HashSet<T>.Contains and Dictionary<TKey, TValue>.ContainsKey (which are the comparable operations, assuming you're using your dictionary sensibly) to basically perform the same - they're using the same algorithm, fundamentally. I guess with the entries in Dictionary<,> being larger you end up with a greater likelihood of blowing the cache with Dictionary<,> than with HashSet<>, but I'd expect that to be insignificant compared with the pain of choosing the wrong data type simply in terms of what you're trying to achieve.

Solution 3 - .Net

From MSDN documentation for Dictionary<TKey,TValue>

> "Retrieving a value by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table."

With a note:

> "The speed of retrieval depends on the quality of the hashing algorithm of the type specified for TKey"

I know your question/post is old - but while looking for an answer to a similar question I stumbled across this.

Hope this helps. Scroll down to the Remarks section for more details. https://msdn.microsoft.com/en-us/library/xfhwa508(v=vs.110).aspx

Solution 4 - .Net

These are different data structures. Also there is no generic version of HashTable.

HashSet contains values of type T which HashTable (or Dictionary) contains key-value pairs. So you should choose collection on what data you need to be stored.

Solution 5 - .Net

The accepted answer to this question does NOT validly answer the question! It happens to give the correct answer, but that answer isn't shown by the evidence they provided.

What that answer shows is that Key lookups on a Dictionary or HashSet are vastly quicker than looking up in a List. Which is true, but not interesting, nor surprising, nor proof that they have the same speed.

I've run the code below to compare the lookup times, and my conclusion is that they ARE in fact the same speed. (Or at least, if there is any difference, then the difference is well within the Standard Deviation of that speed)

Specifically, 100,000,000 lookups was taking between 10 and 11.5 seconds for both, for me, in this test.

Test Code:

private const int TestReps = 100_000_000;
[Test]
public void CompareHashSetContainsVersusDictionaryContainsKey()
{
    for (int j = 0; j < 10; j++)
    {
        var rand = new Random();
        var dict = new Dictionary<int, int>();
        var hash = new HashSet<int>();

        for (int i = 0; i < TestReps; i++)
        {
            var key = rand.Next();
            var value = rand.Next();
            hash.Add(key);
            dict.TryAdd(key, value);
        }

        var testPoints = Enumerable.Repeat(1, TestReps).Select(_ => rand.Next()).ToArray();
        var timer = new Stopwatch();
        var total = 0;
        
        timer.Restart();
            for (int i = 0; i < TestReps; i++)
            {
                var newKey = testPoints[i];
                if (hash.Contains(newKey))
                {
                    total++;
                }
            }
        Console.WriteLine(timer.Elapsed);
        
        var target = total;
        Assert.That(total == target);
        

        timer.Restart();
            for (int i = 0; i < TestReps; i++)
            {
                var newKey = testPoints[i];
                if (dict.ContainsKey(newKey))
                {
                    total++;
                }
            }
        Console.WriteLine(timer.Elapsed);

        Assert.That(total == target * 2);
        Console.WriteLine("Set");
    }
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionhalivingstonView Question on Stackoverflow
Solution 1 - .NethadView Answer on Stackoverflow
Solution 2 - .NetJon SkeetView Answer on Stackoverflow
Solution 3 - .NetripvlanView Answer on Stackoverflow
Solution 4 - .NetAndrew BezzubView Answer on Stackoverflow
Solution 5 - .NetBrondahlView Answer on Stackoverflow