Concurrent HashSet<T> in .NET Framework?

C#MultithreadingThread SafetyLockingMutex

C# Problem Overview


I have the following class.

class Test{
    public HashSet<string> Data = new HashSet<string>();
}

I need to change the field "Data" from different threads, so I would like some opinions on my current thread-safe implementation.

class Test{
    public HashSet<string> Data = new HashSet<string>();

    public void Add(string Val){
            lock(Data) Data.Add(Val);
    }

    public void Remove(string Val){
            lock(Data) Data.Remove(Val);
    }
}

Is there a better solution, to go directly to field and protect it from concurrent access by multiple threads?

C# Solutions


Solution 1 - C#

Your implementation is correct. The .NET Framework does not provide a built-in concurrent hashset type, unfortunately. However, there are some workarounds.

ConcurrentDictionary (recommended)

This first one is to use the class ConcurrentDictionary<TKey, TValue> in the namespace System.Collections.Concurrent. In the case, the value is pointless, so we can use a simple byte (1 byte in memory).

private ConcurrentDictionary<string, byte> _data;

This is the recommended option because the type is thread-safe and provide you the same advantages than a HashSet<T> except key and value are different objects.

Source: Social MSDN

ConcurrentBag

If you don't mind about the duplicate entries, you can use the class ConcurrentBag<T> in the same namespace of the previous class.

private ConcurrentBag<string> _data;

Self-implementation

Finally, as you did, you can implement your own data type, using lock or other ways that the .NET provides you to be thread-safe. Here is a great example: https://stackoverflow.com/questions/4306936/how-to-implement-concurrenthashset-in-net

The only drawback of this solution is that the type HashSet<T> doesn't officially concurrent access, even for reading operations.

I quote the code of the linked post (originally written by Ben Mosher).

using System;
using System.Collections.Generic;
using System.Threading;

namespace BlahBlah.Utilities
{
    public class ConcurrentHashSet<T> : IDisposable
    {
        private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);
        private readonly HashSet<T> _hashSet = new HashSet<T>();

        #region Implementation of ICollection<T> ...ish
        public bool Add(T item)
        {
            _lock.EnterWriteLock();
            try
            {
                return _hashSet.Add(item);
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public void Clear()
        {
            _lock.EnterWriteLock();
            try
            {
                _hashSet.Clear();
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public bool Contains(T item)
        {
            _lock.EnterReadLock();
            try
            {
                return _hashSet.Contains(item);
            }
            finally
            {
                if (_lock.IsReadLockHeld) _lock.ExitReadLock();
            }
        }

        public bool Remove(T item)
        {
            _lock.EnterWriteLock();
            try
            {
                return _hashSet.Remove(item);
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public int Count
        {
            get
            {
                _lock.EnterReadLock();
                try
                {
                    return _hashSet.Count;
                }
                finally
                {
                    if (_lock.IsReadLockHeld) _lock.ExitReadLock();
                }
            }
        }
        #endregion

        #region Dispose
        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }
        protected virtual void Dispose(bool disposing)
        {
            if (disposing)
                if (_lock != null)
                    _lock.Dispose();
        }
        ~ConcurrentHashSet()
        {
            Dispose(false);
        }
        #endregion
    }
}

EDIT: Move the entrance lock methods ouside the try blocks, as they could throw an exception and execute the instructions contained in the finally blocks.

Solution 2 - C#

Instead of wrapping a ConcurrentDictionary or locking over a HashSet I created an actual ConcurrentHashSet based on ConcurrentDictionary.

This implementation supports basic operations per item without HashSet's set operations as they make less sense in concurrent scenarios IMO:

var concurrentHashSet = new ConcurrentHashSet<string>(
    new[]
    {
        "hamster",
        "HAMster",
        "bar",
    },
    StringComparer.OrdinalIgnoreCase);

concurrentHashSet.TryRemove("foo");

if (concurrentHashSet.Contains("BAR"))
{
    Console.WriteLine(concurrentHashSet.Count);
}

Output: 2

You can get it from NuGet here and see the source on GitHub here.

Solution 3 - C#

Since noone else mentioned it, I will offer an alternative approach that may or may not be appropriate for your particular purpose:

Microsoft Immutable Collections

From a blog post by the MS team behind:

> While creating and running concurrently is easier than ever, one of the fundamental problems still exists: mutable shared state. Reading from multiple threads is typically very easy, but once the state needs to be updated, it gets a lot harder, especially in designs that require locking. > >An alternative to locking is making use of immutable state. Immutable data structures are guaranteed to never change and can thus be passed freely between different threads without worrying about stepping on somebody else’s toes. > >This design creates a new problem though: How do you manage changes in state without copying the entire state each time? This is especially tricky when collections are involved. > >This is where immutable collections come in.

These collections include ImmutableHashSet<T> and ImmutableList<T>.

Performance

Since the immutable collections use tree data structures underneath to enable structural sharing, their performance characteristics are different from mutable collections. When comparing to a locking mutable collection, the results will depend on lock contention and access patterns. However, taken from another blog post about the immutable collections:

> Q: I’ve heard that immutable collections are slow. Are these any different? Can I use them when performance or memory is important? > > A: These immutable collections have been highly tuned to have competitive performance characteristics to the mutable collections while balancing memory sharing. In some cases they are very nearly as fast as the mutable collections both algorithmically and in actual time, sometimes even faster, while in other cases they are algorithmically more complex. In many cases however the difference will be negligible. Generally you should use the simplest code to get the job done and then tune for performance as necessary. The immutable collections help you write simple code, especially when thread-safety has to be considered.

In other words, in many cases the difference won't be noticeable and you should go with the simpler choice - which for concurrent sets would be to use ImmutableHashSet<T>, since you don't have an existing locking mutable implementation! :-)

Solution 4 - C#

The tricky part about making an ISet<T> concurrent is that the set methods (union, intersection, difference) are iterative in nature. At the very least you have to iterate over all n members of one of the sets involved in the operation, while locking both sets.

You lose the advantages of a ConcurrentDictionary<T,byte> when you have to lock the entire set during iteration. Without locking, these operations are not thread safe.

Given the added overhead of ConcurrentDictionary<T,byte>, it's probably wiser just to use the lighter weight HashSet<T> and just surround everything in locks.

If you don't need the set operations, use ConcurrentDictionary<T,byte> and just use default(byte) as the value when you are adding keys.

Solution 5 - C#

I prefer complete solutions so i did this: Mind you my Count is implemented in a different fashion because i don't see why one should be forbidden to read the hashset while attempting to count its values.

@Zen, Thanks for getting it started.

[DebuggerDisplay("Count = {Count}")]
[Serializable]
public class ConcurrentHashSet<T> : ICollection<T>, ISet<T>, ISerializable, IDeserializationCallback
{
	private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);

	private readonly HashSet<T> _hashSet = new HashSet<T>();

	public ConcurrentHashSet()
	{
	}

	public ConcurrentHashSet(IEqualityComparer<T> comparer)
	{
		_hashSet = new HashSet<T>(comparer);
	}

	public ConcurrentHashSet(IEnumerable<T> collection)
	{
		_hashSet = new HashSet<T>(collection);
	}

	public ConcurrentHashSet(IEnumerable<T> collection, IEqualityComparer<T> comparer)
	{
		_hashSet = new HashSet<T>(collection, comparer);
	}

	protected ConcurrentHashSet(SerializationInfo info, StreamingContext context)
	{
		_hashSet = new HashSet<T>();

		// not sure about this one really...
		var iSerializable = _hashSet as ISerializable;
		iSerializable.GetObjectData(info, context);
	}

	#region Dispose

	public void Dispose()
	{
		Dispose(true);
		GC.SuppressFinalize(this);
	}

	protected virtual void Dispose(bool disposing)
	{
		if (disposing)
			if (_lock != null)
				_lock.Dispose();
	}

	public IEnumerator<T> GetEnumerator()
	{
		return _hashSet.GetEnumerator();
	}

	~ConcurrentHashSet()
	{
		Dispose(false);
	}

	public void OnDeserialization(object sender)
	{
		_hashSet.OnDeserialization(sender);
	}

	public void GetObjectData(SerializationInfo info, StreamingContext context)
	{
		_hashSet.GetObjectData(info, context);
	}

	IEnumerator IEnumerable.GetEnumerator()
	{
		return GetEnumerator();
	}

	#endregion

	public void Add(T item)
	{
		_lock.EnterWriteLock();
		try
		{
			_hashSet.Add(item);
		}
		finally
		{
			if(_lock.IsWriteLockHeld) _lock.ExitWriteLock();
		}
	}

	public void UnionWith(IEnumerable<T> other)
	{
		_lock.EnterWriteLock();
		_lock.EnterReadLock();
		try
		{
			_hashSet.UnionWith(other);
		}
		finally
		{
			if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
			if (_lock.IsReadLockHeld) _lock.ExitReadLock();
		}
	}

	public void IntersectWith(IEnumerable<T> other)
	{
		_lock.EnterWriteLock();
		_lock.EnterReadLock();
		try
		{
			_hashSet.IntersectWith(other);
		}
		finally
		{
			if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
			if (_lock.IsReadLockHeld) _lock.ExitReadLock();
		}
	}

	public void ExceptWith(IEnumerable<T> other)
	{
		_lock.EnterWriteLock();
		_lock.EnterReadLock();
		try
		{
			_hashSet.ExceptWith(other);
		}
		finally
		{
			if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
			if (_lock.IsReadLockHeld) _lock.ExitReadLock();
		}
	}

	public void SymmetricExceptWith(IEnumerable<T> other)
	{
		_lock.EnterWriteLock();
		try
		{
			_hashSet.SymmetricExceptWith(other);
		}
		finally
		{
			if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
		}
	}

	public bool IsSubsetOf(IEnumerable<T> other)
	{
		_lock.EnterWriteLock();
		try
		{
			return _hashSet.IsSubsetOf(other);
		}
		finally
		{
			if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
		}
	}

	public bool IsSupersetOf(IEnumerable<T> other)
	{
		_lock.EnterWriteLock();
		try
		{
			return _hashSet.IsSupersetOf(other);
		}
		finally
		{
			if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
		}
	}

	public bool IsProperSupersetOf(IEnumerable<T> other)
	{
		_lock.EnterWriteLock();
		try
		{
			return _hashSet.IsProperSupersetOf(other);
		}
		finally
		{
			if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
		}
	}

	public bool IsProperSubsetOf(IEnumerable<T> other)
	{
		_lock.EnterWriteLock();
		try
		{
			return _hashSet.IsProperSubsetOf(other);
		}
		finally
		{
			if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
		}
	}

	public bool Overlaps(IEnumerable<T> other)
	{
		_lock.EnterWriteLock();
		try
		{
			return _hashSet.Overlaps(other);
		}
		finally
		{
			if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
		}
	}

	public bool SetEquals(IEnumerable<T> other)
	{
		_lock.EnterWriteLock();
		try
		{
			return _hashSet.SetEquals(other);
		}
		finally
		{
			if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
		}
	}

	bool ISet<T>.Add(T item)
	{
		_lock.EnterWriteLock();
		try
		{
			return _hashSet.Add(item);
		}
		finally
		{
			if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
		}
	}

	public void Clear()
	{
		_lock.EnterWriteLock();
		try
		{
			_hashSet.Clear();
		}
		finally
		{
			if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
		}
	}

	public bool Contains(T item)
	{
		_lock.EnterWriteLock();
		try
		{
			return _hashSet.Contains(item);
		}
		finally
		{
			if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
		}
	}

	public void CopyTo(T[] array, int arrayIndex)
	{
		_lock.EnterWriteLock();
		try
		{
			_hashSet.CopyTo(array, arrayIndex);
		}
		finally
		{
			if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
		}
	}

	public bool Remove(T item)
	{
		_lock.EnterWriteLock();
		try
		{
			return _hashSet.Remove(item);
		}
		finally
		{
			if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
		}
	}

	public int Count
	{
		get
		{
			_lock.EnterWriteLock();
			try
			{
				return _hashSet.Count;
			}
			finally
			{
				if(_lock.IsWriteLockHeld) _lock.ExitWriteLock();
			}
			
		}
	}

	public bool IsReadOnly
	{
		get { return false; }
	}
}

Solution 6 - C#

I found that neither simply locking add and remove methods of a System.Collections.Generic.HashSet, nor wrapping the framework's ConcurrentDictionary is sufficient in "high-throughput" scenarios which require good performance.

Fairly good performance can already be achieved by using this simple idea:

public class ExampleHashSet<T>
{
    const int ConcurrencyLevel = 124;
    const int Lower31BitMask = 0x7FFFFFFF;
    
    HashSet<T>[] sets = new HashSet<T>[ConcurrencyLevel];
    IEqualityComparer<T> comparer;

    public ExampleHashSet()
    {
        comparer = EqualityComparer<T>.Default;

        for(int i = 0; i < ConcurrencyLevel; i++)
            sets[i] = new HashSet<T>();
    }

    public bool Add(T item)
    {
        int hash = (comparer.GetHashCode(item) & Lower31BitMask) % ConcurrencyLevel;
        
        lock(sets[hash]) 
        {
            return sets[hash].Add(item);
        }
    }

    public bool Remove(T item)
    {
        int hash = (comparer.GetHashCode(item) & Lower31BitMask) % ConcurrencyLevel;
        
        lock(sets[hash]) 
        {
            return sets[hash].Remove(item);
        }
    }

    // further methods ...
}

The system's HashSet is wrapped, but in contrast to other anwers, we hold locks on multiple HashSets. Different threads are able to "work" on different HashSets, lowering the overall wait-time.

This idea can be generalized and implemented directly in the HashSet itself (holding locks on the buckets, instead of locking complete sets). An example can be found here.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionkukabView Question on Stackoverflow
Solution 1 - C#JämesView Answer on Stackoverflow
Solution 2 - C#i3arnonView Answer on Stackoverflow
Solution 3 - C#Søren BoisenView Answer on Stackoverflow
Solution 4 - C#pugbyView Answer on Stackoverflow
Solution 5 - C#DblView Answer on Stackoverflow
Solution 6 - C#notgivenView Answer on Stackoverflow