What is it that makes Enum.HasFlag so slow?

C#.Net

C# Problem Overview


I was doing some speed tests and I noticed that Enum.HasFlag is about 16 times slower than using the bitwise operation.

Does anyone know the internals of Enum.HasFlag and why it is so slow? I mean twice as slow wouldn't be too bad but it makes the function unusable when its 16 times slower.

In case anyone is wondering, here is the code I am using to test its speed.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace app
{
	public class Program
	{
		[Flags]
		public enum Test
		{
			Flag1 = 1,
			Flag2 = 2,
			Flag3 = 4,
			Flag4 = 8
		}
		static int num = 0;
		static Random rand;
		static void Main(string[] args)
		{
			int seed = (int)DateTime.UtcNow.Ticks;

			var st1 = new SpeedTest(delegate
			{
				Test t = Test.Flag1;
				t |= (Test)rand.Next(1, 9);
				if (t.HasFlag(Test.Flag4))
					num++;
			});

			var st2 = new SpeedTest(delegate
			{
				Test t = Test.Flag1;
				t |= (Test)rand.Next(1, 9);
				if (HasFlag(t , Test.Flag4))
					num++;
			});

			rand = new Random(seed);
			st1.Test();
			rand = new Random(seed);
			st2.Test();

			Console.WriteLine("Random to prevent optimizing out things {0}", num);
			Console.WriteLine("HasFlag: {0}ms {1}ms {2}ms", st1.Min, st1.Average, st1.Max);
			Console.WriteLine("Bitwise: {0}ms {1}ms {2}ms", st2.Min, st2.Average, st2.Max);
			Console.ReadLine();
		}
		static bool HasFlag(Test flags, Test flag)
		{
			return (flags & flag) != 0;
		}
	}
	[DebuggerDisplay("Average = {Average}")]
	class SpeedTest
	{
		public int Iterations { get; set; }

		public int Times { get; set; }

		public List<Stopwatch> Watches { get; set; }

		public Action Function { get; set; }

		public long Min { get { return Watches.Min(s => s.ElapsedMilliseconds); } }

		public long Max { get { return Watches.Max(s => s.ElapsedMilliseconds); } }

		public double Average { get { return Watches.Average(s => s.ElapsedMilliseconds); } }

		public SpeedTest(Action func)
		{
			Times = 10;
			Iterations = 100000;
			Function = func;
			Watches = new List<Stopwatch>();
		}

		public void Test()
		{
			Watches.Clear();
			for (int i = 0; i < Times; i++)
			{
				var sw = Stopwatch.StartNew();
				for (int o = 0; o < Iterations; o++)
				{
					Function();
				}
				sw.Stop();
				Watches.Add(sw);
			}
		}
	}
}

Results:

HasFlag: 52ms 53.6ms 55ms
Bitwise: 3ms 3ms 3ms

C# Solutions


Solution 1 - C#

> Does anyone know the internals of Enum.HasFlag and why it is so slow?

The actual check is just a simple bit check in Enum.HasFlag - it's not the problem here. That being said, it is slower than your own bit check...

There are a couple of reasons for this slowdown:

First, Enum.HasFlag does an explicit check to make sure that the type of the enum and the type of the flag are both the same type, and from the same Enum. There is some cost in this check.

Secondly, there is an unfortunate box and unbox of the value during a conversion to UInt64 that occurs inside of HasFlag. This is, I believe, due to the requirement that Enum.HasFlag work with all enums, regardless of the underlying storage type.

That being said, there is a huge advantage to Enum.HasFlag - it's reliable, clean, and makes the code very obvious and expressive. For the most part, I feel that this makes it worth the cost - but if you're using this in a very performance critical loop, it may be worth doing your own check.

Solution 2 - C#

Decompiled code of Enum.HasFlags() looks like this:

public bool HasFlag(Enum flag)
{
    if (!base.GetType().IsEquivalentTo(flag.GetType()))
    {
        throw new ArgumentException(Environment.GetResourceString("Argument_EnumTypeDoesNotMatch", new object[] { flag.GetType(), base.GetType() }));
    }
    ulong num = ToUInt64(flag.GetValue());
    return ((ToUInt64(this.GetValue()) & num) == num);
}

If I were to guess, I would say that checking the type was what's slowing it down most.

Note that in recent versions of .Net Core, this has been improved and Enum.HasFlag compiles to the same code as using bitwise comparisons.

Solution 3 - C#

The performance penalty due to boxing discussed on this page also affects the public .NET functions Enum.GetValues and Enum.GetNames, which both forward to (Runtime)Type.GetEnumValues and (Runtime)Type.GetEnumNames respectively.

All of these functions use a (non-generic) Array as a return type--which is not so bad for the names (since String is a reference type)--but is quite inappropriate for the ulong[] values.

Here's a peek at the offending code (.NET 4.7):

public override Array /* RuntimeType.*/ GetEnumValues()
{
	if (!this.IsEnum)
		throw new ArgumentException();

	ulong[] values = Enum.InternalGetValues(this);
	Array array = Array.UnsafeCreateInstance(this, values.Length);
	for (int i = 0; i < values.Length; i++)
	{
		var obj = Enum.ToObject(this, values[i]);   // ew. boxing.
		array.SetValue(obj, i);                     // yuck
	}
	return array;              // Array of object references, bleh.
}

We can see that prior to doing the copy, RuntimeType goes back again to System.Enum to get an internal array, a singleton which is cached, on demand, for each specific Enum. Notice also that this version of the values array does use the proper strong signature, ulong[].

Here's the .NET function (again we're back in System.Enum now). There's a similar function for getting the names (not shown).

internal static ulong[] InternalGetValues(RuntimeType enumType) => 
    GetCachedValuesAndNames(enumType, false).Values;

See the return type? This looks like a function we'd like to use... But first consider that a second reason that .NET re-copys the array each time (as you saw above) is that .NET must ensure that each caller gets an unaltered copy of the original data, given that a malevolent coder could change her copy of the returned Array, introducing a persistent corruption. Thus, the re-copying precaution is especially intended to protect the cached internal master copy.

If you aren't worried about that risk, perhaps because you feel confident you won't accidentally change the array, or maybe just to eke-out a few cycles of (what's surely premature) optimization, it's simple to fetch the internal cached array copy of the names or values for any Enum:

        → The following two functions comprise the sum contribution of this article ←
        → (but see edit below for improved version) ←

static ulong[] GetEnumValues<T>() where T : struct =>
		(ulong[])typeof(System.Enum)
			.GetMethod("InternalGetValues", BindingFlags.Static | BindingFlags.NonPublic)
			.Invoke(null, new[] { typeof(T) });

static String[] GetEnumNames<T>() where T : struct =>
		(String[])typeof(System.Enum)
			.GetMethod("InternalGetNames", BindingFlags.Static | BindingFlags.NonPublic)
			.Invoke(null, new[] { typeof(T) });

Note that the generic constraint on T isn't fully sufficient for guaranteeing Enum. For simplicity, I left off checking any further beyond struct, but you might want to improve on that. Also for simplicity, this (ref-fetches and) reflects directly off the MethodInfo every time rather than trying to build and cache a Delegate. The reason for this is that creating the proper delegate with a first argument of non-public type RuntimeType is tedious. A bit more on this below.

First, I'll wrap up with usage examples:

var values = GetEnumValues<DayOfWeek>();
var names = GetEnumNames<DayOfWeek>();

and debugger results:

'values'	ulong[7]
[0]	0
[1]	1
[2]	2
[3]	3
[4]	4
[5]	5
[6]	6

'names'	string[7]
[0]	"Sunday"
[1]	"Monday"
[2]	"Tuesday"
[3]	"Wednesday"
[4]	"Thursday"
[5]	"Friday"
[6]	"Saturday"

So I mentioned that the "first argument" of Func<RuntimeType,ulong[]> is annoying to reflect over. However, because this "problem" arg happens to be first, there's a cute workaround where you can bind each specific Enum type as a Target of its own delegate, where each is then reduced to Func<ulong[]>.)

Clearly, its pointless to make any of those delegates, since each would just be a function that always return the same value... but the same logic seems to apply, perhaps less obviously, to the original situation as well (i.e., Func<RuntimeType,ulong[]>). Although we do get by with a just one delegate here, you'd never really want to call it more than once per Enum type. Anyway, all of this leads to a much better solution, which is included in the edit below.


[edit:]
Here's a slightly more elegant version of the same thing. If you will be calling the functions repeatedly for the same Enum type, the version shown here will only use reflection one time per Enum type. It saves the results in a locally-accessible cache for extremely rapid access subsequently.

static class enum_info_cache<T> where T : struct
{
	static _enum_info_cache()
	{
		values = (ulong[])typeof(System.Enum)
			.GetMethod("InternalGetValues", BindingFlags.Static | BindingFlags.NonPublic)
			.Invoke(null, new[] { typeof(T) });

		names = (String[])typeof(System.Enum)
			.GetMethod("InternalGetNames", BindingFlags.Static | BindingFlags.NonPublic)
			.Invoke(null, new[] { typeof(T) });
	}
	public static readonly ulong[] values;
	public static readonly String[] names;
};

The two functions become trivial:

static ulong[] GetEnumValues<T>() where T : struct => enum_info_cache<T>.values;
static String[] GetEnumNames<T>() where T : struct => enum_info_cache<T>.names;

The code shown here illustrates a pattern of combining three specific tricks that seem to mutually result in an unusualy elegant lazy caching scheme. I've found the particular technique to have surprisingly wide application.

  1. using a generic static class to cache independent copies of the arrays for each distinct Enum. Notably, this happens automatically and on demand;

  2. related to this, the loader lock guarantees unique atomic initialization and does this without the clutter of conditional checking constructs. We can also protect static fields with readonly (which, for obvious reasons, typically can't be used with other lazy/deferred/demand methods);

  3. finally, we can capitalize on C# type inference to automatically map the generic function (entry point) into its respective generic static class, so that the demand caching is ultimately even driven implicitly (viz., the best code is the code that isn't there--since it can never have bugs)

You probably noticed that the particular example shown here doesn't really illustrate point (3) very well. Rather than relying on type inference, the void-taking function has to manually propagate forward the type argument T. I didn't choose to expose these simple functions such that there would be an opportunity to show how C# type inference makes the overall technique shine...

However, you can imagine that when you do combine a static generic function that can infer its type argument(s)--i.e., so you don't even have to provide them at the call site--then it gets quite powerful.

The key insight is that, while generic functions have the full type-inference capability, generic classes do not, that is, the compiler will never infer T if you try to call the first of the following lines. But we can still get fully inferred access to a generic class, and all the benefits that entails, by traversing into them via generic function implicit typing (last line):

int t = 4;
typed_cache<int>.MyTypedCachedFunc(t);  // no inference from 't', explicit type required

MyTypedCacheFunc<int>(t);               // ok, (but redundant)

MyTypedCacheFunc(t);                    // ok, full inference

Designed well, inferred typing can effortlessly launch you into the appropriate automatically demand-cached data and behaviors, customized for each type (recall points 1. and 2). As noted, I find the approach useful, especially considering its simplicity.

Solution 4 - C#

The JITter ought to be inlining this as a simple bitwise operation. The JITter is aware enough to custom-handle even certain framework methods (via MethodImplOptions.InternalCall I think?) but HasFlag seems to have escaped Microsoft's serious attention.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionWillView Question on Stackoverflow
Solution 1 - C#Reed CopseyView Answer on Stackoverflow
Solution 2 - C#svickView Answer on Stackoverflow
Solution 3 - C#Glenn SlaydenView Answer on Stackoverflow
Solution 4 - C#Joshua A. SchaefferView Answer on Stackoverflow