Performance of "direct" virtual call vs. interface call in C#

C# Problem Overview

This benchmark appears to show that calling a virtual method directly on an object reference is faster than calling it through a reference to an interface that the object implements.

In other words:

interface IFoo {
    void Bar();
}

class Foo : IFoo {
    public virtual void Bar() {}
}

void Benchmark() {
    Foo f = new Foo();
    IFoo f2 = f;
    f.Bar(); // This is faster.
    f2.Bar();    
}

Coming from the C++ world, I would have expected that both of these calls would be implemented identically (as a simple virtual table lookup) and have the same performance. How does C# implement virtual calls and what is this "extra" work that apparently gets done when calling through an interface?

--- EDIT ---

OK, the answers and comments I have received so far imply that a virtual call through an interface needs a double pointer dereference, versus just one dereference for a virtual call through an object reference.

So could somebody please explain why that is necessary? What is the structure of the virtual table in C#? Is it "flat" (as is typical for C++) or not? What design tradeoffs in the C# language design led to this? I'm not saying this is a "bad" design; I'm simply curious as to why it was necessary.

In a nutshell, I'd like to understand what my tool does under the hood so I can use it more effectively. And I would appreciate it if I didn't get any more "you shouldn't know that" or "use another language" types of answers.

--- EDIT 2 ---

Just to make it clear that we are not dealing with some compiler or JIT optimization here that removes the dynamic dispatch: I modified the benchmark mentioned in the original question to instantiate one class or the other randomly at run-time. Since the instantiation happens after compilation and after assembly loading/JITing, there is no way to avoid dynamic dispatch in either case:

using System;
using System.Diagnostics;

interface IFoo {
    void Bar();
}

class Foo : IFoo {
    public virtual void Bar() {
    }
}

class Foo2 : Foo {
    public override void Bar() {
    }
}

class Program {

    static Foo GetFoo() {
        if ((new Random()).Next(2) % 2 == 0)
            return new Foo();
        return new Foo2();
    }

    static void Main(string[] args) {

        var f = GetFoo();
        IFoo f2 = f;

        Console.WriteLine(f.GetType());

        // JIT warm-up
        f.Bar();
        f2.Bar();

        int N = 10000000;
        Stopwatch sw = new Stopwatch();

        sw.Start();
        for (int i = 0; i < N; i++) {
            f.Bar();
        }
        sw.Stop();
        Console.WriteLine("Direct call: {0:F2}", sw.Elapsed.TotalMilliseconds);

        sw.Reset();
        sw.Start();
        for (int i = 0; i < N; i++) {
            f2.Bar();
        }
        sw.Stop();
        Console.WriteLine("Through interface: {0:F2}", sw.Elapsed.TotalMilliseconds);

        // Results:
        // Direct call: 24.19
        // Through interface: 40.18

    }

}

--- EDIT 3 ---

If anyone is interested, here is how my Visual C++ 2010 lays out an instance of a class that multiply-inherits other classes:

Code:

#include <iostream>

class IA {
public:
    virtual void a() = 0;
};

class IB {
public:
    virtual void b() = 0;
};

class C : public IA, public IB {
public:
    virtual void a() override {
        std::cout << "a" << std::endl;
    }
    virtual void b() override {
        std::cout << "b" << std::endl;
    }
};

Debugger:

c	{...}	C
	IA	{...}	IA
 		__vfptr	0x00157754 const C::`vftable'{for `IA'}	*
            [0]	0x00151163 C::a(void)	*
	IB	{...}	IB
		__vfptr	0x00157748 const C::`vftable'{for `IB'}	*
            [0]	0x0015121c C::b(void)	*

Multiple virtual table pointers are clearly visible, and sizeof(C) == 8 (in 32-bit build).

The...

C c;
std::cout << static_cast<IA*>(&c) << std::endl;
std::cout << static_cast<IB*>(&c) << std::endl;

...prints...

0027F778
0027F77C

...indicating that pointers to different interfaces within the same object actually point to different parts of that object (i.e. they contain different physical addresses).

C# Solutions

Solution 1 - C#

I think the article "Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects" will answer your questions. In particular, see the section "Interface Vtable Map and Interface Map" and the following section on virtual dispatch.

It's probably possible for the JIT compiler to figure things out and optimize the code for your simple case, but not in the general case. Suppose you have:

IFoo f2 = GetAFoo();

If GetAFoo is declared as returning an IFoo, then the JIT compiler wouldn't be able to optimize the call.
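A minimal sketch of that general case, reusing the Foo and Foo2 classes from the question. Bar returns a string here purely so the result is observable, and the Factory class and its useDerived parameter are hypothetical stand-ins for any run-time condition. Because the caller only sees IFoo, the concrete type is in general not known at the call site:

```csharp
interface IFoo {
    string Bar();
}

class Foo : IFoo {
    public virtual string Bar() { return "Foo"; }
}

class Foo2 : Foo {
    public override string Bar() { return "Foo2"; }
}

static class Factory {
    // Hypothetical helper: the concrete type depends on a run-time value,
    // so the static return type IFoo is all a caller can rely on.
    public static IFoo GetAFoo(bool useDerived) {
        if (useDerived)
            return new Foo2();
        return new Foo();
    }
}
```

Here Factory.GetAFoo(true).Bar() returns "Foo2" and Factory.GetAFoo(false).Bar() returns "Foo", so the target of f2.Bar() can only be resolved when the call actually runs.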

Solution 2 - C#

Here is what the disassembly looks like (Hans is correct):

            f.Bar(); // This is faster.
00000062  mov         rax,qword ptr [rsp+20h]
00000067  mov         rax,qword ptr [rax]
0000006a  mov         rcx,qword ptr [rsp+20h]
0000006f  call        qword ptr [rax+60h]
            f2.Bar();
00000072  mov         r11,7FF000400A0h
0000007c  mov         qword ptr [rsp+38h],r11
00000081  mov         rax,qword ptr [rsp+28h]
00000086  cmp         byte ptr [rax],0
00000089  mov         rcx,qword ptr [rsp+28h]
0000008e  mov         r11,qword ptr [rsp+38h]
00000093  mov         rax,qword ptr [rsp+38h]
00000098  call        qword ptr [rax]

Solution 3 - C#

I tried your test and on my machine, in a particular context, the result is actually the other way around.

I am running Windows 7 x64 and I have created a Visual Studio 2010 Console Application project into which I have copied your code. If I compile the project in Debug mode with x86 as the platform target, the output is the following:

Direct call: 48.38
Through interface: 42.43

The results differ slightly on every run, but in this configuration the interface calls are always faster. I assume that since the application is compiled as x86, the OS runs it through WoW64.

For a complete reference, below are the results for the rest of compilation configuration and target combinations.

Release mode and x86 target
Direct call: 23.02
Through interface: 32.73

Debug mode and x64 target
Direct call: 49.49
Through interface: 56.97

Release mode and x64 target
Direct call: 19.60
Through interface: 26.45

All of the above tests were made with .NET 4.0 as the target platform for the compiler. When switching to 3.5 and repeating the above tests, the calls through the interface always took longer than the direct calls.

So, the above tests rather complicate things since it seems that the behavior you spotted is not always happening.
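Given how much the numbers move between runs and configurations, one way to make such comparisons a little more stable is to repeat each measurement and keep the fastest run. The sketch below is not part of the original benchmark; Bench and BestOf are hypothetical names, and it relies only on Stopwatch from System.Diagnostics:

```csharp
using System;
using System.Diagnostics;

static class Bench {
    // Runs `body` `trials` times and returns the fastest elapsed time
    // in milliseconds. Taking the minimum reduces noise from the OS
    // scheduler and from one-time JIT compilation on the first run.
    public static double BestOf(int trials, Action body) {
        if (trials <= 0)
            throw new ArgumentOutOfRangeException("trials");

        double best = double.MaxValue;
        for (int t = 0; t < trials; t++) {
            Stopwatch sw = Stopwatch.StartNew();
            body();
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best;
    }
}
```

For example, Bench.BestOf(5, () => { for (int i = 0; i < N; i++) f.Bar(); }) would time the direct-call loop five times. The delegate is invoked once per trial, so its own overhead is a small constant rather than a per-iteration cost.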

In the end, at the risk of upsetting you, I would like to add a few thoughts. Many people added comments that the performance differences are quite small and that in real-world programming you should not care about them, and I agree with this point of view. There are two main reasons for it.

The first and most advertised one is that .NET was built on a higher level in order to enable developers to focus on the higher levels of applications. A database or an external service call is thousands or sometimes millions of times slower than a virtual method call. Having a good high-level architecture and focusing on the big performance consumers will always bring better results in modern applications than avoiding double pointer dereferences.

The second and more obscure one is that the .NET team, by building the framework on a higher level, has actually introduced a series of abstraction levels which the just-in-time compiler can use for optimizations on different platforms. The more access they gave to the underlying layers, the more developers would be able to optimize for a specific platform, but the less the runtime compiler would be able to do for the others. That is the theory at least, and that is why things are not as well documented as in C++ regarding this particular matter.

Solution 4 - C#

The general rule is: Classes are fast. Interfaces are slow.

That's one of the reasons for the recommendation "Build hierarchies with classes and use interfaces for intra-hierarchy behavior".

For virtual methods, the difference might be slight (like 10%). But for non-virtual methods and fields the difference is huge. Consider this program.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace InterfaceFieldConsoleApplication
{
    class Program
    {
        public abstract class A
        {
            public int Counter;
        }

        public interface IA
        {
            int Counter { get; set; }
        }

        public class B : A, IA
        {
            public new int Counter { get { return base.Counter; } set { base.Counter = value; } }
        }

        static void Main(string[] args)
        {
            var b = new B();
            A a = b;
            IA ia = b;
            const long LoopCount = (int) (100*10e6);
            var stopWatch = new Stopwatch();
            stopWatch.Start();
            for (int i = 0; i < LoopCount; i++)
                a.Counter = i;
            stopWatch.Stop();
            Console.WriteLine("a.Counter: {0}", stopWatch.ElapsedMilliseconds);
            stopWatch.Reset();
            stopWatch.Start();
            for (int i = 0; i < LoopCount; i++)
                ia.Counter = i;
            stopWatch.Stop();
            Console.WriteLine("ia.Counter: {0}", stopWatch.ElapsedMilliseconds);
            Console.ReadKey();
        }
    }
}

Output:

a.Counter: 1560
ia.Counter: 4587

Solution 5 - C#

I think the plain virtual function case can use a simple virtual function table, since any class derived from Foo that overrides Bar would just change the function pointer stored in Bar's slot.

On the other hand, a call to the interface function IFoo.Bar can't simply index into something like IFoo's virtual function table, because an implementation of IFoo doesn't necessarily implement the other functions or interfaces that Foo does. So the virtual function table position of Bar in another class Fubar : IFoo need not match the position of Bar in class Foo : IFoo.

Thus a plain virtual function call can rely on the function pointer having the same index in the virtual function table of every derived class, while an interface call has to look up this index first.
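The difference can be sketched with a toy model. This is a conceptual illustration only, not the CLR's actual data structures: TypeModel, DispatchModel, the string keys, and the extra "Other" method on Fubar are all hypothetical, and a real runtime uses much faster mechanisms than a dictionary for the lookup step.

```csharp
using System;
using System.Collections.Generic;

// Conceptual model only: these are NOT the CLR's real data structures.
class TypeModel {
    // "Virtual function table": one entry per virtual method of the type.
    public Func<string>[] Slots;
    // Per-type map from an interface method to its slot in this type's table.
    public Dictionary<string, int> InterfaceMap = new Dictionary<string, int>();
}

static class DispatchModel {
    // Class-based virtual call: the slot index is the same for every
    // derived class, so it can be a constant at the call site.
    public static string CallVirtual(TypeModel type, int slot) {
        return type.Slots[slot]();
    }

    // Interface call: the slot index differs per implementing type,
    // so it has to be looked up before the indirect call.
    public static string CallInterface(TypeModel type, string interfaceMethod) {
        int slot = type.InterfaceMap[interfaceMethod];
        return type.Slots[slot]();
    }

    // Models Foo : IFoo, where Bar sits in slot 0.
    public static TypeModel MakeFoo() {
        var t = new TypeModel();
        t.Slots = new Func<string>[] { () => "Foo.Bar" };
        t.InterfaceMap["IFoo.Bar"] = 0;
        return t;
    }

    // Models Fubar : IFoo, which declares another virtual method first,
    // so its Bar lands in slot 1.
    public static TypeModel MakeFubar() {
        var t = new TypeModel();
        t.Slots = new Func<string>[] { () => "Fubar.Other", () => "Fubar.Bar" };
        t.InterfaceMap["IFoo.Bar"] = 1;
        return t;
    }
}
```

In this model, DispatchModel.CallInterface(DispatchModel.MakeFoo(), "IFoo.Bar") returns "Foo.Bar" and the same call on MakeFubar() returns "Fubar.Bar", even though Bar occupies different slots in the two tables. A fixed index such as CallVirtual(type, 0) would reach Bar on Foo but the unrelated method on Fubar.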

Content Type	Original Author	Original Content on Stackoverflow
Question	Branko Dimitrijevic	View Question on Stackoverflow
Solution 1 - C#	Jim Mischel	View Answer on Stackoverflow
Solution 2 - C#	Steve Wellens	View Answer on Stackoverflow
Solution 3 - C#	Florin Dumitrescu	View Answer on Stackoverflow
Solution 4 - C#	Johan Nilsson	View Answer on Stackoverflow
Solution 5 - C#	dronus	View Answer on Stackoverflow