Performance of "direct" virtual call vs. interface call in C#
Tags: C#, .NET, Performance, Language Design

C# Problem Overview
This benchmark appears to show that calling a virtual method directly on an object reference is faster than calling it through a reference to the interface that the object implements.
In other words:
interface IFoo {
    void Bar();
}

class Foo : IFoo {
    public virtual void Bar() {}
}

void Benchmark() {
    Foo f = new Foo();
    IFoo f2 = f;
    f.Bar();  // This is faster.
    f2.Bar();
}
Coming from the C++ world, I would have expected that both of these calls would be implemented identically (as a simple virtual table lookup) and have the same performance. How does C# implement virtual calls and what is this "extra" work that apparently gets done when calling through an interface?
--- EDIT ---
OK, the answers/comments I got so far imply that a virtual call through an interface requires a double pointer dereference, versus just one dereference for a virtual call through an object reference.
So could somebody please explain why that is necessary? What is the structure of the virtual table in C#? Is it "flat" (as is typical for C++) or not? What design tradeoffs in the C# language design led to this? I'm not saying this is a "bad" design; I'm simply curious why it was necessary.
In a nutshell, I'd like to understand what my tool does under the hood so I can use it more effectively. And I would appreciate it if I didn't get any more "you shouldn't need to know that" or "use another language" types of answers.
--- EDIT 2 ---
Just to make it clear that we are not dealing with some compiler or JIT optimization here that removes the dynamic dispatch: I modified the benchmark mentioned in the original question to instantiate one class or the other randomly at run-time. Since the instantiation happens after compilation and after assembly loading/JITing, there is no way to avoid dynamic dispatch in either case:
using System;
using System.Diagnostics;

interface IFoo {
    void Bar();
}

class Foo : IFoo {
    public virtual void Bar() {
    }
}

class Foo2 : Foo {
    public override void Bar() {
    }
}

class Program {
    static Foo GetFoo() {
        if ((new Random()).Next(2) % 2 == 0)
            return new Foo();
        return new Foo2();
    }

    static void Main(string[] args) {
        var f = GetFoo();
        IFoo f2 = f;
        Console.WriteLine(f.GetType());

        // JIT warm-up
        f.Bar();
        f2.Bar();

        int N = 10000000;

        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < N; i++) {
            f.Bar();
        }
        sw.Stop();
        Console.WriteLine("Direct call: {0:F2}", sw.Elapsed.TotalMilliseconds);

        sw.Reset();
        sw.Start();
        for (int i = 0; i < N; i++) {
            f2.Bar();
        }
        sw.Stop();
        Console.WriteLine("Through interface: {0:F2}", sw.Elapsed.TotalMilliseconds);

        // Results:
        // Direct call: 24.19
        // Through interface: 40.18
    }
}
--- EDIT 3 ---
If anyone is interested, here is how Visual C++ 2010 lays out an instance of a class that inherits from multiple base classes:
Code:
#include <iostream>

class IA {
public:
    virtual void a() = 0;
};

class IB {
public:
    virtual void b() = 0;
};

class C : public IA, public IB {
public:
    virtual void a() override {
        std::cout << "a" << std::endl;
    }
    virtual void b() override {
        std::cout << "b" << std::endl;
    }
};
Debugger:
c             {...}       C
  IA          {...}       IA
    __vfptr   0x00157754  const C::`vftable'{for `IA'} *
      [0]     0x00151163  C::a(void) *
  IB          {...}       IB
    __vfptr   0x00157748  const C::`vftable'{for `IB'} *
      [0]     0x0015121c  C::b(void) *
Multiple virtual table pointers are clearly visible, and sizeof(C) == 8 (in a 32-bit build).
The...
C c;
std::cout << static_cast<IA*>(&c) << std::endl;
std::cout << static_cast<IB*>(&c) << std::endl;
...prints...
0027F778
0027F77C
...indicating that pointers to different interfaces within the same object actually point to different parts of that object (i.e. they contain different physical addresses).
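For contrast, the CLR gives every object a single method-table pointer no matter how many interfaces its class implements, so casting to an interface does not adjust the reference the way the C++ casts above do. A minimal sketch of that check, using C# types analogous to the C++ ones:

using System;

interface IA { void a(); }
interface IB { void b(); }

class C : IA, IB {
    public void a() { Console.WriteLine("a"); }
    public void b() { Console.WriteLine("b"); }
}

class LayoutCheck {
    static void Main() {
        var c = new C();
        IA ia = c;
        IB ib = c;
        // Both interface references are the very same reference as 'c':
        // there is no this-pointer adjustment as in the C++ layout above.
        Console.WriteLine(ReferenceEquals(c, ia)); // True
        Console.WriteLine(ReferenceEquals(c, ib)); // True
    }
}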
C# Solutions
Solution 1 - C#
I think the article Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects will answer your questions. In particular, see the section "Interface Vtable Map and Interface Map", and the following section on Virtual Dispatch.
It's probably possible for the JIT compiler to figure things out and optimize the code in your simple case, but not in the general case. If you write IFoo f2 = GetAFoo(); and GetAFoo is declared as returning an IFoo, then the JIT compiler wouldn't be able to optimize the call.
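As a rough sketch of that general case (GetAFoo and the surrounding types follow the names already used in the question and in this answer; the factory body is made up just to force an unknown concrete type):

using System;

interface IFoo { void Bar(); }
class Foo : IFoo { public virtual void Bar() { } }
class Foo2 : Foo { public override void Bar() { } }

static class GeneralCase {
    // Hypothetical factory: callers only ever see the static type IFoo.
    static IFoo GetAFoo() {
        return new Random().Next(2) == 0 ? (IFoo)new Foo() : new Foo2();
    }

    static void CallSite() {
        IFoo f2 = GetAFoo();
        // The concrete type is unknown when this call site is JIT-compiled,
        // so it has to be a real interface dispatch rather than a call
        // through a known method-table slot.
        f2.Bar();
    }
}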
Solution 2 - C#
Here is what the disassembly looks like (Hans is correct):
f.Bar(); // This is faster.
00000062 mov rax,qword ptr [rsp+20h]
00000067 mov rax,qword ptr [rax]
0000006a mov rcx,qword ptr [rsp+20h]
0000006f call qword ptr [rax+60h]
f2.Bar();
00000072 mov r11,7FF000400A0h
0000007c mov qword ptr [rsp+38h],r11
00000081 mov rax,qword ptr [rsp+28h]
00000086 cmp byte ptr [rax],0
00000089 mov rcx,qword ptr [rsp+28h]
0000008e mov r11,qword ptr [rsp+38h]
00000093 mov rax,qword ptr [rsp+38h]
00000098 call qword ptr [rax]
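Reading the two snippets (with the caveat that this is an implementation detail of the CLR and of this particular, unoptimized JIT output): the f.Bar() path loads the object's method-table pointer and calls straight through a fixed slot in it (call qword ptr [rax+60h]). The f2.Bar() path instead loads a constant that looks like the address of an indirection cell maintained by the runtime's virtual stub dispatch mechanism and calls through whatever stub that cell currently points to; resolving the right target for the object's actual type is what costs the extra instructions.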
Solution 3 - C#
I tried your test and on my machine, in a particular context, the result is actually the other way around.
I am running Windows 7 x64 and I created a Visual Studio 2010 console application project into which I copied your code. If I compile the project in Debug mode with the platform target set to x86, the output is the following:
> Direct call: 48.38
> Through interface: 42.43
Actually, every run of the application produces slightly different results, but the interface calls are always faster. I assume that since the application is compiled as x86, it is run by the OS through WoW64.
For completeness, below are the results for the remaining configuration and target combinations.
Release mode and x86 target
Direct call: 23.02
Through interface: 32.73
Debug mode and x64 target
Direct call: 49.49
Through interface: 56.97
Release mode and x64 target
Direct call: 19.60
Through interface: 26.45
All of the above tests were made with .NET 4.0 as the target platform. When switching to 3.5 and repeating them, the calls through the interface were always slower than the direct calls.
So the above tests rather complicate things, since it seems that the behavior you spotted does not always occur.
In the end, at the risk of upsetting you, I would like to add a few thoughts. Many people have commented that the performance differences are quite small and that in real-world programming you should not care about them, and I agree with that point of view. There are two main reasons for it.
The first and most advertised one is that .NET was built at a higher level in order to let developers focus on the higher levels of applications. A database or external service call is thousands or sometimes millions of times slower than a virtual method call. Having a good high-level architecture and focusing on the big performance consumers will always bring better results in modern applications than avoiding double pointer dereferences.
The second, more obscure one is that by building the framework at a higher level, the .NET team introduced a series of abstraction layers that the just-in-time compiler can use for optimizations on different platforms. The more access they gave to the lower layers, the more developers could optimize for a specific platform, but the less the runtime compiler could do for the others. That is the theory, at least, and that is why this particular area is not as well documented as it is in C++.
Solution 4 - C#
The general rule is: Classes are fast. Interfaces are slow.
That's one of the reasons for the recommendation "Build hierarchies with classes and use interfaces for intra-hierarchy behavior".
For virtual methods, the difference might be slight (like 10%). But for non-virtual methods and fields the difference is huge. Consider this program.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace InterfaceFieldConsoleApplication
{
    class Program
    {
        public abstract class A
        {
            public int Counter;
        }

        public interface IA
        {
            int Counter { get; set; }
        }

        public class B : A, IA
        {
            public new int Counter { get { return base.Counter; } set { base.Counter = value; } }
        }

        static void Main(string[] args)
        {
            var b = new B();
            A a = b;
            IA ia = b;
            const long LoopCount = (int) (100 * 10e6);

            var stopWatch = new Stopwatch();
            stopWatch.Start();
            for (int i = 0; i < LoopCount; i++)
                a.Counter = i;
            stopWatch.Stop();
            Console.WriteLine("a.Counter: {0}", stopWatch.ElapsedMilliseconds);

            stopWatch.Reset();
            stopWatch.Start();
            for (int i = 0; i < LoopCount; i++)
                ia.Counter = i;
            stopWatch.Stop();
            Console.WriteLine("ia.Counter: {0}", stopWatch.ElapsedMilliseconds);

            Console.ReadKey();
        }
    }
}
Output:
a.Counter: 1560
ia.Counter: 4587
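A plausible reason the gap is this large (beyond dispatch itself): a.Counter is a plain field store that the JIT can emit directly, while ia.Counter has to go through the interface-dispatched property setter of B, which in turn forwards to the base-class field, so every iteration pays an interface call plus a method body instead of a single store.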
Solution 5 - C#
I think the class virtual call can use a simple virtual function table, as any derived class of Foo that overrides Bar just replaces the function pointer in Bar's slot.

On the other hand, an interface call to IFoo.Bar can't simply index into something like IFoo's virtual function table, because an implementation of IFoo doesn't necessarily implement the same other methods or interfaces that Foo does. So the vtable slot that Bar occupies in some other class Fubar : IFoo need not match the slot that Bar occupies in Foo : IFoo.

Thus a class virtual call can rely on the method having the same index in the virtual function table of every derived class, while an interface call has to look up that index first.
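A rough way to picture that two-step lookup (purely a toy model of the idea, not the CLR's actual data structures):

using System;
using System.Collections.Generic;

interface IFoo { void Bar(); }
class Foo : IFoo { public virtual void Bar() { } }
class Foo2 : Foo { public override void Bar() { } }

static class DispatchModel {
    // Per-class "vtable": an array of method pointers (delegates stand in here).
    static readonly Dictionary<Type, Action[]> VTables = new Dictionary<Type, Action[]> {
        { typeof(Foo),  new Action[] { () => Console.WriteLine("Foo.Bar") } },
        { typeof(Foo2), new Action[] { () => Console.WriteLine("Foo2.Bar") } }
    };

    // Per-class interface map for IFoo: which vtable slot each IFoo method
    // occupies in that class. A real runtime keys this per (class, interface).
    static readonly Dictionary<Type, int[]> IFooMaps = new Dictionary<Type, int[]> {
        { typeof(Foo),  new[] { 0 } },
        { typeof(Foo2), new[] { 0 } }
    };

    // Class-based virtual call: the slot index for Bar is fixed when the call
    // site is compiled, so dispatch is one table fetch plus an indirect call.
    static void ClassVirtualCall(object obj) {
        VTables[obj.GetType()][0]();
    }

    // Interface call: first resolve which slot IFoo.Bar maps to for this
    // concrete type, then make the same indirect call - one extra lookup.
    static void InterfaceCall(object obj) {
        int slot = IFooMaps[obj.GetType()][0];
        VTables[obj.GetType()][slot]();
    }

    static void Main() {
        object f = new Foo2();
        ClassVirtualCall(f); // prints "Foo2.Bar"
        InterfaceCall(f);    // prints "Foo2.Bar", after the extra map lookup
    }
}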