How expensive is RTTI?

C++PerformanceRtti

C++ Problem Overview


I understand that there is a resource hit from using RTTI, but how big is it? Everywhere I've looked just says that "RTTI is expensive," but none of them actually give any benchmarks or quantitative data reguarding memory, processor time, or speed.

So, just how expensive is RTTI? I might use it on an embedded system where I have only 4MB of RAM, so every bit counts.

Edit: As per S. Lott's answer, it would be better if I include what I'm actually doing. I am using a class to pass in data of different lengths and that can perform different actions, so it would be difficult to do this using only virtual functions. It seems that using a few dynamic_casts could remedy this problem by allowing the different derived classes to be passed through the different levels yet still allow them to act completely differently.

From my understanding, dynamic_cast uses RTTI, so I was wondering how feasable it would be to use on a limited system.

C++ Solutions


Solution 1 - C++

Regardless of compiler, you can always save on runtime if you can afford to do

if (typeid(a) == typeid(b)) {
  B* ba = static_cast<B*>(&a);
  etc;
}

instead of

B* ba = dynamic_cast<B*>(&a);
if (ba) {
  etc;
}

The former involves only one comparison of std::type_info; the latter necessarily involves traversing an inheritance tree plus comparisons.

Past that ... like everyone says, the resource usage is implementation specific.

I agree with everyone else's comments that the submitter should avoid RTTI for design reasons. However, there are good reasons to use RTTI (mainly because of boost::any). That in mind, it's useful to know its actual resource usage in common implementations.

I recently did a bunch of research into RTTI in GCC.

tl;dr: RTTI in GCC uses negligible space and typeid(a) == typeid(b) is very fast, on many platforms (Linux, BSD and maybe embedded platforms, but not mingw32). If you know you'll always be on a blessed platform, RTTI is very close to free.

Gritty details:

GCC prefers to use a particular "vendor-neutral" C++ ABI[1], and always uses this ABI for Linux and BSD targets[2]. For platforms that support this ABI and also weak linkage, typeid() returns a consistent and unique object for each type, even across dynamic linking boundaries. You can test &typeid(a) == &typeid(b), or just rely on the fact that the portable test typeid(a) == typeid(b) does actually just compare a pointer internally.

In GCC's preferred ABI, a class vtable always holds a pointer to a per-type RTTI structure, though it might not be used. So a typeid() call itself should only cost as much as any other vtable lookup (the same as calling a virtual member function), and RTTI support shouldn't use any extra space for each object.

From what I can make out, the RTTI structures used by GCC (these are all the subclasses of std::type_info) only hold a few bytes for each type, aside from the name. It isn't clear to me whether the names are present in the output code even with -fno-rtti. Either way, the change in size of the compiled binary should reflect the change in runtime memory usage.

A quick experiment (using GCC 4.4.3 on Ubuntu 10.04 64-bit) shows that -fno-rtti actually increases the binary size of a simple test program by a few hundred bytes. This happens consistently across combinations of -g and -O3. I'm not sure why the size would increase; one possibility is that GCC's STL code behaves differently without RTTI (since exceptions won't work).

[1] Known as the Itanium C++ ABI, documented at http://www.codesourcery.com/public/cxx-abi/abi.html. The names are horribly confusing: the name refers to the original development architecture, though the ABI specification works on lots of architectures including i686/x86_64. Comments in GCC's internal source and STL code refer to Itanium as the "new" ABI in contrast to the "old" one they used before. Worse, the "new"/Itanium ABI refers to all versions available through -fabi-version; the "old" ABI predated this versioning. GCC adopted the Itanium/versioned/"new" ABI in version 3.0; the "old" ABI was used in 2.95 and earlier, if I am reading their changelogs correctly.

[2] I couldn't find any resource listing std::type_info object stability by platform. For compilers I had access to, I used the following: echo "#include <typeinfo>" | gcc -E -dM -x c++ -c - | grep GXX_MERGED_TYPEINFO_NAMES. This macro controls the behavior of operator== for std::type_info in GCC's STL, as of GCC 3.0. I did find that mingw32-gcc obeys the Windows C++ ABI, where std::type_info objects aren't unique for a type across DLLs; typeid(a) == typeid(b) calls strcmp under the covers. I speculate that on single-program embedded targets like AVR, where there is no code to link against, std::type_info objects are always stable.

Solution 2 - C++

Perhaps these figures would help.

I was doing a quick test using this:

  • GCC Clock() + XCode's Profiler.
  • 100,000,000 loop iterations.
  • 2 x 2.66 GHz Dual-Core Intel Xeon.
  • The class in question is derived from a single base class.
  • typeid().name() returns "N12fastdelegate13FastDelegate1IivEE"

5 Cases were tested:

1) dynamic_cast< FireType* >( mDelegate )
2) typeid( *iDelegate ) == typeid( *mDelegate )
3) typeid( *iDelegate ).name() == typeid( *mDelegate ).name()
4) &typeid( *iDelegate ) == &typeid( *mDelegate )
5) { 
       fastdelegate::FastDelegateBase *iDelegate;
       iDelegate = new fastdelegate::FastDelegate1< t1 >;
       typeid( *iDelegate ) == typeid( *mDelegate )
   }

5 is just my actual code, as I needed to create an object of that type before checking if it is similar to one I already have.

Without Optimisation

For which the results were (I've averaged a few runs):

1)  1,840,000 Ticks (~2  Seconds) - dynamic_cast
2)    870,000 Ticks (~1  Second)  - typeid()
3)    890,000 Ticks (~1  Second)  - typeid().name()
4)    615,000 Ticks (~1  Second)  - &typeid()
5) 14,261,000 Ticks (~23 Seconds) - typeid() with extra variable allocations.

So the conclusion would be:

  • For simple cast cases without optimisation typeid() is more than twice faster than dyncamic_cast.
  • On a modern machine the difference between the two is about 1 nanosecond (a millionth of a millisecond).

With Optimisation (-Os)

1)  1,356,000 Ticks - dynamic_cast
2)     76,000 Ticks - typeid()
3)     76,000 Ticks - typeid().name()
4)     75,000 Ticks - &typeid()
5)     75,000 Ticks - typeid() with extra variable allocations.

So the conclusion would be:

  • For simple cast cases with optimisation, typeid() is nearly x20 faster than dyncamic_cast.

Chart

enter image description here

The Code

As requested in the comments, the code is below (a bit messy, but works). 'FastDelegate.h' is available from here.

#include <iostream>
#include "FastDelegate.h"
#include "cycle.h"
#include "time.h"

// Undefine for typeid checks
#define CAST

class ZoomManager
{
public:
    template < class Observer, class t1 >
    void Subscribe( void *aObj, void (Observer::*func )( t1 a1 ) )
    {
        mDelegate = new fastdelegate::FastDelegate1< t1 >;
        
        std::cout << "Subscribe\n";
        Fire( true );
    }
    
    template< class t1 >
    void Fire( t1 a1 )
    {
        fastdelegate::FastDelegateBase *iDelegate;
        iDelegate = new fastdelegate::FastDelegate1< t1 >;
        
        int t = 0;
        ticks start = getticks();
        
        clock_t iStart, iEnd;
        
        iStart = clock();
        
        typedef fastdelegate::FastDelegate1< t1 > FireType;
        
        for ( int i = 0; i < 100000000; i++ ) {
        
#ifdef CAST
                if ( dynamic_cast< FireType* >( mDelegate ) )
#else
                // Change this line for comparisons .name() and & comparisons
                if ( typeid( *iDelegate ) == typeid( *mDelegate ) )
#endif
                {
                    t++;
                } else {
                    t--;
                }
        }
        
        iEnd = clock();
        printf("Clock ticks: %i,\n", iEnd - iStart );
        
        std::cout << typeid( *mDelegate ).name()<<"\n";
        
        ticks end = getticks();
        double e = elapsed(start, end);
        std::cout << "Elasped: " << e;
    }
    
    template< class t1, class t2 >
    void Fire( t1 a1, t2 a2 )
    {
        std::cout << "Fire\n";
    }
    
    fastdelegate::FastDelegateBase *mDelegate;
};

class Scaler
{
public:
    Scaler( ZoomManager *aZoomManager ) :
        mZoomManager( aZoomManager ) { }
    
    void Sub()
    {
        mZoomManager->Subscribe( this, &Scaler::OnSizeChanged );
    }
    
    void OnSizeChanged( int X  )
    {
        std::cout << "Yey!\n";        
    }
private:
    ZoomManager *mZoomManager;
};

int main(int argc, const char * argv[])
{
    ZoomManager *iZoomManager = new ZoomManager();
    
    Scaler iScaler( iZoomManager );
    iScaler.Sub();
        
    delete iZoomManager;

    return 0;
}

Solution 3 - C++

It depends on the scale of things. For the most part it's just a couple checks and a few pointer dereferences. In most implementations, at the top of every object that has virtual functions, there is a pointer to a vtable that holds a list of pointers to all the implementations of the virtual function on that class. I would guess that most implementations would use this to either store another pointer to the type_info structure for the class.

For example in pseudo-c++:

struct Base
{
    virtual ~Base() {}
};

struct Derived
{
    virtual ~Derived() {}
};


int main()
{
    Base *d = new Derived();
    const char *name = typeid(*d).name(); // C++ way

    // faked up way (this won't actually work, but gives an idea of what might be happening in some implementations).
    const vtable *vt = reinterpret_cast<vtable *>(d);
    type_info *ti = vt->typeinfo;
    const char *name = ProcessRawName(ti->name);       
}

In general the real argument against RTTI is the unmaintainability of having to modify code everywhere every time you add a new derived class. Instead of switch statements everywhere, factor those into virtual functions. This moves all the code that is different between classes into the classes themselves, so that a new derivation just needs to override all the virtual functions to become a fully functioning class. If you've ever had to hunt through a large code base for every time someone checks the type of a class and does something different, you'll quickly learn to stay away from that style of programming.

If your compiler lets you totally turn off RTTI, the final resulting code size savings can be significant though, with such a small RAM space. The compiler needs to generate a type_info structure for every single class with a virtual function. If you turn off RTTI, all these structures do not need to be included in the executable image.

Solution 4 - C++

Well, the profiler never lies.

Since I have a pretty stable hierarchy of 18-20 types that is not changing very much, I wondered if just using a simple enum'd member would do the trick and avoid the purportedly "high" cost of RTTI. I was skeptical if RTTI was in fact more expensive than just the if statement it introduces. Boy oh boy, is it.

It turns out that RTTI is expensive, much more expensive than an equivalent if statement or a simple switch on a primitive variable in C++. So S.Lott's answer is not completely correct, there is extra cost for RTTI, and it's not due to just having an if statement in the mix. It's due to that RTTI is very expensive.

This test was done on the Apple LLVM 5.0 compiler, with stock optimizations turned on (default release mode settings).

So, I have below 2 functions, each of which figures out the concrete type of an object either via 1) RTTI or 2) a simple switch. It does so 50,000,000 times. Without further ado, I present to you the relative runtimes for 50,000,000 runs.

enter image description here

That's right, the dynamicCasts took 94% of runtime. While the regularSwitch block only took 3.3%.

Long story short: If you can afford the energy to hook-in an enum'd type as I did below, I'd probably recommend it, if you need to do RTTI and performance is paramount. It only takes setting the member once (make sure to get it via all constructors), and be sure to never write it afterward.

That said, doing this should not mess up your OOP practices.. it's only meant to be used when type information simply isn't available and you find yourself cornered into using RTTI.

#include <stdio.h>
#include <vector>
using namespace std;

enum AnimalClassTypeTag
{
  TypeAnimal=1,
  TypeCat=1<<2,TypeBigCat=1<<3,TypeDog=1<<4
} ;

struct Animal
{
  int typeTag ;// really AnimalClassTypeTag, but it will complain at the |= if
               // at the |='s if not int
  Animal() {
    typeTag=TypeAnimal; // start just base Animal.
    // subclass ctors will |= in other types
  }
  virtual ~Animal(){}//make it polymorphic too
} ;

struct Cat : public Animal
{
  Cat(){
    typeTag|=TypeCat; //bitwise OR in the type
  }
} ;

struct BigCat : public Cat
{
  BigCat(){
    typeTag|=TypeBigCat;
  }
} ;

struct Dog : public Animal
{
  Dog(){
    typeTag|=TypeDog;
  }
} ;

typedef unsigned long long ULONGLONG;

void dynamicCasts(vector<Animal*> &zoo, ULONGLONG tests)
{
  ULONGLONG animals=0,cats=0,bigcats=0,dogs=0;
  for( ULONGLONG i = 0 ; i < tests ; i++ )
  {
    for( Animal* an : zoo )
    {
      if( dynamic_cast<Dog*>( an ) )
        dogs++;
      else if( dynamic_cast<BigCat*>( an ) )
        bigcats++;
      else if( dynamic_cast<Cat*>( an ) )
        cats++;
      else //if( dynamic_cast<Animal*>( an ) )
        animals++;
    }
  }
  
  printf( "%lld animals, %lld cats, %lld bigcats, %lld dogs\n", animals,cats,bigcats,dogs ) ;
  
}

//*NOTE: I changed from switch to if/else if chain
void regularSwitch(vector<Animal*> &zoo, ULONGLONG tests)
{
  ULONGLONG animals=0,cats=0,bigcats=0,dogs=0;
  for( ULONGLONG i = 0 ; i < tests ; i++ )
  {
    for( Animal* an : zoo )
    {
      if( an->typeTag & TypeDog )
        dogs++;
      else if( an->typeTag & TypeBigCat )
        bigcats++;
      else if( an->typeTag & TypeCat )
        cats++;
      else
        animals++;
    }
  }
  printf( "%lld animals, %lld cats, %lld bigcats, %lld dogs\n", animals,cats,bigcats,dogs ) ;  
  
}

int main(int argc, const char * argv[])
{
  vector<Animal*> zoo ;
  
  zoo.push_back( new Animal ) ;
  zoo.push_back( new Cat ) ;
  zoo.push_back( new BigCat ) ;
  zoo.push_back( new Dog ) ;
  
  ULONGLONG tests=50000000;
  
  dynamicCasts( zoo, tests ) ;
  regularSwitch( zoo, tests ) ;
}

Solution 5 - C++

The standard way:

cout << (typeid(Base) == typeid(Derived)) << endl;

Standard RTTI is expensive because it relies on doing a underlying string compare and thus the speed of RTTI can vary depending on the class name length.

The reason why string compares are used is to make it work consistently across library/DLL boundaries. If you build your application statically and/or you are using certain compilers then you can probably use:

cout << (typeid(Base).name() == typeid(Derived).name()) << endl;

Which is not guaranteed to work (will never give a false positive, but may give false negatives) but can be up to 15 times faster. This relies on the implementation of typeid() to work in a certain way and all you are doing is comparing an internal char pointer. This is also sometimes equivalent to:

cout << (&typeid(Base) == &typeid(Derived)) << endl;

You can however use a hybrid safely which will be very fast if the types match, and will be worst case for unmatched types:

cout << ( typeid(Base).name() == typeid(Derived).name() || 
          typeid(Base) == typeid(Derived) ) << endl;

To understand whether you need to optimize this you need to see how much of your time you spend getting a new packet, compared to the time it takes to process the packet. In most cases a string compare will probably not be a large overhead. (depending on your class or namespace::class name length)

The safest way to optimize this is to implement your own typeid as an int (or a enum Type : int ) as part of your Base class and use that to determine the type of the class, and then just use static_cast<> or reinterpret_cast<>

For me the difference is roughly 15 times on unoptimized MS VS 2005 C++ SP1.

Solution 6 - C++

For a simple check, RTTI can be as cheap as a pointer comparison. For inheritance checking, it can be as expensive as a strcmp for every type in an inheritance tree if you are dynamic_cast-ing from the top to the bottom in one implementation out there.

You can also reduce the overhead by not using dynamic_cast and instead checking the type explicitly via &typeid(...)==&typeid(type). While that doesn't necessarily work for .dlls or other dynamically loaded code, it can be quite fast for things that are statically linked.

Although at that point it's like using a switch statement, so there you go.

Solution 7 - C++

It's always best to measure things. In the following code, under g++, the use of hand coded type identification seems to be about three times faster than RTTI. I'm sure that a more realistic hand coded implementtaion using strings instead of chars would be slower, bringing the timings close together..

#include <iostream>
using namespace std;

struct Base {
    virtual ~Base() {}
    virtual char Type() const = 0;
};

struct A : public Base {
    char Type() const {
        return 'A';
    }
};

struct B : public Base {;
    char Type() const {
        return 'B';
    }
};

int main() {
    Base * bp = new A;
    int n = 0;
    for ( int i = 0; i < 10000000; i++ ) {
#ifdef RTTI
        if ( A * a = dynamic_cast <A*> ( bp ) ) {
            n++;
        }
#else
        if ( bp->Type() == 'A' ) {
            A * a = static_cast <A*>(bp);
            n++;
        }
#endif
    }
    cout << n << endl;
}

Solution 8 - C++

A while ago I measured the time costs for RTTI in the specific cases of MSVC and GCC for a 3ghz PowerPC. In the tests I ran (a fairly large C++ app with a deep class tree), each dynamic_cast<> cost between 0.8μs and 2μs, depending on whether it hit or missed.

Solution 9 - C++

> So, just how expensive is RTTI?

That depends entirely on the compiler you're using. I understand that some use string comparisons, and others use real algorithms.

Your only hope is to write a sample program and see what your compiler does (or at least determine how much time it takes to execute a million dynamic_casts or a million typeids).

Solution 10 - C++

RTTI can be cheap and doesn't necessarly need a strcmp. The compiler limits the test to perform the actual hierarchy, in reverse order. So if you have a class C that is a child of class B which is a child of class A, dynamic_cast from a A* ptr to a C* ptr imply only one pointer comparison and not two (BTW, only the vptr table pointer is compared). The test is like "if (vptr_of_obj == vptr_of_C) return (C*)obj"

Another example, if we try to dynamic_cast from A* to B*. In that case, the compiler will check both case (obj being a C, and obj being a B) in turns. This can also be simplified to a single test (most of times), as virtual function table is a made as an aggregation, so the test resume to "if (offset_of(vptr_of_obj, B) == vptr_of_B)" with

offset_of = return sizeof(vptr_table) >= sizeof(vptr_of_B) ? vptr_of_new_methods_in_B : 0

The memory layout of

vptr_of_C = [ vptr_of_A | vptr_of_new_methods_in_B | vptr_of_new_methods_in_C ]

How does the compiler know of to optimize this at compile time ?

At compile time, the compiler knows the current hierarchy of objects, so it refuse to compile different type hierarchy dynamic_casting. Then it just has to handle the hierarchy depth, and add the invert amount of tests to match such depth.

For example, this doesn't compile:

void * something = [...]; 
// Compile time error: Can't convert from something to MyClass, no hierarchy relation
MyClass * c = dynamic_cast<MyClass*>(something);  

Solution 11 - C++

RTTI can be "expensive" because you've added an if-statement every single time you do the RTTI comparison. In deeply nested iterations, this can be pricey. In something that never gets executed in a loop it's essentially free.

The choice is to use proper polymorphic design, eliminating the if-statement. In deeply nested loops, this is essential for performance. Otherwise, it doesn't matter very much.

RTTI is also expensive because it can obscure the subclass hierarchy (if there even is one). It can have the side-effect of removing the "object oriented" from "object oriented programming".

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionCristi&#225;n RomoView Question on Stackoverflow
Solution 1 - C++sbrudenellView Answer on Stackoverflow
Solution 2 - C++IzhakiView Answer on Stackoverflow
Solution 3 - C++EclipseView Answer on Stackoverflow
Solution 4 - C++boboboboView Answer on Stackoverflow
Solution 5 - C++MariusView Answer on Stackoverflow
Solution 6 - C++MSNView Answer on Stackoverflow
Solution 7 - C++anonView Answer on Stackoverflow
Solution 8 - C++CrashworksView Answer on Stackoverflow
Solution 9 - C++Max LybbertView Answer on Stackoverflow
Solution 10 - C++X-Ryl669View Answer on Stackoverflow
Solution 11 - C++S.LottView Answer on Stackoverflow