What is the performance overhead of std::function?


C++ Problem Overview


I heard on a forum that using std::function<> causes a performance drop. Is that true? If so, is it a big performance drop?

C++ Solutions


Solution 1 - C++

There are, indeed, performance issues with std::function that must be taken into account whenever using it. The main strength of std::function, namely its type-erasure mechanism, does not come for free, and we might (but not necessarily must) pay a price for that.

std::function is a template class that wraps callable types. However, it is not parametrized on the callable type itself but only on its return and argument types. The callable type is known only at construction time and, therefore, std::function cannot have a pre-declared member of this type to hold a copy of the object given to its constructor.

Roughly speaking (actually, things are more complicated than that), std::function can hold only a pointer to the object passed to its constructor, and this raises a lifetime issue. If the pointer points to an object whose lifetime is shorter than that of the std::function object, the inner pointer will become dangling. To prevent this problem, std::function might make a copy of the object on the heap through a call to operator new (or a custom allocator). This dynamic memory allocation is what people most often refer to as the performance penalty implied by std::function.

I have recently written an article that gives more details and explains how (and where) one can avoid paying the price of a memory allocation:

Efficient Use of Lambda Expressions and std::function

Solution 2 - C++

You can find information in Boost's reference materials: "How much overhead does a call through boost::function incur?" and "Performance".

This doesn't settle "yes or no" for boost::function. The performance drop may well be acceptable given the program's requirements. More often than not, parts of a program are not performance-critical, and even in the parts that are, the cost may be acceptable. This is something only you can determine.

As to the standard library version, the standard only defines an interface. It is entirely up to individual implementations to make it work. I suppose a similar implementation to boost's function would be used.

Solution 3 - C++

Firstly, the relative overhead shrinks as the work done inside the function grows; the higher the workload, the smaller the share of time spent on the call itself.

Secondly: g++ 4.5 does not show any difference compared to virtual functions:

main.cc

#include <ctime>       // clock(), clock_t, CLOCKS_PER_SEC
#include <functional>
#include <iostream>

// Interface for virtual function test.
struct Virtual {
    virtual ~Virtual() {}
    virtual int operator() () const = 0;
};

// Factory functions, defined in impl.cc, to hide the implementations from g++ and prevent some optimizations.
Virtual *create_virt();
std::function<int ()> create_fun();
std::function<int ()> create_fun_with_state();

// The test. Generates actual output to prevent some optimizations.
template <typename T>
int test (T const& fun) {
    int ret = 0;
    for (int i=0; i<1024*1024*1024; ++i) {
        ret += fun();
    }    
    return ret;
}

// Executing the tests and outputting their values to prevent some optimizations.
int main () {
    {
        const clock_t start = clock();
        std::cout << test(*create_virt()) << '\n';
        const double secs = (clock()-start) / double(CLOCKS_PER_SEC);
        std::cout << "virtual: " << secs << " secs.\n";
    }
    {
        const clock_t start = clock();
        std::cout << test(create_fun()) << '\n';
        const double secs = (clock()-start) / double(CLOCKS_PER_SEC);
        std::cout << "std::function: " << secs << " secs.\n";
    }
    {
        const clock_t start = clock();
        std::cout << test(create_fun_with_state()) << '\n';
        const double secs = (clock()-start) / double(CLOCKS_PER_SEC);
        std::cout << "std::function with bindings: " << secs << " secs.\n";
    }
}

impl.cc

#include <functional>

struct Virtual {
    virtual ~Virtual() {}
    virtual int  operator() () const = 0;
};
struct Impl : Virtual {
    virtual ~Impl() {}
    virtual int  operator() () const { return 1; }
};

Virtual *create_virt() { return new Impl; }

std::function<int ()> create_fun() { 
    return  []() { return 1; };
}

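std::function<int ()> create_fun_with_state() { 
    // Note: x, y and z are never used inside the lambda, so [=] does not
    // actually capture them and the closure carries no state.
    int x,y,z;
    return  [=]() { return 1; };
}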

Output of g++ --std=c++0x -O3 impl.cc main.cc && ./a.out:

1073741824
virtual: 2.9 secs.
1073741824
std::function: 2.9 secs.
1073741824
std::function with bindings: 2.9 secs.

So, fear not. If your design or maintainability can benefit from preferring std::function over virtual calls, try it. Personally, I really like the idea of not forcing interfaces and inheritance on clients of my classes.

Solution 4 - C++

This depends strongly on whether you are passing the function without binding any arguments (which does not allocate heap space) or not.

It also depends on other factors, but this is the main one.

It is true that you need something to compare against. You can't simply say that it adds overhead compared to not using it at all; you need to compare it to an alternative way of passing a function. And if you can dispense with it altogether, then it was not needed in the first place.

Solution 5 - C++

std::function<> / std::function<> with bind( ... ) is extremely fast. Check this:

#include <iostream>
#include <functional>
#include <chrono>
#include <cstddef>   // size_t
#include <cstdint>   // int64_t

using namespace std;
using namespace chrono;

int main()
{
	static size_t const ROUNDS = 1'000'000'000;
	static
	auto bench = []<typename Fn>( Fn const &fn ) -> double
	{
		auto start = high_resolution_clock::now();
		fn();
		return (int64_t)duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() / (double)ROUNDS;
	};
	int i = 0;	// initialized so that i += j never reads an indeterminate value
	static
	auto CLambda = []( int &i, int j )
	{
		i += j;
	};
	auto bCFn = [&]() -> double
	{
		void (*volatile pFnLambda)( int &i, int j ) = CLambda;
		return bench( [&]()
			{	
				for( size_t j = ROUNDS; j--; j )
					pFnLambda( i, 2 );
			} );
	};
	auto bndObj = bind( CLambda, ref( i ), 2 );
	auto bBndObj = [&]() -> double
	{
		decltype(bndObj) *volatile pBndObj = &bndObj;
		return bench( [&]()
			{
				for( size_t j = ROUNDS; j--; j )
					(*pBndObj)();
			} );
	};
	using fn_t = function<void()>;
	auto bFnBndObj = [&]() -> double
	{
		fn_t fnBndObj = fn_t( bndObj );
		fn_t *volatile pFnBndObj = &fnBndObj;
		return bench( [&]()
			{
				for( size_t j = ROUNDS; j--; j )
					(*pFnBndObj)();
			} );
	};
	auto bFnBndObjCap = [&]() -> double
	{
		auto capLambda = [&i]( int j )
		{
			i += j;
		};
		fn_t fnBndObjCap = fn_t( bind( capLambda, 2 ) );
		fn_t *volatile pFnBndObjCap = &fnBndObjCap;
		return bench( [&]()
			{
				for( size_t j = ROUNDS; j--; j )
					(*pFnBndObjCap)();
			} );
	};
	using bench_fn = function<double()>;
	static const
	struct descr_bench
	{
		char const *descr;
		bench_fn const fn;
	} dbs[] =
	{
		{ "C-function",
		  bench_fn( bind( bCFn ) ) },
		{ "C-function in bind( ... ) with all parameters",
		  bench_fn( bind( bBndObj ) ) },
		{ "C-function in function<>( bind( ... ) ) with all parameters",
		  bench_fn( bind( bFnBndObj ) ) },
		{ "lambda capturing first parameter in function<>( bind( lambda, 2 ) )",
		  bench_fn( bind( bFnBndObjCap ) ) }
	};
	for( descr_bench const &db : dbs )
		cout << db.descr << ":" << endl,
		cout << db.fn() << endl;
}

All calls are below 2ns on my computer.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type     | Original Author | Original Content on Stackoverflow
Question         | user408141      | View Question on Stackoverflow
Solution 1 - C++ | Cassio Neri     | View Answer on Stackoverflow
Solution 2 - C++ | UncleBens       | View Answer on Stackoverflow
Solution 3 - C++ | Sebastian Mach  | View Answer on Stackoverflow
Solution 4 - C++ | lurscher        | View Answer on Stackoverflow
Solution 5 - C++ | Bonita Montero  | View Answer on Stackoverflow