The cost of passing by shared_ptr


C++ Problem Overview


I use std::tr1::shared_ptr extensively throughout my application. This includes passing objects in as function arguments. Consider the following:

class Dataset {...};

void f( shared_ptr< Dataset const > pds ) {...}
void g( shared_ptr< Dataset const > pds ) {...}
...

While passing a dataset object around via shared_ptr guarantees its existence inside f and g, the functions may be called millions of times, which causes a lot of shared_ptr objects to be created and destroyed. Here's a snippet of the flat gprof profile from a recent run:

Each sample counts as 0.01 seconds.
%   cumulative   self              self     total
time   seconds   seconds    calls   s/call   s/call  name
9.74    295.39    35.12 2451177304     0.00     0.00  std::tr1::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::tr1::__shared_count<(__gnu_cxx::_Lock_policy)2> const&)
8.03    324.34    28.95 2451252116     0.00     0.00  std::tr1::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()

So roughly 18% of the runtime (9.74% + 8.03%) was spent on reference counting with shared_ptr objects. Is this normal?

A large portion of my application is single-threaded and I was thinking about re-writing some of the functions as

void f( const Dataset& ds ) {...}

and replacing the calls

shared_ptr< Dataset > pds( new Dataset(...) );
f( pds );

with

f( *pds );

in places where I know for sure the object will not get destroyed while the flow of the program is inside f(). But before I run off to change a bunch of function signatures / calls, I wanted to know what the typical performance hit of passing by shared_ptr was. Seems like shared_ptr should not be used for functions that get called very often.

Any input would be appreciated. Thanks for reading.

-Artem

Update: After changing a handful of functions to accept const Dataset&, the new profile looks like this:

Each sample counts as 0.01 seconds.
%   cumulative   self              self     total
time   seconds   seconds    calls   s/call   s/call  name
0.15    241.62     0.37 24981902     0.00     0.00  std::tr1::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()
0.12    241.91     0.30 28342376     0.00     0.00  std::tr1::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::tr1::__shared_count<(__gnu_cxx::_Lock_policy)2> const&)

I'm a little puzzled by the number of destructor calls being smaller than the number of copy constructor calls, but overall I'm very pleased with the decrease in the associated run-time. Thanks to all for their advice.

C++ Solutions


Solution 1 - C++

Always pass your shared_ptr by const reference:

void f(const shared_ptr<Dataset const>& pds) {...} 
void g(const shared_ptr<Dataset const>& pds) {...} 

Edit: Regarding the safety issues mentioned by others:

  • When using shared_ptr heavily throughout an application, passing by value can take up a tremendous amount of time (I've seen it exceed 50% of the runtime).
  • Use const T& instead of const shared_ptr<T const>& when the argument must not be null.
  • Using const shared_ptr<T const>& is safer than const T* when performance is an issue.
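
As a rough illustration of the points above, the sketch below compares the three parameter styles by looking at use_count(). It assumes C++11 std::shared_ptr rather than std::tr1::shared_ptr; Dataset is a hypothetical stand-in with a single field, and the function names are made up for the example.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the real Dataset class.
struct Dataset {
    int rows;
    explicit Dataset(int r) : rows(r) {}
};

// By value: the parameter is a new shared_ptr, so the reference count is
// incremented on entry and decremented again on exit.
long count_by_value(std::shared_ptr<const Dataset> pds) {
    return pds.use_count();
}

// By const reference: no new shared_ptr is created, so the count is untouched.
long count_by_const_ref(const std::shared_ptr<const Dataset>& pds) {
    return pds.use_count();
}

// By plain reference: the callee never sees the shared_ptr at all.
int rows_by_ref(const Dataset& ds) {
    return ds.rows;
}

// Usage example: the caller already holds a shared_ptr<const Dataset>.
void demo_parameter_styles() {
    std::shared_ptr<const Dataset> pds = std::make_shared<Dataset>(3);
    assert(count_by_value(pds) == 2);      // caller's pointer plus the parameter copy
    assert(count_by_const_ref(pds) == 1);  // only the caller's pointer
    assert(rows_by_ref(*pds) == 3);
}
```

One caveat worth checking in your own code: if the caller holds a shared_ptr<Dataset> (non-const) and the parameter is const shared_ptr<Dataset const>&, the conversion creates a temporary shared_ptr, so the reference count is still touched on every call.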

Solution 2 - C++

You need shared_ptr only when passing it to functions or objects that keep it for future use. For example, a class may keep a shared_ptr for use in a worker thread. For simple synchronous calls, a plain pointer or reference is enough. shared_ptr should not completely replace plain pointers.
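
A minimal sketch of that distinction, assuming C++11 std::shared_ptr. The Job class, the sum function, and the Dataset layout are all hypothetical names invented for the example: Job keeps the dataset for later, so it takes a shared_ptr, while sum only uses the dataset during the call, so it takes a plain reference.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real Dataset class.
struct Dataset {
    std::vector<double> values;
};

// Synchronous call: the caller keeps the object alive for the whole call,
// so a plain const reference is enough.
double sum(const Dataset& ds) {
    double total = 0.0;
    for (double v : ds.values) total += v;
    return total;
}

// Hypothetical class that keeps the dataset for later use (for instance on a
// worker thread), so it needs to share ownership.
class Job {
public:
    explicit Job(std::shared_ptr<const Dataset> pds) : pds_(std::move(pds)) {}
    double run() const { return sum(*pds_); }  // plain reference inside
private:
    std::shared_ptr<const Dataset> pds_;
};

// Builds a Job; the local shared_ptr goes out of scope on return, but the
// Job's own copy keeps the dataset alive.
Job make_job() {
    auto pds = std::make_shared<Dataset>();
    pds->values = {1.0, 2.0, 3.5};
    return Job(pds);
}

double demo_ownership() {
    Job job = make_job();
    return job.run();
}
```

Here the reference count changes only when a Job is constructed or destroyed, not on each call to sum.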

Solution 3 - C++

If you're not using make_shared, could you give that a go? By placing the reference count and the object in the same block of memory, you may see a performance gain from better cache locality. Worth a try anyway.
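
For comparison, here is a small sketch of the two ways of creating the pointer. It assumes C++11 std::make_shared (boost::make_shared is used the same way); Dataset and the two function names are hypothetical.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the real Dataset class.
struct Dataset {
    int rows;
    explicit Dataset(int r) : rows(r) {}
};

// Two allocations: one for the Dataset, and a separate one made by the
// shared_ptr constructor for its control block (which holds the counts).
std::shared_ptr<Dataset> make_with_new(int rows) {
    return std::shared_ptr<Dataset>(new Dataset(rows));
}

// Typically a single allocation holding both the control block and the
// Dataset, so the count and the object sit next to each other in memory.
std::shared_ptr<Dataset> make_with_make_shared(int rows) {
    return std::make_shared<Dataset>(rows);
}

void demo_make_shared() {
    auto a = make_with_new(10);
    auto b = make_with_make_shared(10);
    assert(a->rows == b->rows);
    assert(a.use_count() == 1 && b.use_count() == 1);
}
```

Note that this changes how the object is allocated, not how often the reference count is incremented and decremented, so it would not by itself reduce the number of calls shown in the profile.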

Solution 4 - C++

Any object creation and destruction, especially redundant object creation and destruction, should be avoided in performance-critical applications.

Consider what shared_ptr is doing. Not only is it creating a new object and filling it in, but it's also touching the shared state to increment the reference count, and the object itself presumably lives somewhere else completely, which is going to be nightmarish on your cache.

Presumably you need the shared_ptr (because if you could get away with a local object you wouldn't allocate one off of the heap), but you could even "cache" the result of the shared_ptr dereference:

void fn(shared_ptr< Dataset > pds)
{
   Dataset& ds = *pds;

   for (int i = 0; i < 1000; ++i)
   {
      f(ds);
      g(ds);
   }
}

...because even *pds requires hitting more memory than is absolutely necessary.

Solution 5 - C++

It sounds like you really know what you're doing. You've profiled your application, and you know exactly where cycles are being used. You understand that calling the constructor to a reference counting pointer is expensive only if you do it constantly.

The only heads-up I can give you is this: suppose that inside a function f(t *ptr) you call another function that uses shared pointers, say other(ptr), and other makes a new shared pointer from the raw pointer. When that second shared pointer's reference count hits 0, you have effectively deleted your object, even though you didn't want to. You said you use reference-counting pointers a lot, so you have to watch out for corner cases like that.
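
One possible way to avoid that corner case, sketched here under the assumption that C++11 std::shared_ptr and std::enable_shared_from_this are available (this is not something the answer itself proposes). Dataset, f, and other are hypothetical; the dangerous call is left as a comment because actually running it would delete the object twice.

```cpp
#include <cassert>
#include <memory>

// Hypothetical Dataset that can hand out shared_ptrs to itself.
struct Dataset : std::enable_shared_from_this<Dataset> {
    int rows = 0;
};

// Hypothetical callee that wants shared ownership.
long other(std::shared_ptr<const Dataset> pds) {
    return pds.use_count();
}

long f(const Dataset* ptr) {
    // WRONG: other(std::shared_ptr<const Dataset>(ptr)) would start a second,
    // independent reference count. When it reaches 0 the object is deleted,
    // even though the caller's shared_ptr still points at it.

    // Safer: join the existing reference count. This requires that the object
    // is already owned by a shared_ptr somewhere.
    return other(ptr->shared_from_this());
}

void demo_raw_pointer_hazard() {
    auto pds = std::make_shared<Dataset>();
    assert(f(pds.get()) == 2);     // the caller's pointer plus the one inside other
    assert(pds.use_count() == 1);  // back to one owner after the call
}
```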

EDIT: You can make the destructor private and declare the shared pointer class a friend, so that the destructor can only be called by a shared pointer. As Mat's comment points out, this doesn't prevent multiple deletions through shared pointers.
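
A sketch of a related technique, assuming C++11 std::shared_ptr. Rather than declaring the shared pointer class itself a friend (its internals differ between standard library implementations), this version swaps in a nested deleter type, which can reach the private destructor because it is a member of the class. The create factory, the Deleter type, and the private constructor are all assumptions added for the example.

```cpp
#include <cassert>
#include <memory>

class Dataset {
public:
    // Hypothetical factory: the only way to obtain a Dataset in this sketch.
    static std::shared_ptr<Dataset> create(int rows) {
        return std::shared_ptr<Dataset>(new Dataset(rows), Deleter());
    }

    int rows() const { return rows_; }

private:
    explicit Dataset(int rows) : rows_(rows) {}
    ~Dataset() {}  // private: "delete p" does not compile outside the class

    // Nested types are members of Dataset, so this deleter may call the
    // private destructor on behalf of the shared_ptr.
    struct Deleter {
        void operator()(Dataset* p) const { delete p; }
    };

    int rows_;
};

void demo_private_destructor() {
    std::shared_ptr<Dataset> ds = Dataset::create(5);
    assert(ds->rows() == 5);
    // delete ds.get();   // would not compile: the destructor is private
}
```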

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Artem Sokolov | View Question on Stackoverflow
Solution 1 - C++ | Sam Harwell | View Answer on Stackoverflow
Solution 2 - C++ | Alex F | View Answer on Stackoverflow
Solution 3 - C++ | Kylotan | View Answer on Stackoverflow
Solution 4 - C++ | dash-tom-bang | View Answer on Stackoverflow
Solution 5 - C++ | Chris H | View Answer on Stackoverflow