How much is the overhead of smart pointers compared to normal pointers in C++?

C++PerformanceC++11Smart Pointers

C++ Problem Overview


How much is the overhead of smart pointers compared to normal pointers in C++11? In other words, is my code going to be slower if I use smart pointers, and if so, how much slower?

Specifically, I'm asking about the C++11 std::shared_ptr and std::unique_ptr.

Obviously, the stuff pushed down the stack is going to be larger (at least I think so), because a smart pointer also needs to store its internal state (reference count, etc), the question really is, how much is this going to affect my performance, if at all?

For example, I return a smart pointer from a function instead of a normal pointer:

std::shared_ptr<const Value> getValue();
// versus
const Value *getValue();

Or, for example, when one of my functions accept a smart pointer as parameter instead of a normal pointer:

void setValue(std::shared_ptr<const Value> val);
// versus
void setValue(const Value *val);

C++ Solutions


Solution 1 - C++

std::unique_ptr has memory overhead only if you provide it with some non-trivial deleter.

std::shared_ptr always has memory overhead for reference counter, though it is very small.

std::unique_ptr has time overhead only during constructor (if it has to copy the provided deleter and/or null-initialize the pointer) and during destructor (to destroy the owned object).

std::shared_ptr has time overhead in constructor (to create the reference counter), in destructor (to decrement the reference counter and possibly destroy the object) and in assignment operator (to increment the reference counter). Due to thread-safety guarantees of std::shared_ptr, these increments/decrements are atomic, thus adding some more overhead.

Note that none of them has time overhead in dereferencing (in getting the reference to owned object), while this operation seems to be the most common for pointers.

To sum up, there is some overhead, but it shouldn't make the code slow unless you continuously create and destroy smart pointers.

Solution 2 - C++

My answer is different from the others and i really wonder if they ever profiled code.

shared_ptr has a significant overhead for creation because of it's memory allocation for the control block (which keeps the ref counter and a pointer list to all weak references). It has also a huge memory overhead because of this and the fact that std::shared_ptr is always a 2 pointer tuple (one to the object, one to the control block).

If you pass a shared_pointer to a function as a value parameter then it will be at least 10 times slower then a normal call and create lots of codes in the code segment for the stack unwinding. If you pass it by reference you get an additional indirection which can be also pretty worse in terms of performance.

Thats why you should not do this unless the function is really involved in ownership management. Otherwise use "shared_ptr.get()". It is not designed to make sure your object isn't killed during a normal function call.

If you go mad and use shared_ptr on small objects like an abstract syntax tree in a compiler or on small nodes in any other graph structure you will see a huge perfomance drop and a huge memory increase. I have seen a parser system which was rewritten soon after C++14 hit the market and before the programmer learned to use smart pointers correctly. The rewrite was a magnitude slower then the old code.

It is not a silver bullet and raw pointers aren't bad by definition either. Bad programmers are bad and bad design is bad. Design with care, design with clear ownership in mind and try to use the shared_ptr mostly on the subsystem API boundary.

If you want to learn more you can watch Nicolai M. Josuttis good talk about "The Real Price of Shared Pointers in C++" https://vimeo.com/131189627
It goes deep into the implementation details and CPU architecture for write barriers, atomic locks etc. once listening you will never talk about this feature being cheap. If you just want a proof of the magnitude slower, skip the first 48 minutes and watch him running example code which runs upto 180 times slower (compiled with -O3) when using shared pointer everywhere.

EDITED:

And if you ask about "std::unique_ptr" than visit this talk "CppCon 2019: Chandler Carruth “There Are No Zero-cost Abstractions” https://www.youtube.com/watch?v=rHIkrotSwcc

Its just not true, that unique_ptr is 100% cost free.

OFFTOPIC:

I tried to educate people about the the false idea that using exceptions that are not thrown has no cost penalty for over two decades now. In this case it's in the optimizer and the code size.

Solution 3 - C++

As with all code performance, the only really reliable means to obtain hard information is to measure and/or inspect machine code.

That said, simple reasoning says that

  • You can expect some overhead in debug builds, since e.g. operator-> must be executed as a function call so that you can step into it (this is in turn due to general lack of support for marking classes and functions as non-debug).

  • For shared_ptr you can expect some overhead in initial creation, since that involves dynamic allocation of a control block, and dynamic allocation is very much slower than any other basic operation in C++ (do use make_shared when practically possible, to minimize that overhead).

  • Also for shared_ptr there is some minimal overhead in maintaining a reference count, e.g. when passing a shared_ptr by value, but there's no such overhead for unique_ptr.

Keeping the first point above in mind, when you measure, do that both for debug and release builds.

The international C++ standardization committee has published a technical report on performance, but this was in 2006, before unique_ptr and shared_ptr were added to the standard library. Still, smart pointers were old hat at that point, so the report considered also that. Quoting the relevant part:

> “if accessing a value through a trivial smart pointer is significantly slower than accessing it through an ordinary pointer, the compiler is inefficiently handling the abstraction. In the past, most compilers had significant abstraction penalties and several current compilers still do. However, at least two compilers have been reported to have abstraction penalties below 1% and another a penalty of 3%, so eliminating this kind of overhead is well within the state of the art”

As an informed guess, the “well within the state of the art” has been achieved with the most popular compilers today, as of early 2014.

Solution 4 - C++

In other words, is my code going to be slower if I use smart pointers, and if so, how much slower?

Slower? Most likely not, unless you are creating a huge index using shared_ptrs and you have not enough memory to the point that your computer starts wrinkling, like an old lady being plummeted to the ground by an unbearable force from afar.

What would make your code slower is sluggish searches, unnecessary loop processing, huge copies of data, and a lot of write operations to disk (like hundreds).

The advantages of a smart pointer are all related to management. But is the overhead necessary? This depends on your implementation. Let's say you are iterating over an array of 3 phases, each phase has an array of 1024 elements. Creating a smart_ptr for this process might be overkill, since once the iteration is done you'll know you have to erase it. So you could gain extra memory from not using a smart_ptr...

But do you really want to do that?

A single memory leak could make your product have a point of failure in time (let's say your program leaks 4 megabytes each hour, it would take months to break a computer, nevertheless, it will break, you know it because the leak is there).

Is like saying "you software is guaranteed for 3 months, then, call me for service."

So in the end it really is a matter of... can you handle this risk? does using a raw pointer to handle your indexing over hundreds of different objects is worth loosing control of the memory.

If the answer is yes, then use a raw pointer.

If you don't even want to consider it, a smart_ptr is a good, viable, and awesome solution.

Solution 5 - C++

Chandler Carruth has a few surprising "discoveries" on unique_ptr in his 2019 Cppcon talk. (Youtube). I can't explain it quite as well.

I hope I understood the two main points right:

  • Code without unique_ptr will (often incorrectly) not handle cases where owership is not passed while passing a pointer. Rewriting it to use unique_ptr will add that handling, and that has some overhead.
  • A unique_ptr is still a C++ object, and objects will be passed on stack when calling a function, unlike pointers, which can be passed in registers.

Solution 6 - C++

Just for a glimpse and just for the [] operator,it is ~5X slower than the raw pointer as demonstrated in the following code, which was compiled using gcc -lstdc++ -std=c++14 -O0 and outputted this result:

malloc []:     414252610                                                 
unique []  is: 2062494135                                                
uq get []  is: 238801500                                                 
uq.get()[] is: 1505169542
new is:        241049490 

I'm beginning to learn c++, I got this in my mind: you always need to know what are you doing and take more time to know what others had done in your c++.

#EDIT As methioned by @Mohan Kumar, I provided more details. The gcc version is 7.4.0 (Ubuntu 7.4.0-1ubuntu1~14.04~ppa1), The above result was obtained when the -O0 is used, however, when I use '-O2' flag, I got this:

malloc []:     223
unique []  is: 105586217
uq get []  is: 71129461
uq.get()[] is: 69246502
new is:        9683

Then shifted to clang version 3.9.0, -O0 was :

malloc []:     409765889
unique []  is: 1351714189
uq get []  is: 256090843
uq.get()[] is: 1026846852
new is:        255421307

-O2 was:

malloc []:     150
unique []  is: 124
uq get []  is: 83
uq.get()[] is: 83
new is:        54

The result of clang -O2 is amazing.

#include <memory>
#include <iostream>
#include <chrono>
#include <thread>

uint32_t n = 100000000;
void t_m(void){
    auto a  = (char*) malloc(n*sizeof(char));
    for(uint32_t i=0; i<n; i++) a[i] = 'A';
}
void t_u(void){
    auto a = std::unique_ptr<char[]>(new char[n]);
    for(uint32_t i=0; i<n; i++) a[i] = 'A';
}

void t_u2(void){
    auto a = std::unique_ptr<char[]>(new char[n]);
    auto tmp = a.get();
    for(uint32_t i=0; i<n; i++) tmp[i] = 'A';
}
void t_u3(void){
    auto a = std::unique_ptr<char[]>(new char[n]);
    for(uint32_t i=0; i<n; i++) a.get()[i] = 'A';
}
void t_new(void){
    auto a = new char[n];
    for(uint32_t i=0; i<n; i++) a[i] = 'A';
}

int main(){
    auto start = std::chrono::high_resolution_clock::now();
    t_m();
    auto end1 = std::chrono::high_resolution_clock::now();
    t_u();
    auto end2 = std::chrono::high_resolution_clock::now();
    t_u2();
    auto end3 = std::chrono::high_resolution_clock::now();
    t_u3();
    auto end4 = std::chrono::high_resolution_clock::now();
    t_new();
    auto end5 = std::chrono::high_resolution_clock::now();
    std::cout << "malloc []:     " <<  (end1 - start).count() << std::endl;
    std::cout << "unique []  is: " << (end2 - end1).count() << std::endl;
    std::cout << "uq get []  is: " << (end3 - end2).count() << std::endl;
    std::cout << "uq.get()[] is: " << (end4 - end3).count() << std::endl;
    std::cout << "new is:        " << (end5 - end4).count() << std::endl;
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionVenemoView Question on Stackoverflow
Solution 1 - C++lisyarusView Answer on Stackoverflow
Solution 2 - C++LotharView Answer on Stackoverflow
Solution 3 - C++Cheers and hth. - AlfView Answer on Stackoverflow
Solution 4 - C++ClaudiordgzView Answer on Stackoverflow
Solution 5 - C++CaesarView Answer on Stackoverflow
Solution 6 - C++liqg3View Answer on Stackoverflow