Is make_shared really more efficient than new?


C++ Problem Overview


I was experimenting with shared_ptr and make_shared from C++11 and programmed a little toy example to see what is actually happening when calling make_shared. As infrastructure I was using llvm/clang 3.0 along with the llvm std c++ library within XCode4.

#include <iostream>
#include <memory>
#include <string>
using namespace std;

class Object
{
public:
    Object(const string& str)
    {
        cout << "Constructor " << str << endl;
    }
    
    Object()
    {
        cout << "Default constructor" << endl;
    }
    
    ~Object()
    {
        cout << "Destructor" << endl;
    }
    
    Object(const Object& rhs)
    {
        cout << "Copy constructor..." << endl;
    }
};

void make_shared_example()
{
    cout << "Create smart_ptr using make_shared..." << endl;
    auto ptr_res1 = make_shared<Object>("make_shared");
    cout << "Create smart_ptr using make_shared: done." << endl;
    
    cout << "Create smart_ptr using new..." << endl;
    shared_ptr<Object> ptr_res2(new Object("new"));
    cout << "Create smart_ptr using new: done." << endl;
}

int main()
{
    make_shared_example();
}

Now have a look at the output, please:

Create smart_ptr using make_shared...
Constructor make_shared
Copy constructor...
Copy constructor...
Destructor
Destructor
Create smart_ptr using make_shared: done.
Create smart_ptr using new...
Constructor new
Create smart_ptr using new: done.
Destructor
Destructor

It appears that make_shared is calling the copy constructor two times. If I allocate memory for an Object using a regular new, this does not happen; only one Object is constructed.

What I am wondering about is the following. I heard that make_shared is supposed to be more efficient than using new (1, 2). One reason is that make_shared allocates the reference count together with the object to be managed in the same block of memory. OK, I get the point. This is of course more efficient than two separate allocation operations.

On the other hand, I don't understand why this has to come at the cost of two calls to the copy constructor of Object. Because of this, I am not convinced that make_shared is more efficient than allocation using new in every case. Am I wrong here? Well, OK, one could implement a move constructor for Object, but I am still not sure whether this is more efficient than just allocating Object through new, at least not in every case. It would be true if copying Object were less expensive than allocating memory for a reference counter. But the shared_ptr-internal reference counter could be implemented using a couple of primitive data types, right?
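One way to test the copy concern directly is to instrument the type. The sketch below uses a hypothetical `Counted` class (not from the question) that counts its copy and move constructions, and a hypothetical helper `extra_constructions_in_make_shared()` that reports how many of them a single `make_shared` call triggered. Assuming a standard library that constructs the object in place (the answers below report this for VC2010 and Boost), the helper should return 0.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical instrumented type: counts copy and move constructions.
struct Counted {
    static int copies;
    static int moves;

    explicit Counted(const std::string& s) : name(s) {}
    Counted(const Counted& other) : name(other.name) { ++copies; }
    Counted(Counted&& other) noexcept : name(std::move(other.name)) { ++moves; }

    std::string name;
};

int Counted::copies = 0;
int Counted::moves = 0;

// Returns the number of copies plus moves performed by one make_shared call.
int extra_constructions_in_make_shared() {
    Counted::copies = 0;
    Counted::moves = 0;
    auto p = std::make_shared<Counted>("make_shared");
    return Counted::copies + Counted::moves;
}
```

Calling `extra_constructions_in_make_shared()` from `main` and printing the result is enough to compare libraries; a nonzero value means the library copies or moves where it could construct in place.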

Can you help and explain why make_shared is the way to go in terms of efficiency, despite the outlined copy overhead?

C++ Solutions


Solution 1 - C++

> As infrastructure I was using llvm/clang 3.0 along with the llvm std c++ library within XCode4.

Well, that appears to be your problem. The C++11 standard states the following requirements for make_shared<T> (and allocate_shared<T>) in section 20.7.2.2.6:

> Requires: The expression ::new (pv) T(std::forward<Args>(args)...), where pv has type void* and points to storage suitable to hold an object of type T, shall be well formed. A shall be an allocator (17.6.3.5). The copy constructor and destructor of A shall not throw exceptions.

T is not required to be copy-constructible. Indeed, T isn't even required to be constructible with a non-placement new. It is only required to be constructible in place. This means that the only thing make_shared<T> can do with T is construct it in place with placement new.
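A quick way to see this requirement in action: a type whose copy constructor is deleted (and which therefore also has no implicit move constructor) can still be created with make_shared, because the library only needs the in-place `::new (pv) T(args...)` expression. `Pinned` and `make_pinned_value()` are hypothetical names used just for this sketch.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that can be neither copied nor moved:
// declaring the copy constructor as deleted also suppresses the implicit move.
struct Pinned {
    explicit Pinned(int v) : value(v) {}
    Pinned(const Pinned&) = delete;
    Pinned& operator=(const Pinned&) = delete;

    int value;
};

// Compiles only because make_shared constructs Pinned in place.
int make_pinned_value() {
    auto p = std::make_shared<Pinned>(42);
    return p->value;
}
```

If a library implementation needed to copy or move T, this code would fail to compile rather than silently copying.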

So the results you get are not consistent with the standard. LLVM's libc++ is broken in this regard. File a bug report.

For reference, here's what happened when I took your code into VC2010:

Create smart_ptr using make_shared...
Constructor make_shared
Create smart_ptr using make_shared: done.
Create smart_ptr using new...
Constructor new
Create smart_ptr using new: done.
Destructor
Destructor

I also ported it to Boost's original shared_ptr and make_shared, and I got the same thing as VC2010.

I'd suggest filing a bug report, as libc++'s behavior is broken.

Solution 2 - C++

You have to compare these two versions:

std::shared_ptr<Object> p1 = std::make_shared<Object>("foo");
std::shared_ptr<Object> p2(new Object("foo"));

If the second variable in your code is just a naked pointer, it is not a shared pointer at all, and the two versions are not comparable.


Now on to the meat. make_shared is (in practice) more efficient because it allocates the reference control block together with the actual object in one single dynamic allocation. By contrast, the shared_ptr constructor that takes a naked object pointer must allocate another dynamic variable for the reference count. The trade-off is that make_shared (or its cousin allocate_shared) does not let you specify a custom deleter, since the allocation is performed by the allocator.

(This does not affect the construction of the object itself. From Object's perspective there is no difference between the two versions. What's more efficient is the shared pointer itself, not the managed object.)
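To illustrate the deleter side of that trade-off: only the shared_ptr constructor that takes a naked pointer accepts a deleter; make_shared has no parameter for one. The sketch below uses a hypothetical `CountingDeleter` that records how often it ran, and a hypothetical helper `deleter_calls_after_scope()`.

```cpp
#include <cassert>
#include <memory>

struct Object {
    int value = 0;
};

// Hypothetical counter of how often the custom deleter ran.
static int g_deleter_calls = 0;

// Custom deleter: accepted by shared_ptr<T>(T*, D), not by make_shared.
struct CountingDeleter {
    void operator()(Object* p) const {
        ++g_deleter_calls;
        delete p;
    }
};

// Creates and releases one shared_ptr with the custom deleter.
int deleter_calls_after_scope() {
    g_deleter_calls = 0;
    {
        std::shared_ptr<Object> p(new Object(), CountingDeleter());
        p->value = 7;
    }  // last owner gone: CountingDeleter runs exactly once here
    return g_deleter_calls;
}
```

If you need custom control over memory while keeping the single allocation, allocate_shared takes an allocator instead of a deleter.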

Solution 3 - C++

So one thing to keep in mind is your optimization settings. Measuring performance, particularly in C++, is meaningless without optimizations enabled. I don't know whether you did in fact compile with optimizations, so I thought it was worth mentioning.

That said, what you are measuring with this test is not where make_shared is more efficient. Simply put, you are measuring the wrong thing :-P.

Here's the deal. Normally, when you create a shared pointer, it has at least two data members (possibly more): one for the object pointer and one for a pointer to the reference count. The reference count itself is allocated on the heap (so that it can be shared among shared_ptrs with different lifetimes; that's the point, after all!).

So if you create an object with something like std::shared_ptr<Object> p2(new Object("foo"));, there are at least two calls to new: one for Object and one for the reference count object.
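You can count those calls yourself by replacing the global `operator new`, which the language permits. The sketch below is an illustration under assumptions, not a benchmark: the counter `g_allocations` and the helpers `count_with_new()` / `count_with_make_shared()` are hypothetical, and the standard's wording for make_shared only says implementations *should* perform no more than one memory allocation. On mainstream libraries (libstdc++, libc++, MSVC) this typically yields 2 versus 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Counts every call to the replaceable global operator new.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (size == 0) size = 1;
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

struct Object {
    explicit Object(int v) : value(v) {}
    int value;
};

// Stored in globals so the allocations escape and cannot be optimized away.
std::shared_ptr<Object> g_from_new;
std::shared_ptr<Object> g_from_make_shared;

// Allocations for shared_ptr<Object>(new Object(...)): object + control block.
std::size_t count_with_new() {
    g_allocations = 0;
    g_from_new = std::shared_ptr<Object>(new Object(1));
    return g_allocations;
}

// Allocations for make_shared: object and control block share one block.
std::size_t count_with_make_shared() {
    g_allocations = 0;
    g_from_make_shared = std::make_shared<Object>(2);
    return g_allocations;
}
```

This measures allocation count, which is where the difference lies, rather than constructor calls on Object.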

make_shared has the option (I'm not sure it is required to) of doing a single new that is big enough to hold both the pointed-to object and the reference count in one contiguous block. Effectively, it allocates an object that looks something like this (illustrative, not literally what it is):

struct T {
    int reference_count;
    Object object;
};
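To make the illustration a bit more concrete, here is a toy sketch of the single-allocation idea. `CountedBlock` and `ToyShared` are hypothetical names; this is not how any real library is written (no weak count, not thread-safe, no assignment), it only shows one `new` creating the count and the object together and one `delete` releasing both.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical toy: the count and the object live side by side in ONE heap block.
template <typename T>
struct CountedBlock {
    std::size_t use_count;
    T object;

    template <typename... Args>
    explicit CountedBlock(Args&&... args)
        : use_count(1), object(std::forward<Args>(args)...) {}
};

// Minimal owner: no weak count, not thread-safe, no assignment.
template <typename T>
class ToyShared {
public:
    template <typename... Args>
    static ToyShared make(Args&&... args) {
        // One new-expression allocates storage for count + object together.
        return ToyShared(new CountedBlock<T>(std::forward<Args>(args)...));
    }

    ToyShared(const ToyShared& other) : block_(other.block_) {
        ++block_->use_count;
    }
    ToyShared& operator=(const ToyShared&) = delete;

    ~ToyShared() {
        // Last owner frees the whole block with a single delete.
        if (--block_->use_count == 0) {
            delete block_;
        }
    }

    T& operator*() const { return block_->object; }
    std::size_t use_count() const { return block_->use_count; }

private:
    explicit ToyShared(CountedBlock<T>* block) : block_(block) {}
    CountedBlock<T>* block_;
};
```

For example, `auto a = ToyShared<std::string>::make("hi");` creates one block; copying `a` bumps the count stored in that same block.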

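Since the reference count and the object are tied together, the whole block can be freed with a single deallocation as well. (One caveat with make_shared: if weak_ptrs are still around, the object is destroyed when the last shared_ptr goes away, but the combined block is only freed once the last weak_ptr is gone too.)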
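The weak_ptr case can be observed directly. A minimal sketch, assuming a hypothetical `Tracked` type that sets a flag in its destructor: the object is destroyed as soon as the last shared_ptr goes away, even though a weak_ptr still refers to the control block created by make_shared. (When the block's memory is actually released is not visible in this sketch.)

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that flips a flag when its destructor runs.
struct Tracked {
    explicit Tracked(bool* flag) : destroyed(flag) {}
    ~Tracked() { *destroyed = true; }
    bool* destroyed;
};

// True if ~Tracked ran while a weak_ptr still referred to the control block.
bool destroyed_while_weak_alive() {
    bool destroyed = false;
    std::weak_ptr<Tracked> weak;
    {
        auto strong = std::make_shared<Tracked>(&destroyed);
        weak = strong;
    }  // last shared_ptr gone: ~Tracked runs here
    return destroyed && weak.expired();
}  // weak_ptr destroyed here; only now can the combined block be freed
```

This is one reason some code prefers the two-allocation form for very large objects that are often observed through long-lived weak_ptrs.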

So the efficiency is in allocations, not in copying (which I suspect had to do with optimization more than anything else).

To be clear, this is what Boost has to say about make_shared:

http://www.boost.org/doc/libs/1_43_0/libs/smart_ptr/make_shared.html

> Besides convenience and style, such a function is also exception safe and considerably faster because it can use a single allocation for both the object and its corresponding control block, eliminating a significant portion of shared_ptr's construction overhead. This eliminates one of the major efficiency complaints about shared_ptr.

Solution 4 - C++

You should not be getting any extra copies there. The output should be:

Create smart_ptr using make_shared...
Constructor make_shared
Create smart_ptr using make_shared: done.
Create smart_ptr using new...
Constructor new
Create smart_ptr using new: done.
Destructor

I don't know why you're getting extra copies (though I see you're getting one 'Destructor' too many, so the code you used to get your output must be different from the code you posted).

make_shared is more efficient because it can be implemented using only one dynamic allocation instead of two, and because it needs one pointer's worth less of bookkeeping memory per shared object (the control block does not need a separate pointer to the object).

Edit: I didn't check with Xcode 4.2, but with Xcode 4.3 I get the correct output shown above, not the incorrect output shown in the question.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
|---|---|---|
| Question | user1212354 | View Question on Stackoverflow |
| Solution 1 - C++ | Nicol Bolas | View Answer on Stackoverflow |
| Solution 2 - C++ | Kerrek SB | View Answer on Stackoverflow |
| Solution 3 - C++ | Evan Teran | View Answer on Stackoverflow |
| Solution 4 - C++ | bames53 | View Answer on Stackoverflow |