Why would I std::move an std::shared_ptr?

C++C++11Shared PtrSmart PointersMove Semantics

C++ Problem Overview


I have been looking through the Clang source code and I found this snippet:

void CompilerInstance::setInvocation(
    std::shared_ptr<CompilerInvocation> Value) {
  Invocation = std::move(Value);
}

Why would I want to std::move an std::shared_ptr?

Is there any point transferring ownership on a shared resource?

Why wouldn't I just do this instead?

void CompilerInstance::setInvocation(
    std::shared_ptr<CompilerInvocation> Value) {
  Invocation = Value;
}

C++ Solutions


Solution 1 - C++

I think that the one thing the other answers did not emphasize enough is the point of speed.

std::shared_ptr reference count is atomic. increasing or decreasing the reference count requires atomic increment or decrement. This is hundred times slower than non-atomic increment/decrement, not to mention that if we increment and decrement the same counter we wind up with the exact number, wasting a ton of time and resources in the process.

By moving the shared_ptr instead of copying it, we "steal" the atomic reference count and we nullify the other shared_ptr. "stealing" the reference count is not atomic, and it is hundred times faster than copying the shared_ptr (and causing atomic reference increment or decrement).

Do note that this technique is used purely for optimization. copying it (as you suggested) is just as fine functionality-wise.

Solution 2 - C++

By using move you avoid increasing, and then immediately decreasing, the number of shares. That might save you some expensive atomic operations on the use count.

Solution 3 - C++

Move operations (like move constructor) for std::shared_ptr are cheap, as they basically are "stealing pointers" (from source to destination; to be more precise, the whole state control block is "stolen" from source to destination, including the reference count information).

Instead copy operations on std::shared_ptr invoke atomic reference count increase (i.e. not just ++RefCount on an integer RefCount data member, but e.g. calling InterlockedIncrement on Windows), which is more expensive than just stealing pointers/state.

So, analyzing the ref count dynamics of this case in details:

// shared_ptr<CompilerInvocation> sp;
compilerInstance.setInvocation(sp);

If you pass sp by value and then take a copy inside the CompilerInstance::setInvocation method, you have:

  1. When entering the method, the shared_ptr parameter is copy constructed: ref count atomic increment.
  2. Inside the method's body, you copy the shared_ptr parameter into the data member: ref count atomic increment.
  3. When exiting the method, the shared_ptr parameter is destructed: ref count atomic decrement.

You have two atomic increments and one atomic decrement, for a total of three atomic operations.

Instead, if you pass the shared_ptr parameter by value and then std::move inside the method (as properly done in Clang's code), you have:

  1. When entering the method, the shared_ptr parameter is copy constructed: ref count atomic increment.
  2. Inside the method's body, you std::move the shared_ptr parameter into the data member: ref count does not change! You are just stealing pointers/state: no expensive atomic ref count operations are involved.
  3. When exiting the method, the shared_ptr parameter is destructed; but since you moved in step 2, there's nothing to destruct, as the shared_ptr parameter is not pointing to anything anymore. Again, no atomic decrement happens in this case.

Bottom line: in this case you get just one ref count atomic increment, i.e. just one atomic operation.
As you can see, this is much better than two atomic increments plus one atomic decrement (for a total of three atomic operations) for the copy case.

Solution 4 - C++

There are two reasons for using std::move in this situation. Most responses addressed the issue of speed, but ignored the important issue of showing the code's intent more clearly.

For a std::shared_ptr, std::move unambiguously denotes a transfer of ownership of the pointee, while a simple copy operation adds an additional owner. Of course, if the original owner subsequently relinquishes their ownership (such as by allowing their std::shared_ptr to be destroyed), then a transfer of ownership has been accomplished.

When you transfer ownership with std::move, it's obvious what is happening. If you use a normal copy, it isn't obvious that the intended operation is a transfer until you verify that the original owner immediately relinquishes ownership. As a bonus, a more efficient implementation is possible, since an atomic transfer of ownership can avoid the temporary state where the number of owners has increased by one (and the attendant changes in reference counts).

Solution 5 - C++

Copying a shared_ptr involves copying its internal state object pointer and changing the reference count. Moving it only involves swapping pointers to the internal reference counter, and the owned object, so it's faster.

Solution 6 - C++

Since none of these answers offered an actual benchmark, I thought I'd try to provide one. However, think I've left myself more confused than when I started. I tried to come up with a test that would measure passing a shared_ptr<int> by value, by reference, and using std::move, performing an add operation on that value, and returning the result. I did this several times (one million) using two sets of tests. The first set added a constant value to the shared_ptr<int>, the other added a random value in the [0, 10] range. I figured the constant value addition would be a candidate for heavy optimization, whereas the random value test would not. That is more-or-less what I saw, but the extreme differences in execution time leads me to believe that other factors/problems with this test program are the contributing factors to the execution time differences, not the move semantics.

tl;dr

For no optimizations (-O0), constant addition

  • std::move was ~4x faster than pass-by-value
  • std::move was marginally slower than pass-by-reference

For high optimizations (-O3), constant addition

  • std::move was 70-90 thousand times faster than pass-by-value
  • std::move was marginally faster than pass-by-reference (anywhere from 1-1.4 times)

For no optimizations (-O0), random addition

  • std::move was 1-2 times faster than pass-by-value
  • std::move was marginally slower than pass-by-reference

For high optimizations (-O3), random addition

  • std::move was 1-1.3 times faster than pass-by-value (marginally worse than no optimizations)
  • std::move was essentially the same as pass-by-reference

Finally, the test

#include <memory>
#include <iostream>
#include <chrono>
#include <ctime>
#include <random>

constexpr auto MAX_NUM_ITS = 1000000;

// using random values to try to cut down on massive compiler optimizations
static std::random_device RAND_DEV;
static std::mt19937 RNG(RAND_DEV());
static std::uniform_int_distribution<std::mt19937::result_type> DIST11(0,10);

void CopyPtr(std::shared_ptr<int> myInt)
{
    // demonstrates that use_count increases with each copy
    std::cout << "In CopyPtr: ref count = " << myInt.use_count() << std::endl;
    std::shared_ptr<int> myCopyInt(myInt);
    std::cout << "In CopyPtr: ref count = " << myCopyInt.use_count() << std::endl;
}

void ReferencePtr(std::shared_ptr<int>& myInt)
{
    // reference count stays the same until a copy is made
    std::cout << "In ReferencePtr: ref count = " << myInt.use_count() << std::endl;
    std::shared_ptr<int> myCopyInt(myInt);
    std::cout << "In ReferencePtr: ref count = " << myCopyInt.use_count() << std::endl;
}

void MovePtr(std::shared_ptr<int>&& myInt)
{
    // demonstrates that use_count remains constant with each move
    std::cout << "In MovePtr: ref count = " << myInt.use_count() << std::endl;
    std::shared_ptr<int> myMovedInt(std::move(myInt));
    std::cout << "In MovePtr: ref count = " << myMovedInt.use_count() << std::endl;
}

int CopyPtrFastConst(std::shared_ptr<int> myInt)
{
    return 5 + *myInt;
}

int ReferencePtrFastConst(std::shared_ptr<int>& myInt)
{
    return 5 + *myInt;
}

int MovePtrFastConst(std::shared_ptr<int>&& myInt)
{
    return 5 + *myInt;
}

int CopyPtrFastRand(std::shared_ptr<int> myInt)
{
    return DIST11(RNG) + *myInt;
}

int ReferencePtrFastRand(std::shared_ptr<int>& myInt)
{
    return DIST11(RNG) + *myInt;
}

int MovePtrFastRand(std::shared_ptr<int>&& myInt)
{
    return DIST11(RNG) + *myInt;
}

void RunConstantFunctions(std::shared_ptr<int> myInt)
{
    std::cout << "\nIn constant funcs, ref count = " << myInt.use_count() << std::endl;
    // demonstrates speed of each function
    int sum = 0;

    // Copy pointer
    auto start = std::chrono::steady_clock::now();
    for (auto i=0; i<MAX_NUM_ITS; i++)
    {
        sum += CopyPtrFastConst(myInt);
    }
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> copyElapsed = end - start;
    std::cout << "CopyPtrConst sum = " << sum << ", took " << copyElapsed.count() << " seconds.\n";

    // pass pointer by reference
    sum = 0;
    start = std::chrono::steady_clock::now();
    for (auto i=0; i<MAX_NUM_ITS; i++)
    {
        sum += ReferencePtrFastConst(myInt);
    }
    end = std::chrono::steady_clock::now();
    std::chrono::duration<double> refElapsed = end - start;
    std::cout << "ReferencePtrConst sum = " << sum << ", took " << refElapsed.count() << " seconds.\n";

    // pass pointer using std::move
    sum = 0;
    start = std::chrono::steady_clock::now();
    for (auto i=0; i<MAX_NUM_ITS; i++)
    {
        sum += MovePtrFastConst(std::move(myInt));
    }
    end = std::chrono::steady_clock::now();
    std::chrono::duration<double> moveElapsed = end - start;
    std::cout << "MovePtrConst sum = " << sum << ", took " << moveElapsed.count() <<
        " seconds.\n";

    std::cout << "std::move vs pass by value: " << copyElapsed / moveElapsed << " times faster.\n";
    std::cout << "std::move vs pass by ref:   " << refElapsed / moveElapsed << " times faster.\n";
}

void RunRandomFunctions(std::shared_ptr<int> myInt)
{
    std::cout << "\nIn random funcs, ref count = " << myInt.use_count() << std::endl;
    // demonstrates speed of each function
    int sum = 0;

    // Copy pointer
    auto start = std::chrono::steady_clock::now();
    for (auto i=0; i<MAX_NUM_ITS; i++)
    {
        sum += CopyPtrFastRand(myInt);
    }
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> copyElapsed = end - start;
    std::cout << "CopyPtrRand sum = " << sum << ", took " << copyElapsed.count() << " seconds.\n";

    // pass pointer by reference
    sum = 0;
    start = std::chrono::steady_clock::now();
    for (auto i=0; i<MAX_NUM_ITS; i++)
    {
        sum += ReferencePtrFastRand(myInt);
    }
    end = std::chrono::steady_clock::now();
    std::chrono::duration<double> refElapsed = end - start;
    std::cout << "ReferencePtrRand sum = " << sum << ", took " << refElapsed.count() << " seconds.\n";

    // pass pointer using std::move
    sum = 0;
    start = std::chrono::steady_clock::now();
    for (auto i=0; i<MAX_NUM_ITS; i++)
    {
        sum += MovePtrFastRand(std::move(myInt));
    }
    end = std::chrono::steady_clock::now();
    std::chrono::duration<double> moveElapsed = end - start;
    std::cout << "MovePtrRand sum = " << sum << ", took " << moveElapsed.count() <<
        " seconds.\n";

    std::cout << "std::move vs pass by value: " << copyElapsed / moveElapsed << " times faster.\n";
    std::cout << "std::move vs pass by ref:   " << refElapsed / moveElapsed << " times faster.\n";
}

int main()
{
    // demonstrates how use counts are effected between copy and move
    std::shared_ptr<int> myInt = std::make_shared<int>(5);
    std::cout << "In main: ref count = " << myInt.use_count() << std::endl;
    CopyPtr(myInt);
    std::cout << "In main: ref count = " << myInt.use_count() << std::endl;
    ReferencePtr(myInt);
    std::cout << "In main: ref count = " << myInt.use_count() << std::endl;
    MovePtr(std::move(myInt));
    std::cout << "In main: ref count = " << myInt.use_count() << std::endl;

    // since myInt was moved to MovePtr and fell out of scope on return (was destroyed),
    // we have to reinitialize myInt
    myInt.reset();
    myInt = std::make_shared<int>(5);

    RunConstantFunctions(myInt);
    RunRandomFunctions(myInt);

    return 0;
}

live version here

I noticed that for -O0 and -O3, the constant functions both compiled to the same assembly for both sets of flags, both relatively short blocks. This makes me think a majority of the optimization comes from the calling code, but I'm not really seeing that in my amateur assembly knowledge.

The random functions compiled to quite a bit of assembly, even for -O3, so the random part must be dominating that routine.

So in the end, not really sure what to make of this. Please throw darts at it, tell me what I did wrong, offer some explanations.

Solution 7 - C++

At least with libstdc++ you should get the same performance with move and assignment because operator= calls std::move on the incoming pointer. See: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/shared_ptr.h#L384

Solution 8 - C++

Unfortunately I did not read @yano's anwer. So I did my own benchmark. Sad that nobody tried to verify the hypotheses around here. My results were the similar to yanos, in the sense that the improvement is far away from hundreds of times.

On my Macbook Air move is three times faster (g++ as well as clang++ -std=c++17 -O3 -DNDEBUG). Let me know if you see problems with the benchmark.

#include <chrono>
#include <iostream>
#include <vector>
#include <memory>
using namespace std;
using namespace std::chrono;


int COUNT = 50'000'000;

struct TimeIt
{
    system_clock::time_point start;
    TimeIt() {
        start = system_clock::now();
    }
    ~TimeIt() {
        auto runtime = duration_cast<milliseconds>(system_clock::now()-start).count();
        cout << runtime << " ms" << endl;
    }

};

void benchmark_copy(const vector<shared_ptr<int>> &vec_src)
{
    cout << "benchmark_copy" << endl;
    vector<shared_ptr<int>> vec_dst;
    vec_dst.reserve(COUNT);
    TimeIt ti;
    for(auto &sp : vec_src)
        vec_dst.emplace_back(sp);
}

void benchmark_move(vector<shared_ptr<int>> &&vec_src)
{
    cout << "benchmark_move" << endl;
    vector<shared_ptr<int>> vec_dst;
    vec_dst.reserve(COUNT);
    TimeIt ti;
    for(auto &sp : vec_src)
        vec_dst.emplace_back(move(sp));

}

int main (int arg, char **argv){

    vector<shared_ptr<int>> vec;
    for (int i = 0; i < COUNT; ++i)
        vec.emplace_back(new int);

    benchmark_copy(vec);
    benchmark_move(move(vec));

}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionsdgfsdhView Question on Stackoverflow
Solution 1 - C++David HaimView Answer on Stackoverflow
Solution 2 - C++Bo PerssonView Answer on Stackoverflow
Solution 3 - C++Mr.C64View Answer on Stackoverflow
Solution 4 - C++Stephen C. SteelView Answer on Stackoverflow
Solution 5 - C++SingerOfTheFallView Answer on Stackoverflow
Solution 6 - C++yanoView Answer on Stackoverflow
Solution 7 - C++A. K.View Answer on Stackoverflow
Solution 8 - C++ManuelSchneid3rView Answer on Stackoverflow