What is the most efficient way to append one std::vector to the end of another?

C++PerformanceStlVector

C++ Problem Overview


Let v1 be the target vector, v2 needs to be appended to the back of it.

I'm now doing:

v1.reserve(v1.size() + v2.size()); 
copy(v2.begin(), v2.end(), back_inserter(v1));

Is this the most efficient way? Or can it maybe be done just via copying a chunk of memory? Thanks!

C++ Solutions


Solution 1 - C++

After a lot of arguing (and a reasonable comment from Matthieu M. and villintehaspam), I'll change my suggestion to

v1.insert( v1.end(), v2.begin(), v2.end() );

I'll keep the former suggestion here:

v1.reserve( v1.size() + v2.size() ); 
v1.insert( v1.end(), v2.begin(), v2.end() );

There are some reasons to do it the latter way, although none of them enough strong:

  • there is no guarantee on to what size will the vector be reallocated -- e.g. if the sum size is 1025, it may get reallocated to 2048 -- dependant on implementation. There is no such guarantee for reserve either, but for a specific implementation it might be true. If hunting for a bottleneck it might be rasonable to check that.
  • reserve states our intentions clear -- optimization may be more efficient in this case (reserve could prepare the cache in some top-notch implementation).
  • also, with reserve we have a C++ Standard guarantee that there will be only a single reallocation, while insert might be implemented inefficiently and do several reallocations (also something to test with a particular implementation).

Solution 2 - C++

Probably better and simpler to use a dedicated method: vector.insert

v1.insert(v1.end(), v2.begin(), v2.end());

As Michael mentions, unless the iterators are input iterators, the vector will figure out the required size and copy appended data at one go with linear complexity.

Solution 3 - C++

I simply did a quick performance measurement with the following code and

v1.insert( v1.end(), v2.begin(), v2.end() );

seems to be the right choice (as already stated above). Nevertheless, you find the reported performance below.

Test code:

#include <vector>
#include <string>

#include <boost/timer/timer.hpp>

//==============================================================================
// 
//==============================================================================

/// Returns a vector containing the sequence [ 0, ... , n-1 ].
inline std::vector<int> _range(const int n)
{
    std::vector<int> tmp(n);
    for(int i=0; i<n; i++)
        tmp[i] = i;
    return tmp;
}

void test_perf_vector_append()
{
    const vector<int> testdata1 = _range(100000000);
    const vector<int> testdata2 = _range(100000000);

    vector<int> testdata;

    printf("--------------------------------------------------------------\n");
    printf(" METHOD:  push_back()\n");
    printf("--------------------------------------------------------------\n");
    testdata.clear();
    { vector<int>().swap(testdata); }
    testdata = testdata1;
    {
        boost::timer::auto_cpu_timer t;
        for(size_t i=0; i<testdata2.size(); i++)
        {
            testdata.push_back(testdata2[i]);
        }
    }

    printf("--------------------------------------------------------------\n");
    printf(" METHOD:  reserve() + push_back()\n");
    printf("--------------------------------------------------------------\n");
    testdata.clear();
    { vector<int>().swap(testdata); }
    testdata = testdata1;
    {
        boost::timer::auto_cpu_timer t;
        testdata.reserve(testdata.size() + testdata2.size());
        for(size_t i=0; i<testdata2.size(); i++)
        {
            testdata.push_back(testdata2[i]);
        }
    }

    printf("--------------------------------------------------------------\n");
    printf(" METHOD:  insert()\n");
    printf("--------------------------------------------------------------\n");
    testdata.clear();
    { vector<int>().swap(testdata); }
    testdata = testdata1;
    {
        boost::timer::auto_cpu_timer t;

        testdata.insert( testdata.end(), testdata2.begin(), testdata2.end() );
    }

    printf("--------------------------------------------------------------\n");
    printf(" METHOD:  reserve() + insert()\n");
    printf("--------------------------------------------------------------\n");
    testdata.clear();
    { vector<int>().swap(testdata); }
    testdata = testdata1;
    {
        boost::timer::auto_cpu_timer t;

        testdata.reserve( testdata.size() + testdata.size() ); 
        testdata.insert( testdata.end(), testdata2.begin(), testdata2.end() );
    }

    printf("--------------------------------------------------------------\n");
    printf(" METHOD:  copy() + back_inserter()\n");
    printf("--------------------------------------------------------------\n");
    testdata.clear();
    { vector<int>().swap(testdata); }
    testdata = testdata1;
    {
        boost::timer::auto_cpu_timer t;

        testdata.reserve(testdata.size() + testdata2.size()); 
        copy(testdata2.begin(), testdata2.end(), back_inserter(testdata));
    }

    printf("--------------------------------------------------------------\n");
    printf(" METHOD:  reserve() + copy() + back_inserter()\n");
    printf("--------------------------------------------------------------\n");
    testdata.clear();
    { vector<int>().swap(testdata); }
    testdata = testdata1;
    {
        boost::timer::auto_cpu_timer t;

        testdata.reserve(testdata.size() + testdata2.size()); 
        copy(testdata2.begin(), testdata2.end(), back_inserter(testdata));
    }
    
}

With Visual Studio 2008 SP1, x64, Release mode, /O2 /LTCG the output is as follows:

--------------------------------------------------------------
 METHOD:  push_back()
--------------------------------------------------------------
 0.933077s wall, 0.577204s user + 0.343202s system = 0.920406s CPU (98.6%)

--------------------------------------------------------------
 METHOD:  reserve() + push_back()
--------------------------------------------------------------
 0.612753s wall, 0.452403s user + 0.171601s system = 0.624004s CPU (101.8%)

--------------------------------------------------------------
 METHOD:  insert()
--------------------------------------------------------------
 0.424065s wall, 0.280802s user + 0.140401s system = 0.421203s CPU (99.3%)

--------------------------------------------------------------
 METHOD:  reserve() + insert()
--------------------------------------------------------------
 0.637081s wall, 0.421203s user + 0.218401s system = 0.639604s CPU (100.4%)

--------------------------------------------------------------
 METHOD:  copy() + back_inserter()
--------------------------------------------------------------
 0.743658s wall, 0.639604s user + 0.109201s system = 0.748805s CPU (100.7%)

--------------------------------------------------------------
 METHOD:  reserve() + copy() + back_inserter()
--------------------------------------------------------------
 0.748560s wall, 0.624004s user + 0.124801s system = 0.748805s CPU (100.0%)

Solution 4 - C++

If you happen to use Boost you can download the development version of the RangeEx library from the Boost Vault. This lib. was accepted into Boost a while ago but so far it hasn't been integrated with the main distribution. In it you'll find a new range-based algorithm which does exactly what you want:

boost::push_back(v1, v2);

Internally it works like the answer given by UncleBens, but the code is more concise and readable.

Solution 5 - C++

If you have a vector of pod-types, and you really need the performance, you could use memcpy, which ought to be faster than vector<>.insert(...):

v2.resize(v1.size() + v2.size());
memcpy((void*)&v1.front(), (void*)&v2[v1.size()], sizeof(v1.front())*v1.size());

Update: Although I would only use this if performance is really, really, needed, the code is safe for pod types.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionAlex JenterView Question on Stackoverflow
Solution 1 - C++Kornel KisielewiczView Answer on Stackoverflow
Solution 2 - C++UncleBensView Answer on Stackoverflow
Solution 3 - C++StefanView Answer on Stackoverflow
Solution 4 - C++ManuelView Answer on Stackoverflow
Solution 5 - C++Viktor SehrView Answer on Stackoverflow