Vector: initialization or reserve?

C++Vector

C++ Problem Overview


I know the size of a vector, which is the best way to initialize it?

Option 1:

vector<int> vec(3); //in .h
vec.at(0)=var1;     //in .cpp
vec.at(1)=var2;     //in .cpp
vec.at(2)=var3;     //in .cpp

Option 2:

vector<int> vec;     //in .h
vec.reserve(3);      //in .cpp
vec.push_back(var1); //in .cpp
vec.push_back(var2); //in .cpp
vec.push_back(var3); //in .cpp

I guess, Option2 is better than Option1. Is it? Any other options?

C++ Solutions


Solution 1 - C++

Both variants have different semantics, i.e. you are comparing apples and oranges.

The first gives you a vector of n default-initialized values, the second variant reserves the memory, but does not initialize them.

Choose what better fits your needs, i.e. what is "better" in a certain situation.

Solution 2 - C++

The "best" way would be:

vector<int> vec = {var1, var2, var3};

available with a C++11 capable compiler.

Not sure exactly what you mean by doing things in a header or implementation files. A mutable global is a no-no for me. If it is a class member, then it can be initialized in the constructor initialization list.

Otherwise, option 1 would be generally used if you know how many items you are going to use and the default values (0 for int) would be useful.
Using at here means that you can't guarantee the index is valid. A situation like that is alarming itself. Even though you will be able to reliably detect problems, it's definitely simpler to use push_back and stop worrying about getting the indexes right.

In case of option 2, generally it makes zero performance difference whether you reserve memory or not, so it's simpler not to reserve*. Unless perhaps if the vector contains types that are very expensive to copy (and don't provide fast moving in C++11), or the size of the vector is going to be enormous.


* From Stroustrups C++ Style and Technique FAQ:

> People sometimes worry about the cost of std::vector growing > incrementally. I used to worry about that and used reserve() to > optimize the growth. After measuring my code and repeatedly having > trouble finding the performance benefits of reserve() in real > programs, I stopped using it except where it is needed to avoid > iterator invalidation (a rare case in my code). Again: measure before > you optimize.

Solution 3 - C++

Somehow, a non-answer answer that is completely wrong has remained accepted and most upvoted for ~7 years. This is not an apples and oranges question. This is not a question to be answered with vague cliches.

For a simple rule to follow:

Option #1 is faster... enter image description here enter image description here

...but this probably shouldn't be your biggest concern.

Firstly, the difference is pretty minor. Secondly, as we crank up the compiler optimization, the difference becomes even smaller. For example, on my gcc-5.4.0, the difference is arguably trivial when running level 3 compiler optimization (-O3): enter image description here

So in general, I would recommending using method #1 whenever you encounter this situation. However, if you can't remember which one is optimal, it's probably not worth the effort to find out. Just pick either one and move on, because this is unlikely to ever cause a noticeable slowdown in your program as a whole.


These tests were run by sampling random vector sizes from a normal distribution, and then timing the initialization of vectors of these sizes using the two methods. We keep a dummy sum variable to ensure the vector initialization is not optimized out, and we randomize vector sizes and values to make an effort to avoid any errors due to branch prediction, caching, and other such tricks.

main.cpp:

/* 
 * Test constructing and filling a vector in two ways: construction with size
 * then assignment versus construction of empty vector followed by push_back
 * We collect dummy sums to prevent the compiler from optimizing out computation
 */

#include <iostream>
#include <vector>

#include "rng.hpp"
#include "timer.hpp"

const size_t kMinSize = 1000;
const size_t kMaxSize = 100000;
const double kSizeIncrementFactor = 1.2;
const int kNumVecs = 10000;

int main() {
  for (size_t mean_size = kMinSize; mean_size <= kMaxSize;
       mean_size = static_cast<size_t>(mean_size * kSizeIncrementFactor)) {
    // Generate sizes from normal distribution
    std::vector<size_t> sizes_vec;
    NormalIntRng<size_t> sizes_rng(mean_size, mean_size / 10.0); 
    for (int i = 0; i < kNumVecs; ++i) {
      sizes_vec.push_back(sizes_rng.GenerateValue());
    }
    Timer timer;
    UniformIntRng<int> values_rng(0, 5);
    // Method 1: construct with size, then assign
    timer.Reset();
    int method_1_sum = 0;
    for (size_t num_els : sizes_vec) {
      std::vector<int> vec(num_els);
      for (size_t i = 0; i < num_els; ++i) {
        vec[i] = values_rng.GenerateValue();
      }
      // Compute sum - this part identical for two methods
      for (size_t i = 0; i < num_els; ++i) {
        method_1_sum += vec[i];
      }
    }
    double method_1_seconds = timer.GetSeconds();
    // Method 2: reserve then push_back
    timer.Reset();
    int method_2_sum = 0;
    for (size_t num_els : sizes_vec) {
      std::vector<int> vec;
      vec.reserve(num_els);
      for (size_t i = 0; i < num_els; ++i) {
        vec.push_back(values_rng.GenerateValue());
      }
      // Compute sum - this part identical for two methods
      for (size_t i = 0; i < num_els; ++i) {
        method_2_sum += vec[i];
      }
    }
    double method_2_seconds = timer.GetSeconds();
    // Report results as mean_size, method_1_seconds, method_2_seconds
    std::cout << mean_size << ", " << method_1_seconds << ", " << method_2_seconds;
    // Do something with the dummy sums that cannot be optimized out
    std::cout << ((method_1_sum > method_2_sum) ? "" : " ") << std::endl;
  }
  
  return 0;
}

The header files I used are located here:

Solution 4 - C++

While your examples are essentially the same, it may be that when the type used is not an int the choice is taken from you. If your type doesn't have a default constructor, or if you'll have to re-construct each element later anyway, I would use reserve. Just don't fall into the trap I did and use reserve and then the operator[] for initialisation!


Constructor

std::vector<MyType> myVec(numberOfElementsToStart);
int size = myVec.size();
int capacity = myVec.capacity();

In this first case, using the constructor, size and numberOfElementsToStart will be equal and capacity will be greater than or equal to them.

Think of myVec as a vector containing a number of items of MyType which can be accessed and modified, push_back(anotherInstanceOfMyType) will append it the the end of the vector.


Reserve

std::vector<MyType> myVec;
myVec.reserve(numberOfElementsToStart);
int size = myVec.size();
int capacity = myVec.capacity();

When using the reserve function, size will be 0 until you add an element to the array and capacity will be equal to or greater than numberOfElementsToStart.

Think of myVec as an empty vector which can have new items appended to it using push_back with no memory allocation for at least the first numberOfElementsToStart elements.

Note that push_back() still requires an internal check to ensure that size < capacity and to increment size, so you may want to weigh this against the cost of default construction.


List initialisation

std::vector<MyType> myVec{ var1, var2, var3 };

This is an additional option for initialising your vector, and while it is only feasible for very small vectors, it is a clear way to initialise a small vector with known values. size will be equal to the number of elements you initialised it with, and capacity will be equal to or greater than size. Modern compilers may optimise away the creation of temporary objects and prevent unnecessary copying.

Solution 5 - C++

Option 2 is better, as reserve only needs to reserve memory (3 * sizeof(T)), while the first option calls the constructor of the base type for each cell inside the container.

For C-like types it will probably be the same.

Solution 6 - C++

How it Works

This is implementation specific however in general Vector data structure internally will have pointer to the memory block where the elements would actually resides. Both GCC and VC++ allocate for 0 elements by default. So you can think of Vector's internal memory pointer to be nullptr by default.

When you call vector<int> vec(N); as in your Option 1, the N objects are created using default constructor. This is called fill constructor.

When you do vec.reserve(N); after default constructor as in Option 2, you get data block to hold 3 elements but no objects are created unlike in option 1.

Why to Select Option 1

If you know the number of elements vector will hold and you might leave most of the elements to its default values then you might want to use this option.

Why to Select Option 2

This option is generally better of the two as it only allocates data block for the future use and not actually filling up with objects created from default constructor.

Solution 7 - C++

In the long run, it depends on the usage and numbers of the elements.

Run the program below to understand how the compiler reserves space:

> vector vec; > for(int i=0; i<50; i++) > { > cout << "size=" << vec.size() << "capacity=" << vec.capacity() << endl; > vec.push_back(i); > }

size is the number of actual elements and capacity is the actual size of the array to imlement vector. In my computer, till 10, both are the same. But, when size is 43 the capacity is 63. depending on the number of elements, either may be better. For example, increasing the capacity may be expensive.

Solution 8 - C++

Another option is to Trust Your Compiler(tm) and do the push_backs without calling reserve first. It has to allocate some space when you start adding elements. Perhaps it does that just as well as you would?

It is "better" to have simpler code that does the same job.

Solution 9 - C++

Since it seems 5 years have passed and a wrong answer is still the accepted one, and the most-upvoted answer is completely useless (missed the forest for the trees), I will add a real response.

Method #1: we pass an initial size parameter into the vector, let's call it n. That means the vector is filled with n elements, which will be initialized to their default value. For example, if the vector holds ints, it will be filled with n zeros.

Method #2: we first create an empty vector. Then we reserve space for n elements. In this case, we never create the n elements and thus we never perform any initialization of the elements in the vector. Since we plan to overwrite the values of every element immediately, the lack of initialization will do us no harm. On the other hand, since we have done less overall, this would be the better* option.

* better - real definition: never worse. It's always possible a smart compiler will figure out what you're trying to do and optimize it for you.


Conclusion: use method #2.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionAleView Question on Stackoverflow
Solution 1 - C++Sebastian MachView Answer on Stackoverflow
Solution 2 - C++UncleBensView Answer on Stackoverflow
Solution 3 - C++Apollys supports MonicaView Answer on Stackoverflow
Solution 4 - C++TroysephView Answer on Stackoverflow
Solution 5 - C++nobView Answer on Stackoverflow
Solution 6 - C++Shital ShahView Answer on Stackoverflow
Solution 7 - C++haberdarView Answer on Stackoverflow
Solution 8 - C++Bo PerssonView Answer on Stackoverflow
Solution 9 - C++Apollys supports MonicaView Answer on Stackoverflow