Using arrays or std::vectors in C++, what's the performance gap?

C++, Arrays, Vector

C++ Problem Overview


In our C++ course they suggest not using C++ arrays on new projects anymore. As far as I know, Stroustrup himself suggests not using arrays. But are there significant performance differences?

C++ Solutions


Solution 1 - C++

Using C++ arrays with new (that is, using dynamic arrays) should be avoided. You have to keep track of the size yourself, and you need to delete them manually and do all sorts of housekeeping.

Using arrays on the stack is also discouraged because you don't have range checking, and passing the array around will lose any information about its size (array to pointer conversion). You should use boost::array in that case (or std::array, its standard counterpart since C++11), which wraps a C++ array in a small class and provides a size function and iterators to iterate over it.

Now the std::vector vs. native C++ arrays (taken from the internet):

// Comparison of assembly code generated for basic indexing, dereferencing, 
// and increment operations on vectors and arrays/pointers.

// Assembly code was generated by gcc 4.1.0 invoked with  g++ -O3 -S  on a 
// x86_64-suse-linux machine.

#include <vector>

struct S
{
  int padding;

  std::vector<int> v;
  int * p;
  std::vector<int>::iterator i;
};

int pointer_index (S & s) { return s.p[3]; }
  // movq    32(%rdi), %rax
  // movl    12(%rax), %eax
  // ret

int vector_index (S & s) { return s.v[3]; }
  // movq    8(%rdi), %rax
  // movl    12(%rax), %eax
  // ret

// Conclusion: Indexing a vector is the same damn thing as indexing a pointer.

int pointer_deref (S & s) { return *s.p; }
  // movq    32(%rdi), %rax
  // movl    (%rax), %eax
  // ret

int iterator_deref (S & s) { return *s.i; }
  // movq    40(%rdi), %rax
  // movl    (%rax), %eax
  // ret

// Conclusion: Dereferencing a vector iterator is the same damn thing 
// as dereferencing a pointer.

void pointer_increment (S & s) { ++s.p; }
  // addq    $4, 32(%rdi)
  // ret

void iterator_increment (S & s) { ++s.i; }
  // addq    $4, 40(%rdi)
  // ret

// Conclusion: Incrementing a vector iterator is the same damn thing as 
// incrementing a pointer.

Note: If you allocate arrays with new and the elements are non-class objects (like plain int) or classes without a user-defined constructor, and you don't need the elements to be initialized, new-allocated arrays can have a performance advantage, because std::vector initializes all elements to default values (0 for int, for example) on construction (credits to @bernie for reminding me).

Solution 2 - C++

Preamble for micro-optimizer people

Remember:

> "Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%".

(Thanks to metamorphosis, https://stackoverflow.com/users/3454889/metamorphosis, for the full quote)

Don't use a C array instead of a vector (or whatever) just because you believe it's faster as it is supposed to be lower-level. You would be wrong.

Use vector by default (or the safe container adapted to your need), and then, if your profiler says it is a problem, see if you can optimize it, either by using a better algorithm or by changing the container.

This said, we can go back to the original question.

Static/Dynamic Array?

The C++ array classes are better behaved than the low-level C array because they know a lot about themselves, and can answer questions C arrays can't. They are able to clean up after themselves. And more importantly, they are usually written using templates and/or inlining, which means that what appears to be a lot of code in a debug build resolves to little or no code in a release build, meaning no difference from their built-in, less safe competition.

All in all, it falls into two categories:

Dynamic arrays

Using a pointer to a malloc-ed/new-ed array will be at best as fast as the std::vector version, and a lot less safe (see litb's post).

So use a std::vector.

Static arrays

Using a static array will be at best:

  • as fast as the std::array version
  • and a lot less safe.

So use a std::array.

Uninitialized memory

Sometimes, using a vector instead of a raw buffer incurs a visible cost because the vector will initialize the buffer at construction, while the code it replaces didn't, as bernie remarked in his answer.

If this is the case, then you can handle it by using a unique_ptr instead of a vector or, if the case is not exceptional in your codeline, actually write a class buffer_owner that will own that memory, and give you easy and safe access to it, including bonuses like resizing it (using realloc?), or whatever you need.

Solution 3 - C++

Vectors are arrays under the hood. The performance is the same.

One place where you can run into a performance issue is not sizing the vector correctly to begin with.

As a vector fills, it will resize itself, and that can imply a new array allocation, followed by n copy constructor calls, followed by about n destructor calls, followed by an array delete.

If your construct/destruct is expensive, you are much better off making the vector the correct size to begin with.

There is a simple way to demonstrate this. Create a simple class that shows when it is constructed/destroyed/copied/assigned. Create a vector of these things, and start pushing them on the back end of the vector. When the vector fills, there will be a cascade of activity as the vector resizes. Then try it again with the vector sized to the expected number of elements. You will see the difference.

Solution 4 - C++

To respond to something Mehrdad said:

> However, there might be cases where you still need arrays. When interfacing with low level code (i.e. assembly) or old libraries that require arrays, you might not be able to use vectors.

Not true at all. Vectors degrade nicely into arrays/pointers if you use:

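std::vector<double> vec;  // renamed so the variable no longer hides the type name
vec.push_back(42);

// Only valid while vec is non-empty; begin() points at the first element.
double *array = &(*vec.begin());

// pass the array to whatever low-level code you have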

This works for all major STL implementations. Contiguous storage has been required by the standard since C++03, and C++11 added vector::data() for exactly this purpose.

Solution 5 - C++

You have even fewer reasons to use plain arrays in C++11.

There are 3 kinds of arrays in nature, from fastest to slowest, depending on the features they have (of course, the quality of implementation can make things really fast even for case 3 in the list):

  1. Static with size known at compile time. --- std::array<T, N>
  2. Dynamic with size known at runtime and never resized. The typical optimization here is that the array can be allocated directly on the stack. --- Not available. Maybe dynarray in a C++ TS after C++14. In C there are VLAs.
  3. Dynamic and resizable at runtime. --- std::vector<T>

For 1. plain static arrays with fixed number of elements, use std::array<T, N> in C++11.

For 2. fixed size arrays specified at runtime, but that won't change their size, there was discussion for C++14, but it was moved to a technical specification and ultimately left out of C++14.

For 3. std::vector<T> will usually ask for memory on the heap. This could have performance consequences, though you could use std::vector<T, MyAlloc<T>> to improve the situation with a custom allocator. The advantage compared to T* mytype = new T[n]; is that you can resize it and that it will not decay to a pointer, as plain arrays do.

Use the standard library types mentioned to avoid arrays decaying to pointers. You will save debugging time and the performance is exactly the same as with plain arrays if you use the same set of features.

Solution 6 - C++

There is definitely a performance impact to using an std::vector vs a raw array when you want an uninitialized buffer (e.g. to use as destination for memcpy()). An std::vector will initialize all its elements using the default constructor. A raw array will not.

The C++ documentation for the std::vector constructor taking a count argument (it's the third form) states:

> Constructs a new container from a variety of data sources, optionally using a user supplied allocator alloc.

> 3) Constructs the container with count default-inserted instances of T. No copies are made.

> Complexity

> 2-3) Linear in count

A raw array does not incur this initialization cost.

Note that with a custom allocator, it is possible to avoid "initialization" of the vector's elements (i.e. to use default initialization instead of value initialization).

Solution 7 - C++

Go with STL. There's no performance penalty. The algorithms are very efficient and they do a good job of handling the kinds of details that most of us would not think about.

Solution 8 - C++

STL is a heavily optimized library. In fact, it's even suggested to use STL in games where high performance might be needed. Arrays are too error prone to be used in day to day tasks. Today's compilers are also very smart and can really produce excellent code with STL. If you know what you are doing, STL can usually provide the necessary performance. For example by initializing vectors to required size (if you know from start), you can basically achieve the array performance. However, there might be cases where you still need arrays. When interfacing with low level code (i.e. assembly) or old libraries that require arrays, you might not be able to use vectors.

Solution 9 - C++

About duli's contribution with my own measurements.

The conclusion is that arrays of integers are faster than vectors of integers (5 times in my example). However, arrays and vectors are around the same speed for more complex / not aligned data.

Solution 10 - C++

If you compile the software in debug mode, many compilers will not inline the accessor functions of the vector. This will make the stl vector implementation much slower in circumstances where performance is an issue. It will also make the code easier to debug since you can see in the debugger how much memory was allocated.

In optimized mode, I would expect the stl vector to approach the efficiency of an array. This is since many of the vector methods are now inlined.

Solution 11 - C++

The performance difference between the two is very much implementation dependent - if you compare a badly implemented std::vector to an optimal array implementation, the array would win, but turn it around and the vector would win...

As long as you compare apples with apples (either both the array and the vector have a fixed number of elements, or both get resized dynamically) I would think that the performance difference is negligible as long as you follow good STL coding practice. Don't forget that using standard C++ containers also allows you to make use of the pre-rolled algorithms that are part of the standard C++ library, and most of them are likely to perform better than the average implementation of the same algorithm you build yourself.

That said, IMHO the vector wins in a debug scenario with a debug STL, as most STL implementations with a proper debug mode can at least highlight/catch the typical mistakes made by people when working with standard containers.

Oh, and don't forget that the array and the vector share the same memory layout so you can use vectors to pass data to legacy C or C++ code that expects basic arrays. Keep in mind that most bets are off in that scenario, though, and you're dealing with raw memory again.

Solution 12 - C++

If you're using vectors to represent multi-dimensional behavior, there is a performance hit.

https://stackoverflow.com/questions/54973981/do-2d-vectors-cause-a-performance-hit?noredirect=1#comment96707133_54973981

The gist is that there's a small amount of overhead from each sub-vector carrying its own size information, and the data will not necessarily be laid out contiguously (as it is with multi-dimensional C arrays). Restoring that contiguity can offer gains well beyond micro-optimization. If you're doing multi-dimensional arrays, it may be best to just extend std::vector and roll your own get/set/resize functions.

Solution 13 - C++

If you do not need to dynamically adjust the size, you have the memory overhead of saving the capacity (one pointer/size_t). That's it.

Solution 14 - C++

There might be some edge case where you have a vector access inside an inline function inside an inline function, where you've gone beyond what the compiler will inline and it will force a function call. That would be so rare as to not be worth worrying about - in general I would agree with litb.

I'm surprised nobody has mentioned this yet - don't worry about performance until it has been proven to be a problem, then benchmark.

Solution 15 - C++

Sometimes arrays are indeed better than vectors. If you are always manipulating a fixed length set of objects, arrays are better. Consider the following code snippets:

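#include <ctime>
#include <iostream>
using namespace std;

// X is one of the two classes shown below.
int main() {
    int v[3];  // for the vector version of X, declare vector<int> v(3) instead
    v[0] = 1; v[1] = 2; v[2] = 3;
    int sum = 0;  // must be initialized before accumulating
    time_t starttime = time(NULL);
    cout << starttime << endl;
    for (int i = 0; i < 50000; i++)
        for (int j = 0; j < 10000; j++) {
            X x(v);  // constructed anew on every iteration
            sum += x.first();
        }
    time_t endtime = time(NULL);
    cout << endtime << endl;
    cout << endtime - starttime << endl;
}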

where the vector version of X is

class X {
    vector<int> vec;
public:
    X(const vector<int>& v) { vec = v; }  // copying allocates on the heap
    int first() { return vec[0]; }
};

and the array version of X is:

class X {
    int f[3];
public:
    X(int a[]) { f[0] = a[0]; f[1] = a[1]; f[2] = a[2]; }
    int first() { return f[0]; }
};

The array version of main() will be faster because we are avoiding the overhead of "new" every time in the inner loop.

(This code was posted to comp.lang.c++ by me).

Solution 16 - C++

I'd argue that the primary concern isn't performance, but safety. You can make a lot of mistakes with arrays (consider resizing, for example), where a vector would save you a lot of pain.

Solution 17 - C++

Vectors use a tiny bit more memory than arrays since they contain the size of the array. They also increase the hard disk size of programs and probably the memory footprint of programs. These increases are tiny, but may matter if you're working with an embedded system. Though most places where these differences matter are places where you would use C rather than C++.

Solution 18 - C++

The following simple test:

https://stackoverflow.com/questions/10887668/c-array-vs-vector-performance-test-explanation

contradicts the conclusions from "Comparison of assembly code generated for basic indexing, dereferencing, and increment operations on vectors and arrays/pointers."

There must be a difference between the arrays and vectors. The test says so... just try it, the code is there...

Solution 19 - C++

For fixed-length arrays the performance is the same (vs. vector<>) in release build, but in debug build low-level arrays win by a factor of 20 in my experience (MS Visual Studio 2015, C++ 11).

So the "save time debugging" argument in favor of STL might be valid if you (or your coworkers) tend to introduce bugs in your array usage, but maybe not if your debugging time is mostly waiting on your code to run to the point you are currently working on so that you can step through it.

Experienced developers working on numerically intensive code sometimes fall into the second group (especially if they use vector :) ).

Solution 20 - C++

Assuming a fixed-length array (e.g. int* v = new int[1000]; vs std::vector<int> v(1000);, with the size of v being kept fixed at 1000), the only performance consideration that really matters (or at least mattered to me when I was in a similar dilemma) is the speed of access to an element. I looked up the STL's vector code, and here is what I found:

const_reference
operator[](size_type __n) const
{ return *(this->_M_impl._M_start + __n); }

This function will most certainly be inlined by the compiler. So, as long as the only thing that you plan to do with v is access its elements with operator[], it seems like there shouldn't really be any difference in performance.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | tunnuz | View Question on Stackoverflow
Solution 1 - C++ | Johannes Schaub - litb | View Answer on Stackoverflow
Solution 2 - C++ | paercebal | View Answer on Stackoverflow
Solution 3 - C++ | EvilTeach | View Answer on Stackoverflow
Solution 4 - C++ | Frank Krueger | View Answer on Stackoverflow
Solution 5 - C++ | Germán Diago | View Answer on Stackoverflow
Solution 6 - C++ | bernie | View Answer on Stackoverflow
Solution 7 - C++ | John D. Cook | View Answer on Stackoverflow
Solution 8 - C++ | mmx | View Answer on Stackoverflow
Solution 9 - C++ | lalebarde | View Answer on Stackoverflow
Solution 10 - C++ | Juan | View Answer on Stackoverflow
Solution 11 - C++ | Timo Geusch | View Answer on Stackoverflow
Solution 12 - C++ | Seph Reed | View Answer on Stackoverflow
Solution 13 - C++ | Greg Rogers | View Answer on Stackoverflow
Solution 14 - C++ | Mark Ransom | View Answer on Stackoverflow
Solution 15 - C++ | duli | View Answer on Stackoverflow
Solution 16 - C++ | Gabriel Isenberg | View Answer on Stackoverflow
Solution 17 - C++ | Brian | View Answer on Stackoverflow
Solution 18 - C++ | Hamed100101 | View Answer on Stackoverflow
Solution 19 - C++ | lhog | View Answer on Stackoverflow
Solution 20 - C++ | Subh_b | View Answer on Stackoverflow