Efficient string concatenation in C++


C++ Problem Overview


I have heard a few people express worries about the "+" operator in std::string and describe various workarounds to speed up concatenation. Are any of these really necessary? If so, what is the best way to concatenate strings in C++?

C++ Solutions


Solution 1 - C++

The extra work is probably not worth it unless you really need efficiency. You will probably get much better efficiency simply by using operator += instead.

Now after that disclaimer, I will answer your actual question...

The efficiency of the STL string class depends on the implementation of STL you are using.

You could guarantee efficiency and have greater control yourself by doing the concatenation manually with C standard library functions.

Why operator+ is not efficient:

Take a look at this interface:

template <class charT, class traits, class Alloc>
basic_string<charT, traits, Alloc>
operator+(const basic_string<charT, traits, Alloc>& s1,
          const basic_string<charT, traits, Alloc>& s2);

You can see that a new object is returned by each +, which means a new buffer is allocated each time. If you are doing many + operations, that is not efficient.

Why you can make it more efficient:

  • You are guaranteeing efficiency instead of trusting a delegate to do it efficiently for you.
  • The std::string class knows nothing about the maximum size of your string, nor how often you will be concatenating to it. You may have this knowledge and can act on it, which leads to fewer reallocations.
  • You will be controlling the buffers manually, so you can be sure that you won't copy the whole string into new buffers when you don't want that to happen.
  • You can use the stack for your buffers instead of the heap, which avoids the cost of dynamic allocation.
  • The string operator+ creates a new string object and returns it, hence using a new buffer.

Considerations for implementation:

  • Keep track of the string length.
  • Keep a pointer to the end of the string and the start, or just the start and use the start + the length as an offset to find the end of the string.
  • Make sure the buffer you are storing your string in is big enough that you don't need to reallocate.
  • Use strcpy at the tracked end of the string instead of strcat, so you don't need to scan the whole string to find where it ends.
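The considerations above can be sketched as follows, assuming the final string fits in a fixed-size stack buffer. `StackConcat` is a hypothetical name, not a library type. It keeps the current length, so each piece is copied straight to the end of the string (with `std::memcpy` here, which plays the same role as `strcpy` at the end pointer) and the accumulated text is never rescanned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: a fixed-capacity buffer on the stack that remembers
// where the string currently ends, so appends never rescan the contents.
template <std::size_t Capacity>
struct StackConcat {
    char buf[Capacity] = {};   // zero-initialized, so it starts as ""
    std::size_t len = 0;       // current length, excluding the '\0'

    // Appends a NUL-terminated piece; returns false (and changes nothing)
    // if the piece plus its terminator would not fit.
    bool append(const char* piece) {
        std::size_t n = std::strlen(piece);        // scans only the new piece
        if (len + n + 1 > Capacity) return false;  // +1 for the terminator
        std::memcpy(buf + len, piece, n + 1);      // copy piece and '\0' at the end
        len += n;
        return true;
    }

    const char* c_str() const { return buf; }
    std::size_t size() const { return len; }
};
```

Because the buffer size is a compile-time constant, nothing is allocated on the heap; the trade-off is that an append which does not fit has to be reported (here via the `false` return) rather than handled by growing.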

Rope data structure:

If you need really fast concatenation, consider using a rope data structure.
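As an illustration only, here is a minimal rope that supports just concatenation and flattening. `Rope` and its members are hypothetical; a production rope would also balance the tree and support indexing and splitting. Concatenation allocates one small node and copies no characters, and `str()` copies everything once at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal rope: leaves hold text, inner nodes hold two children.
class Rope {
public:
    Rope() = default;
    explicit Rope(std::string text) {
        const std::size_t n = text.size();
        node_ = std::make_shared<Node>(Node{std::move(text), nullptr, nullptr, n});
    }

    std::size_t size() const { return node_ ? node_->length : 0; }

    // O(1): shares both operands' nodes instead of copying their text.
    friend Rope operator+(const Rope& lhs, const Rope& rhs) {
        if (!lhs.node_) return rhs;
        if (!rhs.node_) return lhs;
        Rope r;
        r.node_ = std::make_shared<Node>(
            Node{std::string(), lhs.node_, rhs.node_, lhs.size() + rhs.size()});
        return r;
    }

    // Flattens the tree into one std::string with a single allocation.
    std::string str() const {
        std::string out;
        out.reserve(size());
        append_to(node_.get(), out);
        return out;
    }

private:
    struct Node {
        std::string text;                        // used only by leaves
        std::shared_ptr<const Node> left, right; // both null for a leaf
        std::size_t length;                      // characters under this node
    };

    // Recursive in-order walk; an unbalanced tree can get deep, which is
    // one reason real ropes rebalance.
    static void append_to(const Node* n, std::string& out) {
        if (!n) return;
        if (!n->left) { out += n->text; return; }
        append_to(n->left.get(), out);
        append_to(n->right.get(), out);
    }

    std::shared_ptr<Node> node_;
};
```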

Solution 2 - C++

Reserve the space for your final string first, then use the append method with a buffer. For example, say you expect your final string to be 1 million characters long:

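std::string s;
s.reserve(1000000);      // one allocation up front for the expected final size

while (whatever)         // placeholder: loop while there is more data
{
  s.append(buf, len);    // buf: const char* to the next chunk, len: its length
}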
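A runnable variant of the loop above, assuming the data arrives in chunks from a `std::istream` (standing in for the placeholder `whatever`, `buf`, and `len`). `read_all` and its `expected` size hint are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reserve the expected size once, then append each
// chunk read from the stream with append(buf, len).
std::string read_all(std::istream& in, std::size_t expected) {
    std::string s;
    s.reserve(expected);
    char buf[4096];
    // read() fails on the last, partial chunk, but gcount() still reports
    // how many characters it delivered, so that chunk is appended too.
    while (in.read(buf, sizeof buf) || in.gcount() > 0) {
        s.append(buf, static_cast<std::size_t>(in.gcount()));
    }
    return s;
}
```

If `expected` is exact, no reallocation happens during the loop; if it is too small, the string simply falls back to its normal growth.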

Solution 3 - C++

I would not worry about it. If you do it in a loop, strings will preallocate memory to minimize reallocations, so just use operator+= in that case. And if you do it manually, with something like this or longer,

a + " : " + c

then it creates temporaries, even if the compiler can eliminate some return-value copies. That is because, in a chain of operator+ calls, each call cannot tell whether its reference parameter refers to a named object or to a temporary returned by a nested operator+ call. I would not worry about it without profiling first. But let's take an example to show it. We first introduce parentheses to make the binding clear. For clarity, I put the arguments directly after the declaration of the function that is called, and below that I show what the resulting expression is:

((a + " : ") + c) 
calls string operator+(string const&, char const*)(a, " : ")
  => (tmp1 + c)

Now, in that addition, tmp1 is what was returned by the first call to operator+ with the shown arguments. We assume the compiler is really clever and optimizes out the return value copy. So we end up with one new string that contains the concatenation of a and " : ". Now, this happens:

(tmp1 + c)
calls string operator+(string const&, string const&)(tmp1, c)
  => tmp2 == <end result>

Compare that to the following:

std::string f = "hello";
(f + c)
calls string operator+(string const&, string const&)(f, c)
  => tmp1 == <end result>

It's using the same function for a temporary and for a named string! So the compiler has to copy the argument into a new string and append to that and return it from the body of operator+. It cannot take the memory of a temporary and append to that. The bigger the expression is, the more copies of strings have to be done.

The next versions of Visual Studio and GCC will support C++1x's move semantics (complementing copy semantics) and rvalue references as an experimental addition (C++1x was later standardized as C++11). Rvalue references let a function find out whether a parameter refers to a temporary. This will make such additions much faster, as all of the above will end up in one "add-pipeline" that appends to the temporary's buffer instead of copying it.
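Since C++11, the standard library's `operator+` does provide overloads taking `std::string&&`, which append to the temporary instead of copying it. The idea can be sketched with a hypothetical pair of `cat` overloads (not standard functions): overload resolution picks the `&&` version when the left operand is a temporary, such as the result of a nested call.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overload for a named (lvalue) left operand: the caller still
// owns it, so the characters must be copied into a new string first.
std::string cat(const std::string& lhs, const std::string& rhs) {
    std::string result(lhs);
    result += rhs;
    return result;
}

// Hypothetical overload for a temporary (rvalue) left operand: nobody else can
// observe it, so append in place and move its buffer into the return value.
std::string cat(std::string&& lhs, const std::string& rhs) {
    lhs += rhs;
    return std::move(lhs);
}
```

In `cat(cat(a, " : "), c)`, the inner call copies `a` once; the outer call receives a temporary, so it appends to that buffer instead of copying it again.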

If it turns out to be a bottleneck, you can still do

 std::string(a).append(" : ").append(c) ...

Each append call appends its argument to *this and returns a reference to *this, so no temporaries are copied. Alternatively, operator+= can be used, but you would need ugly parentheses to fix the grouping, because += is right-associative.
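Both forms can be sketched as small functions; `with_append` and `with_plus_equals` are hypothetical names used only for illustration. Each copies `a` once and then grows that single string in place.

```cpp
#include <cassert>
#include <string>

std::string with_append(const std::string& a, const std::string& c) {
    std::string s(a);              // the only copy of a
    s.append(" : ").append(c);     // each append returns *this, so calls chain
    return s;
}

std::string with_plus_equals(const std::string& a, const std::string& c) {
    std::string s(a);
    // += groups right to left, so `s += " : " += c` would mean
    // s += (" : " += c), which does not compile; the parentheses fix that.
    (s += " : ") += c;
    return s;
}
```

Returning the named local `s` (rather than the reference that `append` returns) lets the compiler elide the final copy.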

Solution 4 - C++

For most applications, it just won't matter. Just write your code, blissfully unaware of how exactly the + operator works, and only take matters into your own hands if it becomes an apparent bottleneck.

Solution 5 - C++

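std::string operator+ allocates a new string and copies both operand strings every time. Repeat that many times, as in s = s + piece inside a loop, and it gets expensive: each step copies the whole string built so far, which is O(n) per step.

std::string append and operator+=, on the other hand, grow the capacity geometrically (for example by 50% or by doubling, depending on the implementation) whenever the string needs to grow. That significantly reduces the number of memory allocations and copy operations: only O(log n) reallocations for a string of final length n.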
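A rough way to observe this, using only the standard `capacity()` member: append one character at a time and count how often the capacity changes. The growth factor is implementation-specific, so the exact count differs between standard libraries, but it stays small relative to the number of appends. `count_reallocations` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends n characters one at a time and counts how often the capacity
// changed, i.e. how many times the string had to reallocate its buffer.
std::size_t count_reallocations(std::size_t n) {
    std::string s;
    std::size_t changes = 0;
    std::size_t cap = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s += 'x';
        if (s.capacity() != cap) {
            ++changes;
            cap = s.capacity();
        }
    }
    return changes;
}
```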

Solution 6 - C++

Unlike .NET's System.String, C++'s std::string is mutable, and can therefore be built through simple concatenation just as fast as through other methods.

Solution 7 - C++

Perhaps use std::stringstream instead?

But I agree with the sentiment that you should probably just keep it maintainable and understandable and then profile to see if you are really having problems.
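A minimal sketch of the std::stringstream approach suggested above, using `std::ostringstream` and assuming some pieces are non-string values that need formatting; `describe` is a hypothetical function.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds "name: value (unit)"; operator<< formats the double as well as
// concatenating the string pieces.
std::string describe(const std::string& name, double value, const std::string& unit) {
    std::ostringstream os;
    os << name << ": " << value << " (" << unit << ')';
    return os.str();
}
```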

Solution 8 - C++

In Imperfect C++, Matthew Wilson presents a dynamic string concatenator that pre-computes the length of the final string in order to have only one allocation before concatenating all parts. We can also implement a static concatenator by playing with expression templates.

That kind of idea has been implemented in STLport's std::string implementation, which does not conform to the standard because of this precise hack.
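The static, expression-template variant could look roughly like this. It is a sketch under stated assumptions, not Wilson's or STLport's code: `Leaf`, `Cat`, and `chain` are hypothetical names, and C++17 (`std::string_view`, non-const `std::string::data()`) is assumed. Each `+` only builds a small node recording its operands; converting the finished expression to `std::string` computes the total length, allocates once, and copies every part into place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical leaf node: a view of one operand (no characters copied yet).
struct Leaf {
    std::string_view sv;
    std::size_t size() const { return sv.size(); }
    char* write(char* out) const { return std::copy(sv.begin(), sv.end(), out); }
};

// Hypothetical inner node: its type records the whole expression tree.
template <class L, class R>
struct Cat {
    L lhs;
    R rhs;
    std::size_t size() const { return lhs.size() + rhs.size(); }
    char* write(char* out) const { return rhs.write(lhs.write(out)); }

    // Computes the final length, allocates once, then copies each part.
    operator std::string() const {
        std::string result(size(), '\0');
        write(result.data());
        return result;
    }
};

// Starts an expression; the following + operators extend it.
inline Leaf chain(std::string_view s) { return Leaf{s}; }

inline Cat<Leaf, Leaf> operator+(Leaf lhs, std::string_view rhs) {
    return {lhs, Leaf{rhs}};
}

template <class L, class R>
Cat<Cat<L, R>, Leaf> operator+(Cat<L, R> lhs, std::string_view rhs) {
    return {lhs, Leaf{rhs}};
}
```

Because the nodes hold views into their operands, the expression should be converted to `std::string` before any operand goes out of scope, for example `std::string s = chain(a) + " : " + c;`.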

Solution 9 - C++

For small strings it doesn't matter. If you have big strings, you had better store them as they are, as parts in a vector or some other collection, and adapt your algorithm to work with that set of data instead of one big string.
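For example, assuming the text is kept as a `std::vector<std::string>` of parts, an algorithm such as counting a character can walk the parts directly and never build the combined string; `count_char` is a hypothetical function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counts occurrences of c across all parts without concatenating them.
std::size_t count_char(const std::vector<std::string>& parts, char c) {
    std::size_t n = 0;
    for (const std::string& part : parts) {
        n += static_cast<std::size_t>(std::count(part.begin(), part.end(), c));
    }
    return n;
}
```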

I prefer std::ostringstream for complex concatenation.

Solution 10 - C++

As with most things, it's easier not to do something than to do it.

If you want to output large strings to a GUI, it may be that whatever you're outputting to can handle the strings in pieces better than as a large string (for example, concatenating text in a text editor - usually they keep lines as separate structures).

If you want to output to a file, stream the data rather than creating a large string and outputting that.
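A minimal sketch of streaming, assuming the output is produced as a sequence of lines; `write_lines` is a hypothetical function. It takes any `std::ostream`, so the same code writes to a `std::ofstream` for a file or to a `std::ostringstream` for testing.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes each line straight to the stream instead of first concatenating
// everything into one large std::string.
void write_lines(std::ostream& out, const std::vector<std::string>& lines) {
    for (const std::string& line : lines) {
        out << line << '\n';
    }
}
```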

Once I removed unnecessary concatenation from slow code, I never found it necessary to make concatenation itself faster.

Solution 11 - C++

You will probably get the best performance if you pre-allocate (reserve) space in the resultant string.

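#include <cstddef>
#include <cstring>
#include <string>

// Every argument must be a C string (e.g. a string literal), because the
// lengths are computed with std::strlen.
template<typename... Args>
std::string concat(Args const&... args)
{
    std::size_t len = 0;
    for (auto s : {args...})  len += std::strlen(s);   // first pass: total length
 
    std::string result;
    result.reserve(len);    // <--- preallocate result
    for (auto s : {args...})  result += s;             // second pass: copy
    return result;
}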

Usage:

std::string merged = concat("This ", "is ", "a ", "test!");

Solution 12 - C++

A simple array of characters, encapsulated in a class that keeps track of array size and number of allocated bytes is the fastest.

The trick is to do just one large allocation at start.

The implementation is available at https://github.com/pedro-vicente/table-string.
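A minimal sketch of such a class, assuming a good size estimate is available up front. `CharBuffer` is a hypothetical name; this is not the `table_str_t` code from the linked repository. It allocates once in the constructor and only reallocates (doubling) if the estimate turns out to be too small.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical character buffer: tracks size and allocated capacity and
// does one large allocation at construction time.
class CharBuffer {
public:
    explicit CharBuffer(std::size_t capacity)
        : data_(new char[capacity + 1]), size_(0), capacity_(capacity) {
        data_[0] = '\0';
    }

    // Appends n bytes; grows only if the initial estimate was too small.
    void append(const char* s, std::size_t n) {
        if (size_ + n > capacity_) grow(std::max(capacity_ * 2, size_ + n));
        std::memcpy(data_.get() + size_, s, n);
        size_ += n;
        data_[size_] = '\0';
    }
    void append(const char* s) { append(s, std::strlen(s)); }

    const char* c_str() const { return data_.get(); }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

private:
    void grow(std::size_t new_capacity) {
        std::unique_ptr<char[]> bigger(new char[new_capacity + 1]);
        std::memcpy(bigger.get(), data_.get(), size_ + 1);  // includes '\0'
        data_ = std::move(bigger);
        capacity_ = new_capacity;
    }

    std::unique_ptr<char[]> data_;
    std::size_t size_;
    std::size_t capacity_;
};
```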

Benchmarks

Results for a Visual Studio 2015 x86 debug build show a substantial improvement over std::string:

| API                   | Seconds |
| --------------------- | ------- |
| SDS                   | 19      |
| std::string           | 11      |
| std::string (reserve) | 9       |
| table_str_t           | 1       |

Solution 13 - C++

You can try this version, which reserves memory for all of the items up front:

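#include <cstddef>
#include <string>

namespace {
// Length of anything with a .size() member (std::string, std::string_view, ...).
// Named length_of rather than size: for std::string arguments, argument-dependent
// lookup would also find C++17's std::size and make the call ambiguous.
template<class C>
constexpr auto length_of(const C& c) -> decltype(c.size()) {
  return c.size();
}

// Length of a NUL-terminated C string; string literals also end up here.
constexpr std::size_t length_of(const char* string) {
  std::size_t size = 0;
  while (string[size] != '\0') {
    ++size;
  }
  return size;
}
}

template<typename... Args>
std::string concatStrings(const Args&... args) {
  // Binary fold with an initial 0, so an empty argument list also compiles.
  const std::size_t total = (std::size_t{0} + ... + length_of(args));
  std::string result;
  result.reserve(total);       // one allocation for all items
  (result.append(args), ...);  // append each item in order
  return result;               // returned by name, so NRVO/move applies
}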

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type      | Original Author        |
| ----------------- | ---------------------- |
| Question          | sneg                   |
| Solution 1 - C++  | Brian R. Bondy         |
| Solution 2 - C++  | Carlos A. Ibarra       |
| Solution 3 - C++  | Johannes Schaub - litb |
| Solution 4 - C++  | Pesto                  |
| Solution 5 - C++  | timmerov               |
| Solution 6 - C++  | James Curran           |
| Solution 7 - C++  | Tim                    |
| Solution 8 - C++  | Luc Hermitte           |
| Solution 9 - C++  | Mykola Golubyev        |
| Solution 10 - C++ | Pete Kirkham           |
| Solution 11 - C++ | LanDenLabs             |
| Solution 12 - C++ | Pedro Vicente          |
| Solution 13 - C++ | voltento               |