Concatenating strings in C, which method is more efficient?

CStringPerformanceConcatenation

C Problem Overview


I came across these two methods to concatenate strings:

Common part:

char* first= "First";
char* second = "Second";
char* both = malloc(strlen(first) + strlen(second) + 2);

Method 1:

strcpy(both, first);
strcat(both, " ");       // or space could have been part of one of the strings
strcat(both, second);

Method 2:

sprintf(both, "%s %s", first, second);

In both cases the content of both would be "First Second".

I would like to know which one is more efficient (I have to perform several concatenation operations), or if you know a better way to do it.

C Solutions


Solution 1 - C

For readability, I'd go with

char * s = malloc(snprintf(NULL, 0, "%s %s", first, second) + 1);
sprintf(s, "%s %s", first, second);

If your platform supports GNU extensions, you could also use asprintf():

char * s = NULL;
asprintf(&s, "%s %s", first, second);

If you're stuck with the MS C Runtime, you have to use _scprintf() to determine the length of the resulting string:

char * s = malloc(_scprintf("%s %s", first, second) + 1);
sprintf(s, "%s %s", first, second);

The following will most likely be the fastest solution:

size_t len1 = strlen(first);
size_t len2 = strlen(second);

char * s = malloc(len1 + len2 + 2);
memcpy(s, first, len1);
s[len1] = ' ';
memcpy(s + len1 + 1, second, len2 + 1); // includes terminating null

Solution 2 - C

Don't worry about efficiency: make your code readable and maintainable. I doubt the difference between these methods is going to matter in your program.

Solution 3 - C

Here's some madness for you, I actually went and measured it. Bloody hell, imagine that. I think I got some meaningful results.

I used a dual core P4, running Windows, using mingw gcc 4.4, building with "gcc foo.c -o foo.exe -std=c99 -Wall -O2".

I tested method 1 and method 2 from the original post. Initially kept the malloc outside the benchmark loop. Method 1 was 48 times faster than method 2. Bizarrely, removing -O2 from the build command made the resulting exe 30% faster (haven't investigated why yet).

Then I added a malloc and free inside the loop. That slowed down method 1 by a factor of 4.4. Method 2 slowed down by a factor of 1.1.

So, malloc + strlen + free DO NOT dominate the profile enough to make avoiding sprintf worth while.

Here's the code I used (apart from the loops were implemented with < instead of != but that broke the HTML rendering of this post):

void a(char *first, char *second, char *both)
{
    for (int i = 0; i != 1000000 * 48; i++)
    {
        strcpy(both, first);
        strcat(both, " ");
        strcat(both, second);
    }
}

void b(char *first, char *second, char *both)
{
    for (int i = 0; i != 1000000 * 1; i++)
        sprintf(both, "%s %s", first, second);
}

int main(void)
{
    char* first= "First";
    char* second = "Second";
    char* both = (char*) malloc((strlen(first) + strlen(second) + 2) * sizeof(char));

    // Takes 3.7 sec with optimisations, 2.7 sec WITHOUT optimisations!
    a(first, second, both);

    // Takes 3.7 sec with or without optimisations
    //b(first, second, both);

    return 0;
}

Solution 4 - C

size_t lf = strlen(first);
size_t ls = strlen(second);

char *both = (char*) malloc((lf + ls + 2) * sizeof(char));

strcpy(both, first);

both[lf] = ' ';
strcpy(&both[lf+1], second);

Solution 5 - C

They should be pretty much the same. The difference isn't going to matter. I would go with sprintf since it requires less code.

Solution 6 - C

The difference is unlikely to matter:

  • If your strings are small, the malloc will drown out the string concatenations.
  • If your strings are large, the time spent copying the data will drown out the differences between strcat / sprintf.

As other posters have mentioned, this is a premature optimization. Concentrate on algorithm design, and only come back to this if profiling shows it to be a performance problem.

That said... I suspect method 1 will be faster. There is some---admittedly small---overhead to parse the sprintf format-string. And strcat is more likely "inline-able".

Solution 7 - C

sprintf() is designed to handle far more than just strings, strcat() is specialist. But I suspect that you are sweating the small stuff. C strings are fundamentally inefficient in ways that make the differences between these two proposed methods insignificant. Read ["Back to Basics"][1] by Joel Spolsky for the gory details.

This is an instance where C++ generally performs better than C. For heavy weight string handling using std::string is likely to be more efficient and certainly safer.

[1]: http://www.joelonsoftware.com/articles/fog0000000319.html "Back to Basics"

[edit]

[2nd edit]Corrected code (too many iterations in C string implementation), timings, and conclusion change accordingly

I was surprised at Andrew Bainbridge's comment that std::string was slower, but he did not post complete code for this test case. I modified his (automating the timing) and added a std::string test. The test was on VC++ 2008 (native code) with default "Release" options (i.e. optimised), Athlon dual core, 2.6GHz. Results:

C string handling = 0.023000 seconds
sprintf           = 0.313000 seconds
std::string       = 0.500000 seconds

So here strcat() is faster by far (your milage may vary depending on compiler and options), despite the inherent inefficiency of the C string convention, and supports my original suggestion that sprintf() carries a lot of baggage not required for this purpose. It remains by far the least readable and safe however, so when performance is not critical, has little merit IMO.

I also tested a std::stringstream implementation, which was far slower again, but for complex string formatting still has merit.

Corrected code follows:

#include <ctime>
#include <cstdio>
#include <cstring>
#include <string>

void a(char *first, char *second, char *both)
{
    for (int i = 0; i != 1000000; i++)
    {
        strcpy(both, first);
        strcat(both, " ");
        strcat(both, second);
    }
}

void b(char *first, char *second, char *both)
{
    for (int i = 0; i != 1000000; i++)
        sprintf(both, "%s %s", first, second);
}

void c(char *first, char *second, char *both)
{
	std::string first_s(first) ;
	std::string second_s(second) ;
	std::string both_s(second) ;

    for (int i = 0; i != 1000000; i++)
        both_s = first_s + " " + second_s ;
}

int main(void)
{
    char* first= "First";
    char* second = "Second";
    char* both = (char*) malloc((strlen(first) + strlen(second) + 2) * sizeof(char));
	clock_t start ;

	start = clock() ;
    a(first, second, both);
	printf( "C string handling = %f seconds\n", (float)(clock() - start)/CLOCKS_PER_SEC) ;

	start = clock() ;
    b(first, second, both);
	printf( "sprintf           = %f seconds\n", (float)(clock() - start)/CLOCKS_PER_SEC) ;

	start = clock() ;
	c(first, second, both);
	printf( "std::string       = %f seconds\n", (float)(clock() - start)/CLOCKS_PER_SEC) ;

    return 0;
}

Solution 8 - C

I don't know that in case two there's any real concatenation done. Printing them back to back doesn't constitute concatenation.

Tell me though, which would be faster:

  1. a) copy string A to new buffer b) copy string B to buffer c) copy buffer to output buffer

or

1)copy string A to output buffer b) copy string b to output buffer

Solution 9 - C

  • strcpy and strcat are much simpler oprations compared to sprintf, which needs to parse the format string
  • strcpy and strcat are small so they will generally be inlined by the compilers, saving even one more extra function call overhead. For example, in llvm strcat will be inlined using a strlen to find copy starting position, followed by a simple store instruction

Solution 10 - C

Neither is terribly efficient since both methods have to calculate the string length or scan it each time. Instead, since you calculate the strlen()s of the individual strings anyway, put them in variables and then just strncpy() twice.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionXandyView Question on Stackoverflow
Solution 1 - CChristophView Answer on Stackoverflow
Solution 2 - CNed BatchelderView Answer on Stackoverflow
Solution 3 - CAndrew BainbridgeView Answer on Stackoverflow
Solution 4 - CMichalis GiannakidisView Answer on Stackoverflow
Solution 5 - CJay ConrodView Answer on Stackoverflow
Solution 6 - CijprestView Answer on Stackoverflow
Solution 7 - CCliffordView Answer on Stackoverflow
Solution 8 - CSan JacintoView Answer on Stackoverflow
Solution 9 - CTong ZhouView Answer on Stackoverflow
Solution 10 - CdajobeView Answer on Stackoverflow