In C++, am I paying for what I am not eating?
C++ Problem Overview
Let's consider the following hello world examples in C and C++:
#include <stdio.h>
int main()
{
    printf("Hello world\n");
    return 0;
}
#include <iostream>
int main()
{
    std::cout<<"Hello world"<<std::endl;
    return 0;
}
When I compile them in godbolt to assembly, the size of the C code is only 9 lines (gcc -O3):
.LC0:
.string "Hello world"
main:
sub rsp, 8
mov edi, OFFSET FLAT:.LC0
call puts
xor eax, eax
add rsp, 8
ret
But the size of the C++ code is 22 lines (g++ -O3):
.LC0:
.string "Hello world"
main:
sub rsp, 8
mov edx, 11
mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)
xor eax, eax
add rsp, 8
ret
_GLOBAL__sub_I_main:
sub rsp, 8
mov edi, OFFSET FLAT:_ZStL8__ioinit
call std::ios_base::Init::Init() [complete object constructor]
mov edx, OFFSET FLAT:__dso_handle
mov esi, OFFSET FLAT:_ZStL8__ioinit
mov edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
add rsp, 8
jmp __cxa_atexit
... which is much larger.
It is famous that in C++ you pay for what you eat. So, in this case, what am I paying for?
C++ Solutions
Solution 1 - C++
> So, in this case, what am I paying for?
std::cout is more powerful and complicated than printf. It supports things like locales, stateful formatting flags, and more.
If you don't need those, use std::printf or std::puts - they're available in <cstdio>.
> It is famous that in C++ you pay for what you eat.
I also want to make it clear that C++ != The C++ Standard Library. The Standard Library is supposed to be general-purpose and "fast enough", but it will often be slower than a specialized implementation of what you need.
On the other hand, the C++ language strives to make it possible to write code without paying unnecessary extra hidden costs (e.g. opt-in virtual, no garbage collection).
Solution 2 - C++
You are not comparing C and C++. You are comparing printf and std::cout, which are capable of different things (locales, stateful formatting, etc).
Try to use the following code for comparison. Godbolt generates the same assembly for both files (tested with gcc 8.2, -O3).
main.c:
#include <stdio.h>
int main()
{
    int arr[6] = {1, 2, 3, 4, 5, 6};
    for (int i = 0; i < 6; ++i)
    {
        printf("%d\n", arr[i]);
    }
    return 0;
}
main.cpp:
#include <array>
#include <cstdio>
int main()
{
    std::array<int, 6> arr {1, 2, 3, 4, 5, 6};
    for (auto x : arr)
    {
        std::printf("%d\n", x);
    }
}
Solution 3 - C++
Your listings are indeed comparing apples and oranges, but not for the reason implied in most other answers.
Let’s check what your code actually does:
C:
- print a single string, "Hello world\n"
C++:
- stream the string "Hello world" into std::cout
- stream the std::endl manipulator into std::cout
Apparently your C++ code is doing twice as much work. For a fair comparison we should combine this:
#include <iostream>
int main()
{
    std::cout<<"Hello world\n";
    return 0;
}
… and suddenly your assembly code for main looks very similar to C’s:
main:
sub rsp, 8
mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
xor eax, eax
add rsp, 8
ret
In fact, we can compare the C and C++ code line by line, and there are very few differences:
sub rsp, 8                      sub rsp, 8
mov edi, OFFSET FLAT:.LC0     | mov esi, OFFSET FLAT:.LC0
                              > mov edi, OFFSET FLAT:_ZSt4cout
call puts                     | call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
xor eax, eax                    xor eax, eax
add rsp, 8                      add rsp, 8
ret                             ret
The only real difference is that in C++ we call operator << with two arguments (std::cout and the string). We could remove even that slight difference by using a closer C equivalent: fprintf, which also has a first argument specifying the stream.
This leaves the assembly code for _GLOBAL__sub_I_main, which is generated for C++ but not C. This is the only true overhead that’s visible in this assembly listing (there’s more, invisible overhead for both languages, of course). This code performs a one-time setup of some C++ standard library functions at the start of the C++ program.
But, as explained in other answers, the relevant difference between these two programs won’t be found in the assembly output of the main function since all the heavy lifting happens behind the scenes.
Solution 4 - C++
What you are paying for is to call a heavy library (not as heavy as printing into console). You initialize an ostream object. There is some hidden storage. Then, you call std::endl, which is not a synonym for \n. The iostream library helps you adjust many settings and puts the burden on the processor rather than the programmer. This is what you are paying for.
Let's review the code:
.LC0:
.string "Hello world"
main:
Initializing an ostream object + cout
sub rsp, 8
mov edx, 11
mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
Calling cout again to print a new line and flush
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)
xor eax, eax
add rsp, 8
ret
Static storage initialization:
_GLOBAL__sub_I_main:
sub rsp, 8
mov edi, OFFSET FLAT:_ZStL8__ioinit
call std::ios_base::Init::Init() [complete object constructor]
mov edx, OFFSET FLAT:__dso_handle
mov esi, OFFSET FLAT:_ZStL8__ioinit
mov edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
add rsp, 8
jmp __cxa_atexit
Also, it is essential to distinguish between the language and the library.
BTW, this is just a part of the story. You do not know what is written in the functions you are calling.
Solution 5 - C++
> It is famous that in C++ you pay for what you eat. So, in this case, what am I paying for?
That's simple. You pay for std::cout. "You pay for only what you eat" doesn't mean "you always get best prices". Sure, printf is cheaper. One can argue that std::cout is safer and more versatile, thus its greater cost is justified (it costs more, but provides more value), but that misses the point. You don't use printf, you use std::cout, so you pay for using std::cout. You don't pay for using printf.
A good example is virtual functions. Virtual functions have some runtime cost and space requirements - but only if you actually use them. If you don't use virtual functions, you don't pay anything.
A few remarks
- Even if C++ code evaluates to more assembly instructions, it's still a handful of instructions, and any performance overhead is still likely dwarfed by actual I/O operations.
- Actually, sometimes it's even better than "in C++ you pay for what you eat". For example, the compiler can deduce that a virtual function call is not needed in some circumstances, and transform it into a non-virtual call. That means you may get virtual functions for free. Isn't that great?
Solution 6 - C++
The "assembly listing for printf" is NOT for printf, but for puts (kind of compiler optimization?); printf is pretty much more complex than puts... don't forget!
Solution 7 - C++
I see some valid answers here, but I'm going to get a little bit more into the detail.
Jump to the summary below for the answer to your main question if you don't want to go through this entire wall of text.
Abstraction
> So, in this case, what am I paying for?
You are paying for abstraction. Being able to write simpler and more human friendly code comes at a cost. In C++, which is an object-oriented language, almost everything is an object. When you use any object, three main things will always happen under the hood:
- Object creation, basically memory allocation for the object itself and its data.
- Object initialization (usually via some init() method). Usually memory allocation happens under the hood as the first thing in this step.
- Object destruction (not always).
You don't see it in the code, but every single time you use an object all of the three above things need to happen somehow. If you were to do everything manually the code would obviously be way longer.
Now, abstraction can be made efficiently without adding overhead: method inlining and other techniques can be used by both compilers and programmers to remove overheads of abstraction, but this is not your case.
What's really happening in C++?
Here it is, broken down:
- The std::ios_base class is initialized, which is the base class for everything I/O related.
- The std::cout object is initialized.
- Your string is loaded and passed to std::__ostream_insert, which (as you already figured out by the name) is the function behind the << operator of std::cout that adds a string to the stream.
- std::endl is then called on std::cout as well.
- __dso_handle is passed to __cxa_atexit, which is a global function that is responsible for "cleaning" before exiting the program: it registers the destructor of the I/O initialization object so that remaining global objects are destroyed at exit.
So using C == not paying for anything?
In the C code, very few steps are happening:
- Your string is loaded and passed to puts via the edi register.
- puts gets called.
No objects anywhere, hence no need to initialize/destroy anything.
This however doesn't mean that you're not "paying" for anything in C. You are still paying for abstraction, and the initialization of the C standard library and the dynamic resolution of the printf function (or, actually, puts, which the compiler substitutes since you don't need any format string) still happen under the hood.
If you were to write this program in pure assembly it would look something like this:
jmp start
msg db "Hello world\n"
start:
mov rdi, 1
mov rsi, offset msg
mov rdx, 11
mov rax, 1 ; write
syscall
xor rdi, rdi
mov rax, 60 ; exit
syscall
Which basically only results in invoking the write syscall followed by the exit syscall. Now this would be the bare minimum to accomplish the same thing.
To summarize
C is way more bare-bone, and only does the bare minimum that is needed, leaving full control to the user, which is able to fully optimize and customize basically anything they want. You tell the processor to load a string in a register and then call a library function to use that string. C++ on the other hand is way more complex and abstract. This has enormous advantage when writing complicated code, and allows for easier to write and more human friendly code, but it obviously comes at a cost. There's always going to be a drawback in performance in C++ if compared to C in cases like this, since C++ offers more than what's needed to accomplish such basic tasks, and thus it adds more overhead.
Answering your main question:
> Am I paying for what I am not eating?
In this specific case, yes. You are not taking advantage of anything that C++ has to offer more than C, but that's just because there's nothing in that simple piece of code that C++ could help you with: it is so simple that you really do not need C++ at all.
Oh, and just one more thing!
The advantages of C++ may not look obvious at first glance, since you wrote a very simple and small program, but look at a little bit more complex example and see the difference (both programs do the exact same thing):
C:
#include <stdio.h>
#include <stdlib.h>
/* Comparison callback for qsort: ascending order, no overflow on subtraction. */
int cmp(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}
int main(void) {
    int i, n, *arr;
    printf("How many integers do you want to input? ");
    scanf("%d", &n);
    arr = malloc(sizeof(int) * n);
    for (i = 0; i < n; i++) {
        printf("Index %d: ", i);
        scanf("%d", &arr[i]);
    }
    qsort(arr, n, sizeof(int), cmp);
    puts("Here are your numbers, ordered:");
    for (i = 0; i < n; i++)
        printf("%d\n", arr[i]);
    free(arr);
    return 0;
}
C++:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
int main(void) {
    int n;
    cout << "How many integers do you want to input? ";
    cin >> n;
    vector<int> vec(n);
    for (int i = 0; i < vec.size(); i++) {
        cout << "Index " << i << ": ";
        cin >> vec[i];
    }
    sort(vec.begin(), vec.end());
    cout << "Here are your numbers:" << endl;
    for (int item : vec)
        cout << item << endl;
    return 0;
}
Hopefully you can clearly see what I mean here. Also notice how in C you have to manage memory at a lower level using malloc and free, how you need to be more careful about indexing and sizes, and how you need to be very specific when taking input and printing.
Solution 8 - C++
There are a few misconceptions to start with. First, the C++ program does not result in 22 instructions, it's more like 22,000 of them (I pulled that number from my hat, but it's approximately in the ballpark). Also, the C code doesn't result in 9 instructions, either. Those are only the ones you see.
What the C code does is, after doing a lot of stuff that you don't see, it calls a function from the CRT (which is usually but not necessarily present as shared lib), then does not check for the return value or handle errors, and bails out. Depending on compiler and optimization settings it doesn't even really call printf but puts, or something even more primitive.
You could have written more or less the same program (except for some invisible init functions) in C++ as well, if only you called that same function the same way. Or, if you want to be super-correct, that same function prefixed with std::.
The corresponding C++ code is in reality not at all the same thing. While the whole of <iostream> is well-known for being a fat ugly pig that adds an immense overhead for small programs (in a "real" program you don't really notice that much), a somewhat fairer interpretation is that it does an awful lot of stuff that you don't see and which just works. Including but not limited to magical formatting of pretty much any haphazard stuff, including different number formats and locales and whatnot, and buffering, and proper error-handling. Error handling? Well yes, guess what, outputting a string can actually fail, and unlike the C program, the C++ program would not ignore this silently. Considering what std::ostream does under the hood, without anyone being aware of it, it's actually pretty lightweight. Not like I'm using it, because I hate the stream syntax with a passion. But still, it's pretty awesome if you consider what it does.
But sure, C++ overall is not as efficient as C can be. It cannot be as efficient since it is not the same thing and it isn't doing the same thing. If nothing else, C++ generates exceptions (and code to generate, handle, or fail on them) and it gives some guarantees that C doesn't give. So, sure, a C++ program kinda necessarily needs to be a little bit bigger. In the big picture, however, this does not matter in any way. On the contrary, for real programs, I've not rarely found C++ performing better because for one reason or another, it seems to lend for more favorable optimizations. Don't ask me why in particular, I wouldn't know.
If, instead of fire-and-forget-hope-for-the-best you care to write C code which is correct (i.e. you actually check for errors, and the program behaves correctly in presence of errors) then the difference is marginal, if existent.
Solution 9 - C++
You are paying for a mistake. In the 80s, when compilers weren't good enough to check format strings, operator overloading was seen as a good way to enforce some semblance of type safety during I/O. However, every one of its banner features is either implemented badly or conceptually bankrupt from the start:
<iomanip>
The most repugnant part of the C++ stream I/O API is the existence of this formatting header library. Besides being stateful and ugly and error prone, it couples formatting to the stream.
Suppose you want to print out a line with an 8 digit zero filled hex unsigned int followed by a space followed by a double with 3 decimal places. With <cstdio>, you get to read a concise format string. With <ostream>, you have to save the old state, set alignment to right, set fill character, set fill width, set base to hex, output the integer, restore saved state (otherwise your integer formatting will pollute your float formatting), output the space, set notation to fixed, set precision, output the double and the newline, then restore the old formatting.
// <cstdio>
std::printf( "%08x %.3lf\n", ival, fval );
// <ostream> & <iomanip>
std::ios old_fmt {nullptr};
old_fmt.copyfmt (std::cout);
std::cout << std::right << std::setfill('0') << std::setw(8) << std::hex << ival;
std::cout.copyfmt (old_fmt);
std::cout << " " << std::fixed << std::setprecision(3) << fval << "\n";
std::cout.copyfmt (old_fmt);
Operator Overloading
<iostream> is the poster child of how not to use operator overloading:
std::cout << 2 << 3 && 0 << 5;
Performance
std::cout is several times slower than printf(). The rampant featuritis and virtual dispatch do take their toll.
Thread Safety
Both <cstdio> and <iostream> are thread safe in that every function call is atomic. But printf() gets a lot more done per call. If you run the following program with the <cstdio> option, you will see only a row of f. If you use <iostream> on a multicore machine, you will likely see something else.
// g++ -Wall -Wextra -Wpedantic -pthread -std=c++17 cout.test.cpp
#define USE_STREAM 1
#define REPS 50
#define THREADS 10
#include <thread>
#include <vector>
#if USE_STREAM
#include <iostream>
#else
#include <cstdio>
#endif
void task()
{
    for ( int i = 0; i < REPS; ++i )
#if USE_STREAM
        std::cout << std::hex << 15 << std::dec;
#else
        std::printf ( "%x", 15);
#endif
}
int main()
{
    auto threads = std::vector<std::thread> {};
    for ( int i = 0; i < THREADS; ++i )
        threads.emplace_back(task);
    for ( auto & t : threads )
        t.join();
#if USE_STREAM
    std::cout << "\n<iostream>\n";
#else
    std::printf ( "\n<cstdio>\n" );
#endif
}
The retort to this example is that most people exercise discipline to never write to a single file descriptor from multiple threads anyway. Well, in that case, you'll have to observe that <iostream> will helpfully grab a lock on every << and every >>. Whereas in <cstdio>, you won't be locking as often, and you even have the option of not locking.
<iostream> expends more locks to achieve a less consistent result.
Solution 10 - C++
In addition to what all the other answers have said, there's also the fact that std::endl is not the same as '\n'.
This is an unfortunately common misconception. std::endl does not mean "new line", it means "print new line and then flush the stream". Flushing is not cheap!
Completely ignoring the differences between printf and std::cout for a moment, to be functionally equivalent to your C example, your C++ example ought to look like this:
#include <iostream>
int main()
{
    std::cout << "Hello world\n";
    return 0;
}
And here's an example of what your examples should be like if you include flushing.
C
#include <stdio.h>
int main()
{
    printf("Hello world\n");
    fflush(stdout);
    return 0;
}
C++
#include <iostream>
int main()
{
    std::cout << "Hello world\n";
    std::cout << std::flush;
    return 0;
}
When comparing code, you should always be careful that you're comparing like for like and that you understand the implications of what your code is doing. Sometimes even the simplest examples are more complicated than some people realise.
Solution 11 - C++
While the existing technical answers are correct, I think that the question ultimately stems from this misconception:
> It is famous that in C++ you pay for what you eat.
This is just marketing talk from the C++ community. (To be fair, there's marketing talk in every language community.) It doesn't mean anything concrete that you can seriously depend on.
"You pay for what you use" is supposed to mean that a C++ feature only has overhead if you're using that feature. But the definition of "a feature" is not infinitely granular. Often you will end up activating features that have multiple aspects, and even though you only need a subset of those aspects, it's often not practical or possible for the implementation to bring the feature in partially.
In general, many (though arguably not all) languages strive to be efficient, with varying degrees of success. C++ is somewhere on the scale, but there is nothing special or magical about its design that would allow it to be perfectly successful in this goal.
Solution 12 - C++
The Input / Output functions in C++ are elegantly written and are designed so they are simple to use. In many respects they are a showcase for the object-orientated features in C++.
But you do indeed give up a bit of performance in return, but that's negligible compared to the time taken by your operating system to handle the functions at a lower level.
You can always fall back to the C style functions as they are part of the C++ standard, or perhaps give up portability altogether and use direct calls to your operating system.
Solution 13 - C++
As you have seen in other answers, you pay when you link in general libraries and call complex constructors. There is no particular question here, more a gripe. I'll point out some real-world aspects:
- Bjarne had a core design principle to never let efficiency be a reason for staying in C rather than C++. That said, one needs to be careful to get these efficiencies, and there are occasional efficiencies that always worked but were not 'technically' within the C spec. For example, the layout of bit fields was not really specified.
- Try looking through ostream. Oh my god, it's bloated! I wouldn't be surprised to find a flight simulator in there. Even stdlib's printf() usually runs about 50K. These aren't lazy programmers: half of the printf size was to do with indirect precision arguments that most people never use. Almost every really constrained processor's library creates its own output code instead of printf.
- The increase in size is usually providing a more contained and flexible experience. As an analogy, a vending machine will sell a cup of coffee-like-substance for a few coins and the whole transaction takes under a minute. Dropping into a good restaurant involves a table setting, being seated, ordering, waiting, getting a nice cup, getting a bill, paying in your choice of forms, adding a tip, and being wished a good day on your way out. It's a different experience, and more convenient if you are dropping in with friends for a complex meal.
- People still write ANSI C, though rarely K&R C. My experience is we always compile it with a C++ compiler using a few configuration tweaks to limit what is dragged in. There are good arguments for other languages: Go removes the polymorphic overhead and crazy preprocessor; there have been some good arguments for smarter field packing and memory layout. IMHO any language design should start with a listing of goals, much like the Zen of Python.
It's been a fun discussion. You ask why can't you have magically small, simple, elegant, complete, and flexible libraries?
There is no answer. There will not be an answer. That is the answer.