Is main() really start of a C++ program?

C++Standards ComplianceMainEntry Point

C++ Problem Overview


The section $3.6.1/1 from the C++ Standard reads,

> A program shall contain a global > function called main, which is the > designated start of the program.

Now consider this code,

int square(int i) { return i*i; }
int user_main()
{ 
    for ( int i = 0 ; i < 10 ; ++i )
           std::cout << square(i) << endl;
    return 0;
}
int main_ret= user_main();
int main() 
{
        return main_ret;
}

This sample code does what I intend it to do, i.e printing the square of integers from 0 to 9, before entering into the main() function which is supposed to be the "start" of the program.

I also compiled it with -pedantic option, GCC 4.5.0. It gives no error, not even warning!

So my question is,

Is this code really Standard conformant?

If it's standard conformant, then does it not invalidate what the Standard says? main() is not start of this program! user_main() executed before the main().

I understand that to initialize the global variable main_ret, the use_main() executes first but that is a different thing altogether; the point is that, it does invalidate the quoted statement $3.6.1/1 from the Standard, as main() is NOT the start of the program; it is in fact the end of this program!


EDIT:

How do you define the word 'start'?

It boils down to the definition of the phrase "start of the program". So how exactly do you define it?

C++ Solutions


Solution 1 - C++

You are reading the sentence incorrectly.

> A program shall contain a global function called main, which is the designated start of the program.

The standard is DEFINING the word "start" for the purposes of the remainder of the standard. It doesn't say that no code executes before main is called. It says that the start of the program is considered to be at the function main.

Your program is compliant. Your program hasn't "started" until main is started. The function is called before your program "starts" according to the definition of "start" in the standard, but that hardly matters. A LOT of code is executed before main is ever called in every program, not just this example.

For the purposes of discussion, your function is executed prior to the 'start' of the program, and that is fully compliant with the standard.

Solution 2 - C++

No, C++ does a lot of things to "set the environment" prior to the call of main; however, main is the official start of the "user specified" part of the C++ program.

Some of the environment setup is not controllable (like the initial code to set up std::cout; however, some of the environment is controllable like static global blocks (for initializing static global variables). Note that since you don't have full control prior to main, you don't have full control on the order in which the static blocks get initialized.

After main, your code is conceptually "fully in control" of the program, in the sense that you can both specify the instructions to be performed and the order in which to perform them. Multi-threading can rearrange code execution order; but, you're still in control with C++ because you specified to have sections of code execute (possibly) out-of-order.

Solution 3 - C++

Your program will not link and thus not run unless there is a main. However main() does not cause the start of the execution of the program because objects at file level have constructors that run beforehand and it would be possible to write an entire program that runs its lifetime before main() is reached and let main itself have an empty body.

In reality to enforce this you would have to have one object that is constructed prior to main and its constructor to invoke all the flow of the program.

Look at this:

class Foo
{
public:
   Foo();

 // other stuff
};

Foo foo;

int main()
{
}

The flow of your program would effectively stem from Foo::Foo()

Solution 4 - C++

You tagged the question as "C" too, then, speaking strictly about C, your initialization should fail as per section 6.7.8 "Initialization" of the ISO C99 standard.

The most relevant in this case seems to be constraint #4 which says:

> All the expressions in an initializer for an object that > has static storage duration shall be constant expressions or string literals.

So, the answer to your question is that the code is not compliant to the C standard.

You would probably want to remove the "C" tag if you were only interested to the C++ standard.

Solution 5 - C++

Section 3.6 as a whole is very clear about the interaction of main and dynamic initializations. The "designated start of the program" is not used anywhere else and is just descriptive of the general intent of main(). It doesn't make any sense to interpret that one phrase in a normative way that contradicts the more detailed and clear requirements in the Standard.

Solution 6 - C++

The compiler often has to add code before main() to be standard compliant. Because the standard specifies that initalization of globals/statics must be done before the program is executed. And as mentioned, the same goes for constructors of objects placed at file scope (globals).

Thus the original question is relevant to C as well, because in a C program you would still have the globals/static initialization to do before the program can be started.

The standards assume that these variables are initialized through "magic", because they don't say how they should be set before program initialization. I think they considered that as something outside the scope of a programming language standard.

Edit: See for example ISO 9899:1999 5.1.2: > All objects with static storage > duration shall be initialized (set to > their initial values) before program > startup. The manner and timing of such > initialization are otherwise > unspecified.

The theory behind how this "magic" was to be done goes way back to C's birth, when it was a programming language intended to be used only for the UNIX OS, on RAM-based computers. In theory, the program would be able to load all pre-initialized data from the executable file into RAM, at the same time as the program itself was uploaded to RAM.

Since then, computers and OS have evolved, and C is used in a far wider area than originally anticipated. A modern PC OS has virtual addresses etc, and all embedded systems execute code from ROM, not RAM. So there are many situations where the RAM can't be set "automagically".

Also, the standard is too abstract to know anything about stacks and process memory etc. These things must be done too, before the program is started.

Therefore, pretty much every C/C++ program has some init/"copy-down" code that is executed before main is called, in order to conform with the initialization rules of the standards.

As an example, embedded systems typically have an option called "non-ISO compliant startup" where the whole initialization phase is skipped for performance reasons, and then the code actually starts directly from main. But such systems don't conform to the standards, as you can't rely on the init values of global/static variables.

Solution 7 - C++

main() is a user function called by the C runtime library.

see also: https://stackoverflow.com/questions/3379190/avoiding-the-main-entry-point-in-a-c-program

Solution 8 - C++

Your "program" simply returns a value from a global variable. Everything else is initialization code. Thus, the standard holds - you just have a very trivial program and more complex initialization.

Solution 9 - C++

Seems like an English semantics quibble. The OP refers to his block of code first as "code" and later as the "program." The user writes the code, and then the compiler writes the program.

Solution 10 - C++

main is called after initializing all the global variables.

What the standard does not specify is the order of initialization of all the global variables of all the modules and statically linked libraries.

Solution 11 - C++

Ubuntu 20.04 glibc 2.31 RTFS + GDB

glibc does some setup before main so that some of its functionalities will work. Let's try to track down the source code for that.

hello.c

#include <stdio.h>

int main() {
    puts("hello");
    return 0;
}

Compile and debug:

gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o hello.out hello.c
gdb hello.out

Now in GDB:

b main
r
bt -past-main

gives:

#0  main () at hello.c:3
#1  0x00007ffff7dc60b3 in __libc_start_main (main=0x555555555149 <main()>, argc=1, argv=0x7fffffffbfb8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffbfa8) at ../csu/libc-start.c:308
#2  0x000055555555508e in _start ()

This already contains the line of the caller of main: https://github.com/cirosantilli/glibc/blob/glibc-2.31/csu/libc-start.c#L308.

The function has a billion ifdefs as can be expected from the level of legacy/generality of glibc, but some key parts which seem to take effect for us should simplify to:

# define LIBC_START_MAIN __libc_start_main

STATIC int
LIBC_START_MAIN (int (*main) (int, char **, char **),
		 int argc, char **argv,
{

      /* Initialize some stuff. */

      result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
  exit (result);
}

Before __libc_start_main are are already at _start, which by adding gcc -Wl,--verbose we know is the entry point because the linker script contains:

ENTRY(_start)

and is therefore is the actual very first instruction executed after the dynamic loader finishes.

To confirm that in GDB, we an get rid of the dynamic loader by compiling with -static:

gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o hello.out hello.c
gdb hello.out

and then make GDB stop at the very first instruction executed with starti and print the first instructions:

starti
display/12i $pc

which gives:

=> 0x401c10 <_start>:   endbr64 
   0x401c14 <_start+4>: xor    %ebp,%ebp
   0x401c16 <_start+6>: mov    %rdx,%r9
   0x401c19 <_start+9>: pop    %rsi
   0x401c1a <_start+10>:        mov    %rsp,%rdx
   0x401c1d <_start+13>:        and    $0xfffffffffffffff0,%rsp
   0x401c21 <_start+17>:        push   %rax
   0x401c22 <_start+18>:        push   %rsp
   0x401c23 <_start+19>:        mov    $0x402dd0,%r8
   0x401c2a <_start+26>:        mov    $0x402d30,%rcx
   0x401c31 <_start+33>:        mov    $0x401d35,%rdi
   0x401c38 <_start+40>:        addr32 callq 0x4020d0 <__libc_start_main>

By grepping the source for _start and focusing on x86_64 hits we see that this seems to correspond to sysdeps/x86_64/start.S:58:


ENTRY (_start)
	/* Clearing frame pointer is insufficient, use CFI.  */
	cfi_undefined (rip)
	/* Clear the frame pointer.  The ABI suggests this be done, to mark
	   the outermost frame obviously.  */
	xorl %ebp, %ebp

	/* Extract the arguments as encoded on the stack and set up
	   the arguments for __libc_start_main (int (*main) (int, char **, char **),
		   int argc, char *argv,
		   void (*init) (void), void (*fini) (void),
		   void (*rtld_fini) (void), void *stack_end).
	   The arguments are passed via registers and on the stack:
	main:		%rdi
	argc:		%rsi
	argv:		%rdx
	init:		%rcx
	fini:		%r8
	rtld_fini:	%r9
	stack_end:	stack.	*/

	mov %RDX_LP, %R9_LP	/* Address of the shared library termination
				   function.  */
#ifdef __ILP32__
	mov (%rsp), %esi	/* Simulate popping 4-byte argument count.  */
	add $4, %esp
#else
	popq %rsi		/* Pop the argument count.  */
#endif
	/* argv starts just at the current stack top.  */
	mov %RSP_LP, %RDX_LP
	/* Align the stack to a 16 byte boundary to follow the ABI.  */
	and  $~15, %RSP_LP

	/* Push garbage because we push 8 more bytes.  */
	pushq %rax

	/* Provide the highest stack address to the user code (for stacks
	   which grow downwards).  */
	pushq %rsp

#ifdef PIC
	/* Pass address of our own entry points to .fini and .init.  */
	mov __libc_csu_fini@GOTPCREL(%rip), %R8_LP
	mov __libc_csu_init@GOTPCREL(%rip), %RCX_LP

	mov main@GOTPCREL(%rip), %RDI_LP
#else
	/* Pass address of our own entry points to .fini and .init.  */
	mov $__libc_csu_fini, %R8_LP
	mov $__libc_csu_init, %RCX_LP

	mov $main, %RDI_LP
#endif

	/* Call the user's main function, and exit with its value.
	   But let the libc call main.  Since __libc_start_main in
	   libc.so is called very early, lazy binding isn't relevant
	   here.  Use indirect branch via GOT to avoid extra branch
	   to PLT slot.  In case of static executable, ld in binutils
	   2.26 or above can convert indirect branch into direct
	   branch.  */
	call *__libc_start_main@GOTPCREL(%rip)

which ends up calling __libc_start_main as expected.

Unfortunately -static makes the bt from main not show as much info:

#0  main () at hello.c:3
#1  0x0000000000402560 in __libc_start_main ()
#2  0x0000000000401c3e in _start ()

If we remove -static and start from starti, we get instead:

=> 0x7ffff7fd0100 <_start>:     mov    %rsp,%rdi
   0x7ffff7fd0103 <_start+3>:   callq  0x7ffff7fd0df0 <_dl_start>
   0x7ffff7fd0108 <_dl_start_user>:     mov    %rax,%r12
   0x7ffff7fd010b <_dl_start_user+3>:   mov    0x2c4e7(%rip),%eax        # 0x7ffff7ffc5f8 <_dl_skip_args>
   0x7ffff7fd0111 <_dl_start_user+9>:   pop    %rdx

By grepping the source for _dl_start_user this seems to come from sysdeps/x86_64/dl-machine.h:L147

/* Initial entry point code for the dynamic linker.
   The C function `_dl_start' is the real entry point;
   its return value is the user program's entry point.  */
#define RTLD_START asm ("\n\
.text\n\
	.align 16\n\
.globl _start\n\
.globl _dl_start_user\n\
_start:\n\
	movq %rsp, %rdi\n\
	call _dl_start\n\
_dl_start_user:\n\
	# Save the user entry point address in %r12.\n\
	movq %rax, %r12\n\
	# See if we were run as a command with the executable file\n\
	# name as an extra leading argument.\n\
	movl _dl_skip_args(%rip), %eax\n\
	# Pop the original argument count.\n\
	popq %rdx\n\

and this is presumably the dynamic loader entry point.

If we break at _start and continue, this seems to end up in the same location as when we used -static, which then calls __libc_start_main.

When I try a C++ program instead:

hello.cpp

#include <iostream>

int main() {
    std::cout << "hello" << std::endl;
}

with:

g++ -ggdb3 -O0 -std=c++11 -Wall -Wextra -pedantic -o hello.out hello.cpp

the results are basically the same, e.g. the backtrace at main is the exact same.

I think the C++ compiler is just calling into hooks to achieve any C++ specific functionality, and things are pretty well factored across C/C++.

TODO:

Solution 12 - C++

Yes, main is the "entry point" of every C++ program, excepting implementation-specific extensions. Even so, some things happen before main, notably global initialization such as for main_ret.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionNawazView Question on Stackoverflow
Solution 1 - C++Adam DavisView Answer on Stackoverflow
Solution 2 - C++Edwin BuckView Answer on Stackoverflow
Solution 3 - C++CashCowView Answer on Stackoverflow
Solution 4 - C++Remo.DView Answer on Stackoverflow
Solution 5 - C++ascheplerView Answer on Stackoverflow
Solution 6 - C++LundinView Answer on Stackoverflow
Solution 7 - C++sylvanaarView Answer on Stackoverflow
Solution 8 - C++Zac HowlandView Answer on Stackoverflow
Solution 9 - C++dSerkView Answer on Stackoverflow
Solution 10 - C++vz0View Answer on Stackoverflow
Solution 11 - C++Ciro Santilli Путлер Капут 六四事View Answer on Stackoverflow
Solution 12 - C++Fred NurkView Answer on Stackoverflow