This obfuscated C code claims to run without a main(), but what does it really do?

Tags: C, C Preprocessor, Obfuscation

C Problem Overview


#include <stdio.h>
#define decode(s,t,u,m,p,e,d) m##s##u##t
#define begin decode(a,n,i,m,a,t,e)

int begin()
{
	printf("Ha HA see how it is?? ");
}

Does this indirectly call main? How?

C Solutions


Solution 1 - C

The C language defines two categories of execution environment: freestanding and hosted. In both, a function is called by the environment at program startup.
In a freestanding environment the name and type of that startup function are implementation-defined, while in a hosted environment it must be main. No C program can run in either environment without a program startup function.

In your case, main is hidden behind the preprocessor definitions: begin expands to decode(a,n,i,m,a,t,e), which in turn expands to main.

int begin() -> int decode(a,n,i,m,a,t,e)() -> int m##a##i##n() -> int main() 

decode(s,t,u,m,p,e,d) is a parameterized macro with 7 parameters. The replacement list for this macro is m##s##u##t, where m, s, u and t are the 4th, 1st, 3rd and 2nd parameters respectively.

s, t, u, m, p, e, d
1  2  3  4  5  6  7

The remaining parameters (p, e and d) are unused; they are there only to obfuscate. The arguments passed to decode are a,n,i,m,a,t,e, so the identifiers m, s, u and t are replaced with the arguments m, a, i and n, respectively.

 m --> m  
 s --> a 
 u --> i 
 t --> n

Solution 2 - C

Try running gcc -E source.c; the output ends with:

int main()
{
    printf("Ha HA see how it is?? ");
}

So a main() function is actually generated by the preprocessor.

Solution 3 - C

The program in question does call main() due to macro expansion, but your assumption is flawed: a program doesn't have to call main() at all!

Strictly speaking, you can have a C program and compile it without having a main symbol. main is something that the C library expects to jump into after it has finished its own initialization. Usually main is reached from the libc startup symbol known as _start. It is entirely possible to have a working program that simply executes assembly without having a main. Take a look at this:

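/* This must be compiled with the flag -nostdlib because otherwise the
 * linker will complain about multiple definitions of the symbol _start
 * (one here and one in the C runtime startup files shipped with glibc)
 * and a missing reference to the symbol main (which that startup code
 * expects to call).
 *
 * int $0x80 is the 32-bit x86 Linux system call interface, so build a
 * 32-bit binary: gcc -m32 -nostdlib without_main.c
 */

void
_start (void)
{
    long ret;

    /* calling the write system call (number 4 on i386), with the arguments
     * in this order:
     * 1. the stdout file descriptor
     * 2. the buffer we want to print (here just a string literal)
     * 3. the number of bytes we want to write
     * The kernel returns its result in eax, so eax is declared as an output.
     */
    asm volatile ("int $0x80" : "=a"(ret)
                  : "0"(4L), "b"(1), "c"("Hello world!\n"), "d"(13)
                  : "memory");
    (void)ret;

    /* calling the exit system call (number 1 on i386) with status 0;
     * it does not return */
    asm volatile ("int $0x80" :: "a"(1), "b"(0));
}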

Compile the above with gcc -m32 -nostdlib without_main.c and run it: it prints Hello world! just by issuing system calls (through the int $0x80 software interrupt) in inline assembly.

For more information about this particular issue, check out the ksplice blog.

Another interesting point is that you can also have a program that compiles without the main symbol corresponding to a C function. For instance, the following compiles as a C program, and the compiler only complains about it once you raise the warning level.

/* These values are extracted from the decimal representation of the machine
 * instructions of a hello world program written in asm, as shown by gdb.
 * Each int packs four bytes of x86-64 machine code (little-endian).
 */
const int main[] = {
    -443987883, 440, 113408, -1922629632,
    4149, 899584, 84869120, 15544,
    266023168, 1818576901, 1461743468, 1684828783,
    -1017312735
};

The values in the array encode the machine-code bytes of the instructions needed to print Hello World on the screen. For a more detailed account of how this specific program works, take a look at this blog post, which is where I first read about it.

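I want to make one final note about these programs. I do not know whether they count as valid C programs according to the C language specification, but compiling and running them is possible, even if they violate the specification itself. The array version additionally depends on the linker placing read-only data in executable memory, which recent GNU linkers no longer do by default, so on a current system it is likely to crash instead.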

Solution 4 - C

Someone is trying to act like a magician. He thinks he can trick us. But as we all know, C program execution begins with main().

In int begin(), the preprocessor first replaces begin with decode(a,n,i,m,a,t,e). That in turn is replaced with m##a##i##n: by the position of each argument in the macro call, s takes the value a, u takes i, t takes n, and m takes m. That is how m##s##u##t becomes main.

Regarding the ## symbol: in macro expansion it is the preprocessing operator that performs token pasting. When a macro is expanded, the two tokens on either side of each ‘##’ operator are combined into a single token, which then replaces the ‘##’ and the two original tokens in the macro expansion.

If you don't believe me, compile your code with the -E flag. It stops the compilation process after preprocessing, so you can see the result of token pasting:

gcc -E FILENAME.c

Solution 5 - C

decode(a,b,c,d,[...]) shuffles the first four arguments and joins them to get a new identifier, in the order dacb. (The remaining three arguments are ignored.) For instance, decode(a,n,i,m,[...]) gives the identifier main. Note that this is what the begin macro is defined as.

Therefore, the begin macro is simply defined as main.

Solution 6 - C

In your example, a main() function is actually present: begin is a macro that the preprocessor replaces with the decode macro, which in turn is replaced by the expression m##s##u##t. The ## token-pasting operator then joins those pieces into the word main. This is a trace:

begin --> decode(a,n,i,m,a,t,e) --> parameter4##parameter1##parameter3##parameter2 = m##a##i##n ---> main

It's just a trick to define main(), but the program's entry function does not always have to be named main(). That depends on your operating system and on its linker.

On Windows, GUI programs typically use WinMain or wWinMain instead of main(), although you can still use main(), even with Microsoft's toolchain. On Linux, the actual entry point is _start.

It's up to the linker, not the language itself, to set the entry point. You can even set your own entry point (for example with the -e option of GNU ld), and you can make a library that is also executable!

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Rajeev Singh | View Question on Stackoverflow
Solution 1 - C | haccks | View Answer on Stackoverflow
Solution 2 - C | jdarthenay | View Answer on Stackoverflow
Solution 3 - C | NlightNFotis | View Answer on Stackoverflow
Solution 4 - C | abhiarora | View Answer on Stackoverflow
Solution 5 - C | Frxstrem | View Answer on Stackoverflow
Solution 6 - C | Ho1 | View Answer on Stackoverflow