Understanding the simplest LLVM IR
Syntax Problem Overview
I transform the simplest C code
#include <stdio.h>
int main()
{
return 0;
}
to its LLVM IR, using
clang -emit-llvm -S hello.c
The generated IR is:
define i32 @main() #0 {
%1 = alloca i32, align 4
store i32 0, i32* %1
ret i32 0
}
However, I do not understand this IR. (The LLVM documentation helps, but not that much for beginners.)
- Why do we have
%1 = alloca i32, align 4
? What does it correspond to in the original code?
- Same question for
store i32 0, i32* %1
- Does alloca mean allocation on the stack (as opposed to dynamic allocation)?
- What does 'align 4' mean?
Syntax Solutions
Solution 1 - Syntax
define i32 @main() #0
This defines a function called main that returns a 32 bit integer. The #0 means to use the attributes named #0 for the function. For example, there may be something like attributes #0 = { alwaysinline alignstack=4 } in the IR, and these attributes will be applied to main.
%1 = alloca i32, align 4
This allocates a 32 bit integer on the stack. %1 is the name of a pointer to this location on the stack. The align 4 ensures that the address will be a multiple of 4.
store i32 0, i32* %1
This sets the 32 bit integer pointed to by %1 to the 32 bit value 0. It's like saying *x = 0 in C++, where x plays the role of %1.
ret i32 0
This returns from the function with a 32 bit return value of 0.
The assignment is odd, considering that you don't have a local variable in main. LLVM uses BasicBlock to represent groups of instructions, and a basic block has an exit point and a list of instructions. My guess would be that the compiler has decided to use the return as the exit from the basic block and has opted to put at least one instruction into the block. The assignment is basically a no-op.
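The three instructions walked through above can be pictured in C roughly as follows. This is only a hand-written analogy, not something clang produces; the names main_like, slot, p, and check_main_like are made up for illustration.

```c
#include <assert.h>

/* Hypothetical C analogy of the IR body; one comment per IR instruction. */
int main_like(void) {
    int slot;        /* %1 = alloca i32, align 4  (room for one int on the stack) */
    int *p = &slot;  /* %1 is a pointer to that room */
    *p = 0;          /* store i32 0, i32* %1 */
    return 0;        /* ret i32 0  (returns the constant, not the slot) */
}

/* Hypothetical check: the returned value is 0 regardless of the slot. */
void check_main_like(void) {
    assert(main_like() == 0);
}
```

Note that the return statement uses the literal 0 and never reads the slot back, which matches the observation that the store has no visible effect here.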
Solution 2 - Syntax
The %n are virtual registers that will be resolved to actual registers when generating code for the target machine.
The i32 is there for type information. In the original code it was an int, which your compiler took to be a 32-bit integer.
alloca is for allocating space on the stack. In this example it is an i32 (32-bit integer), so you can store the 0 for the return value in it. align 4 gives this allocation 4 byte alignment, i.e. the address of the allocated slot will be a multiple of 4.
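To make "4 byte alignment" concrete, here is a minimal C sketch that tests whether an address is a multiple of a given alignment. The helper names is_aligned and check_local_alignment are assumptions for illustration, and the code relies on C11 for _Alignof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when the address p is a multiple of `alignment` bytes. */
int is_aligned(const void *p, size_t alignment) {
    return (uintptr_t)p % alignment == 0;
}

/* Hypothetical demo: a local int, comparable to the i32 slot made by alloca. */
void check_local_alignment(void) {
    int slot = 0;
    /* The compiler places an int at an address that satisfies _Alignof(int). */
    assert(is_aligned(&slot, _Alignof(int)));
}
```

On platforms where int is 32 bits wide, _Alignof(int) is commonly 4, which would correspond to the align 4 in the IR; the exact value depends on the target.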
It isn't the most efficient representation, but that is not the aim of IR. IR should be portable to different architectures. It is then down to the backend to produce efficient machine code.
LLVM Language Reference Manual
The reason alloca and store are there has to do with this being the main function. If you had called this function something else, the IR would just contain ret as you expected. From examining the assembly produced for main, it appears to be related to the stack base pointer, but I don't fully understand why it is there. Time to pull out the C standard, I think.
Update: I couldn't find anything in the C standard, but it seems clang does this for every main function. I don't know the clang code base well enough to track it down though.
Update: See comments with Bill Lynch below. These instructions are there:
> for the possible implicit return 0 that main functions have
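One way to picture why a pre-zeroed slot is handy for an implicit return 0 is the C model below. It is only a sketch of the idea, not clang's actual code generation; main_model, retval, and check_main_model are hypothetical names, and an ordinary function stands in for main so it can be called directly.

```c
#include <assert.h>

/* Hypothetical model of a main whose body may fall off the end. */
int main_model(int argc) {
    int retval = 0;      /* like the store of 0 into the stack slot */
    if (argc > 1) {
        retval = 2;      /* models an explicit `return 2;` path */
        return retval;
    }
    return retval;       /* falling off the end yields the preset 0 */
}

/* Hypothetical check of both paths. */
void check_main_model(void) {
    assert(main_model(1) == 0);
    assert(main_model(3) == 2);
}
```

In the question's program the only path returns the constant 0 directly, so the preset slot is never read, which is consistent with the store looking redundant.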
Solution 3 - Syntax
Variables are usually put on the stack in unoptimized builds for debugging reasons. In optimized builds that use real registers the value might disappear before the function exits.
The comment about portability isn't precisely correct: if this IR were passed through 'opt', it would eliminate the stack store.