The Birth and Death of a Running Program

I've been on a quest over the last year or so to understand fully how a program ends up going from your brain into code, from code into an executable and from an executable into an executing program on your processor. I like the point I've got to in this pursuit, so I'm going to brain dump here :)

Prerequisite Knowledge: Some knowledge of assembler will help. Some knowledge of processors will also help. I wouldn't call either of these necessary, though; I'll try my best to explain what needs explaining. What you will need, though, is a toolchain. If you're on Ubuntu, hopefully this article will help. If you're on another system, Google for "[your os] build essentials", e.g. "arch linux build essentials".

# The Birth of a Program

You have an idea for a program. It's the best program idea you've ever had so you quickly prototype something in C:

#include <stdio.h>

int main(int argc, char* argv[]) {
    printf("Hello, world!\n");
    return 0;
}

A work of genius. You quickly compile and run it to make sure all is good:

$ gcc hello.c -o hello
$ ./hello
Hello, world!

Boom!

But wait... What has happened? How has it gone from being quite an understandable high level program into something that your processor can understand and run? Let's go through what's happening step by step.

GCC is doing a tonne of things behind the scenes in the gcc hello.c -o hello command. It is compiling your C code into assembly, optimising lots in the process, then it is creating "object files" out of your assembly (usually in a format called ELF on Linux platforms), then it is linking those object files together into an executable file (again, executable ELF format). At this point we have the hello executable and it is in a well-known format with lots of cross-machine considerations baked in.

After we run the executable, the "loader" comes into play. The loader figures out where in memory to put your code, it figures out whether it needs to mess about with any of the pointers in the file, it figures out if the file needs any dynamic libraries linked to it at runtime and all sorts of mental shit like that. Don't worry if none of this makes sense, we're going to go into it in good time.

# Compiling from C to assembly

This is a difficult bit of the process and it's why compilers used to cost you an arm and a leg before Stallman came along with the GNU Compiler Collection (GCC). Commercial compilers do still exist, but the free world seems to have standardised on GCC or LLVM. I won't go into a discussion as to which is better because I honestly don't know enough to comment :)

If you want to see the assembly output of the hello.c program, you can run the following command:

$ gcc -S hello.c

This command will create a file called hello.s, which contains assembly code. If you've never worked with assembly code before, this step is going to be a bit of an eye opener. The file generated will be long, difficult to read and probably different to mine depending on your platform.

Now is not the time or place to teach assembly. If you want to learn, this book is a brilliant place to start. I will, however, point out a little bit of weirdness in the file. Do you see stuff like this?

EH_frame0:
Lsection_eh_frame:
Leh_frame_common:
Lset0 = Leh_frame_common_end-Leh_frame_common_begin
    .long    Lset0
Leh_frame_common_begin:
    .long    0
    .byte    1
    .asciz     "zR"
    .byte    1
    .byte    120
    .byte    16
    .byte    1
    .byte    16
    .byte    12
    .byte    7
    .byte    8
    .byte    144
    .byte    1
    .align    3

I was initially curious as to what this was as well, so I checked out Stack Overflow and came across a really great explanation of what this bit means, which you can read here.

Also, notice the following:

callq    _puts

The assembly program is calling puts instead of printf. This is an example of the kind of optimisation GCC will do for you, even on the default level of "no optimisation" (the -O0 flag on the command line). printf is a really heavy function, due to having to deal with a large range of format codes. puts is far less heavy. The only source I could find for it was the NetBSD version: puts itself is very small and it delegates to __sfvwrite, the code of which is here. If you want more information on how GCC will optimise printf, this is a great article.
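
To make the distinction concrete, here is a small sketch of two printf calls. The function names are made up for illustration. As far as I can tell, GCC only makes the swap when the string is constant, has no format codes, ends in a newline and the return value is ignored, so treat the comments as expectations to check with gcc -S rather than guarantees:

```c
#include <stdio.h>

/* Constant string, no format codes, trailing newline, return value unused:
 * a candidate for being compiled into a call to puts. */
void greet_plain(void) {
    printf("Hello, world!\n");
}

/* Needs a format code and uses printf's return value (the number of
 * characters written), so a real printf call is expected here. */
int greet_numbered(int n) {
    return printf("Hello number %d\n", n);
}
```

Compiling a file containing these with gcc -S and searching the output for puts and printf should show which call ended up where on your platform.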

Also, if assembler is a bit new to you, one thing to note is that this post is using GAS (GNU Assembler) syntax. There are different assemblers out there, and a lot of people like the Netwide Assembler (NASM), which has a more human friendly syntax.

GAS suffixes its instructions with a letter that describes what "word size" we're dealing with. Above, you'll see we used callq. The q stands for "quad", which is a 64-bit value. Here are other suffixes you may run into:

b = byte (8-bit)
s = short (16-bit integer) or single (32-bit floating point)
w = word (16-bit)
l = long (32-bit integer or 64-bit floating point)
q = quad (64-bit)
t = ten bytes (80-bit floating point)
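
If it helps to think in C types, the integer suffixes line up with the fixed-width types from stdint.h. A minimal sketch, where operand_bytes is a made-up helper for illustration and only the integer meanings of the suffixes are covered:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the integer operand size in bytes for a GAS suffix letter,
 * or 0 for a letter this sketch does not know about. */
size_t operand_bytes(char suffix) {
    switch (suffix) {
    case 'b': return sizeof(int8_t);   /* e.g. movb */
    case 'w': return sizeof(int16_t);  /* e.g. movw */
    case 'l': return sizeof(int32_t);  /* e.g. movl */
    case 'q': return sizeof(int64_t);  /* e.g. movq */
    default:  return 0;
    }
}
```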

# Assembling into machine code

By comparison, turning assembly instructions into machine code is pretty simple. Compiling is a much more difficult step than assembling is. Assembly instructions are often a 1 to 1 mapping into machine code.
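
To give a feel for how direct that mapping can be, here is a sketch that stores the x86-64 encoding of two instructions in a C array and picks the operand back out of the bytes. Nothing here executes the machine code; decode_movl_eax is a made-up helper that only understands this one encoding:

```c
#include <stdint.h>

/* x86-64 machine code for:
 *     movl $42, %eax    -> B8 2A 00 00 00
 *     ret               -> C3
 * B8 is the opcode for "move a 32-bit immediate into %eax"; the four
 * bytes after it are the immediate, least significant byte first. */
static const uint8_t return_42[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

/* Reads the immediate back out of a "movl $imm32, %eax" encoding.
 * Returns -1 if the first byte is not the B8 opcode. */
int64_t decode_movl_eax(const uint8_t *code) {
    if (code[0] != 0xB8) {
        return -1;
    }
    uint32_t imm = (uint32_t)code[1]
                 | (uint32_t)code[2] << 8
                 | (uint32_t)code[3] << 16
                 | (uint32_t)code[4] << 24;
    return (int64_t)imm;
}
```

Calling decode_movl_eax(return_42) gives back 42, the same number that appears in the assembly in the comment.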

At the end of the assembling stage, you would expect to have a file that just contained binary instructions, right? Sadly that's not quite the case. The linker and the operating system need to know a lot more about your code than just the instructions. To facilitate passing this required meta-information there are a variety of binary file formats. A very common one in *nix systems is ELF: Executable and Linkable Format.

Your program will be broken up into lots of sections. For example, a section called .text contains your program code. A section called .bss stores statically allocated variables (globals, essentially) that are not given a starting value and thus get zeroed. A section called .rodata (or a close relative of it) contains read-only data such as string literals, so in our hello.c example the string "Hello, world!\n" will go into .rodata. Don't confuse this with .strtab, which sounds like the right place but actually holds the names of your symbols.
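
Here is a rough sketch of how a few C declarations usually map onto sections. The variable and function names are invented for the example, and the exact section names are an assumption about a typical ELF toolchain, so check your own object file with nm to see where things landed:

```c
#include <string.h>

int zeroed_counter;               /* no initial value: typically .bss, starts at 0 */
int answer = 42;                  /* has an initial value: typically .data */
const char *greeting = "Hello";   /* the pointer is in .data; the characters
                                     typically live in a read-only section */

/* The compiled instructions for this function typically go in .text. */
size_t greeting_length(void) {
    return strlen(greeting);
}
```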

This article, from issue 13 of Linux Journal in 1995, gives a really good overview of the ELF format from one of the people who created it. It's quite in depth and I didn't understand everything he said (still not sure on relocations), but it's very interesting to see the motivations behind the format.

# Linking into an executable

Coming back from the previous tangent, let's think about linking. When you compile multiple files, the .c files get compiled into .o files. When I first started doing C code, one thing that continuously baffled me was how a .c file referenced a function in another .c file. You only reference .h files in a .c file, so how did it know what code to run?

The way it works is by creating a symbol table. There are a multitude of types of symbols in an executable file, but the general gist is that a symbol is a named reference to something. The nm utility allows you to inspect an executable file's symbol table. Here's some example output:

$ nm hello
0000000100001048 B _NXArgc
0000000100001050 B _NXArgv
0000000100001060 B ___progname
0000000100000000 A __mh_execute_header
0000000100001058 B _environ
                 U _exit
0000000100000ef0 T _main
                 U _puts
0000000100001000 d _pvars
                 U dyld_stub_binder
0000000100000eb0 T start

Look at the symbols labelled with the letter U. We have _exit, _puts and dyld_stub_binder. The _exit symbol is operating system specific and will be the routine that knows how to return control back to the OS once your program has finished, the _puts symbol is very important for our program and exists in whatever libc we have, and dyld_stub_binder is an entry point for resolving dynamic loads. All of these symbols are "unresolved", which means if you try and run the program and no suitable match is found for them, your program will fail.
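
The letters nm prints follow from how each thing is declared in your C code. A small sketch with made-up function names; the letters in the comments are what I would expect to see after gcc -c, possibly with a leading underscore on the names depending on your platform:

```c
#include <stdio.h>

/* static: local to this file, listed by nm with a lowercase t (it may
 * vanish entirely if the compiler inlines it). */
static int double_it(int x) {
    return x * 2;
}

/* Not static: a global definition, listed with an uppercase T. */
int quadruple(int x) {
    return double_it(double_it(x));
}

/* puts is only declared (by stdio.h) and is defined in libc, so this
 * object file lists it as U until something resolves it. */
void announce(void) {
    puts("symbols!");
}
```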

So when you create an object file, the reason you include the header is that it declares functions that live somewhere else. Every one of those that you actually use becomes an unresolved symbol in your object file. The process of linking multiple object files together does the job of finding the definition that matches each symbol and tying them together in the final executable.

To demonstrate this, consider the following C file:

#include <stdio.h>

extern void test(void);

int main(int argc, char* argv[]) {
    test(); /* declared above, defined in some other file (we hope) */
    printf("Hello, world!\n");
    return 0;
}

Compiling this file into an object file and then inspecting the contents will show you the following:

$ gcc -c hello.c
$ nm hello.o
0000000000000050 r EH_frame0
000000000000003b r L_.str
0000000000000000 T _main
0000000000000068 R _main.eh
                 U _puts
                 U _test

We now have an unresolved symbol called _test! The linker will expect to find that somewhere else and, if it does not, will throw a bit of a hissy fit. Trying to link this file on its own complains about 2 unresolved symbols, _test and _puts. Linking it against libc complains about one unresolved symbol, _test.

Unfortunately, because we don't actually have a definition for test() we can't use it. This may sound confusing, seeing as we defer the linking of puts() until runtime. Why can't we just do the same with test()? Build an executable file and let the loader/linker try and figure it out at runtime?

In the linking process you need to specify where the linker will be able to find things on the target system. Let's step through the original hello.c example, doing each of the compilation steps ourselves:

$ gcc -c hello.c

This creates hello.o with an unresolved _puts symbol.

$ ld hello.o

This craps out. We need to give it more information. At this point I'm going to mention that I'm on a Mac system and am about to reference libraries that have different names on a Linux system. As a general rule here, you can replace the .dylib extension with .so:

$ ld hello.o /usr/lib/libc.dylib

This still craps out. Check out this error message:

ld: entry point (start) undefined.  Usually in crt1.o for inferred
architecture x86_64

What the hell? This is a really good error to come across and learn about, though. It leads us nicely into the next section.

# Running the program

Wait, didn't we finish the last section with an object file that wouldn't link for some arcane reason? Yes, we did. But getting to a point where we can successfully link it requires us to know a little bit more about how our program starts running when it's loaded into memory.

Before every program starts, things need to be set up for it. The operating system gives it a stack and a set of page tables for accessing virtual memory, and then the C runtime has to "bootstrap" the process: prepare the heap, stdio and so on, so that your code has a good environment to run in. This runtime setup usually lives in a file called crt0.o.

When you started learning programming and you used a language that got compiled, one of the first things you learned was that your program's entry point is main(), right? The true story is that your program doesn't start in main, it starts in start. This detail is abstracted away from you by the OS and the toolchain, though, in the form of the crt0.o file.
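
One way to convince yourself that code runs before main is the constructor attribute, which is a GCC and Clang extension rather than standard C. A minimal sketch with invented names: the startup code calls early_setup before main is entered, so anything you call from main can already see its effect:

```c
#include <stdio.h>

int ran_before_main = 0;

/* GCC/Clang extension: functions marked as constructors are run by the
 * startup code before main is entered. */
__attribute__((constructor))
static void early_setup(void) {
    ran_before_main = 1;
}

/* Call this from main to see what the startup code already did. */
int startup_already_ran(void) {
    return ran_before_main;
}
```

Calling startup_already_ran() as the very first line of your own main should return 1, even though nothing in main set the flag.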

The osdev wiki shows a great example of a simple crt0.o file that I'll copy here:

.section .text

.global _start
_start:
    # Set up end of the stack frame linked list.
    movq $0, %rbp
    pushq %rbp
    movq %rsp, %rbp

    # We need those in a moment when we call main.
    pushq %rsi
    pushq %rdi

    # Prepare signals, memory allocation, stdio and such.
    call initialize_standard_library

    # Run the global constructors.
    call _init

    # Restore argc and argv.
    popq %rdi
    popq %rsi

    # Run main
    call main

    # Terminate the process with the exit code.
    movl %eax, %edi
    call exit

07/08/2013 UPDATE: In a previous version of this post I got this bit totally wrong, confusing the 32bit x86 calling convention with the x86-64 calling convention. Thanks to Craig in the comments for pointing it out :) The below should now be correct.

The line that's probably most interesting there is where main is called. This is the entry point into your code. Before it happens, there is a lot of setup. Also notice that argc and argv handling is done in this file, but it assumes that the loader has put the values into registers beforehand.

Why, you might ask, do argc and argv live in %rdi and %rsi before being passed to your main function? Why are those registers so special?

The reason is something called a "calling convention". This convention details how arguments should be passed to a function call before it happens. The calling convention in x86-64 C is a little bit tricky but the explanation (taken from here) is as follows:

Once arguments are classified, the registers get assigned (in left-to-right order) for passing as follows:

If the class is MEMORY, pass the argument on the stack.

If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used

For example, take this C code:

int add(int a, int b) {
    return a + b;
}

int main(int argc, char* argv[]) {
    add(1, 12);

    return 0;
}

The assembler that would call that function goes something like this:

movq $1,  %rdi
movq $12, %rsi
call add

The $12 and $1 there are the literal, decimal values being passed to the function. Easy peasy :) The convention isn't something that needs to be followed in your own assembly code. You're free to put arguments wherever you want, but if you want to interact with existing library functions then you need to do as the Romans do.
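
To see where the register sequence runs out, here is a sketch of a function with seven integer arguments (sum7 is a made-up name). The comments assume the System V x86-64 convention quoted above, which is what Linux and Mac use; other platforms differ:

```c
/* Under the System V x86-64 convention the first six integer arguments
 * travel in registers; anything after that goes on the stack. */
long sum7(long a,   /* %rdi */
          long b,   /* %rsi */
          long c,   /* %rdx */
          long d,   /* %rcx */
          long e,   /* %r8  */
          long f,   /* %r9  */
          long g)   /* on the stack */
{
    return a + b + c + d + e + f + g;  /* result comes back in %rax */
}
```

Compiling a call such as sum7(1, 2, 3, 4, 5, 6, 7) with gcc -S should show six register moves and one value being placed on the stack before the call instruction.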

With all of this said and done, how do we correctly link and run our hello.o file? Like so:

$ ld hello.o /usr/lib/libc.dylib /usr/lib/crt1.o -o hello
$ ./hello
Hello, world!

Hey! I thought you said it was crt0.o? It can be... crt1.o is a file with exactly the same purpose but it has more in it. crt0.o didn't exist on my system, only crt1.o did. I guess it's an OS decision. Here's a short mailing list post that talks about it.

Interestingly, inspecting the symbol table of the executable we just linked together shows this:

$ nm hello
0000000000002058 B _NXArgc
0000000000002060 B _NXArgv
                 U ___keymgr_dwarf2_register_sections
0000000000002070 B ___progname
                 U __cthread_init_routine
0000000000001eb0 T __dyld_func_lookup
0000000000001000 A __mh_execute_header
0000000000001d9a T __start
                 U _atexit
0000000000002068 B _environ
                 U _errno
                 U _exit
                 U _mach_init_routine
0000000000001d40 T _main
                 U _puts
                 U dyld_stub_binder
0000000000001e9c T dyld_stub_binding_helper
0000000000001d78 T start

Notice that _puts is still marked U even though we linked against libc, while start now has an address. The reason is that .dylib and .so files (they have the same job, but on Mac they have the .dylib extension and probably a different internal format) are dynamic or "shared" libraries. They will tell the linker that they are to be linked dynamically, at runtime, rather than statically, at link time. The crt*.o files are normal objects and link statically, which is why the start symbol has an address in the above symbol table.

# The Death of a Running Program

You return a number from main() and then your program is done, right? Not quite. There is still a lot of work to be done. For starters, your exit code needs to be propagated up to any parent processes that may be anticipating your death. The exit code tells them something about how your program finished. Exactly what it tells them is entirely up to you, but the standard is that 0 means everything was okay, anything non-zero (up to a max of 255) signifies that an error occurred.
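
Here is a sketch of the parent's side of that conversation, using the POSIX fork and waitpid calls. observed_exit_status is a made-up helper: it creates a child that dies immediately with the given code and reports what the parent gets to see, which is only the low 8 bits:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with `code` straight away and returns the
 * status the parent observes, or -1 if something went wrong. */
int observed_exit_status(int code) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;              /* fork failed */
    }
    if (pid == 0) {
        _exit(code);            /* child: die immediately with this code */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;              /* wait failed */
    }
    if (!WIFEXITED(status)) {
        return -1;              /* child was killed by a signal instead */
    }
    return WEXITSTATUS(status); /* only the low 8 bits survive */
}
```

Passing 3 gives back 3, and passing 259 also gives back 3, because 259 does not fit in 8 bits and the top bit is dropped on the way to the parent.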

There is also a lot of OS cleanup that happens when your program dies. Things like tidying up file descriptors and deallocating any heap memory you may have forgotten to free() before you returned. You should totally get into the habit of cleaning up yourself, though!
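
Some of the tidying happens inside your own process before the OS gets involved. Returning from main ends up in exit (the last call in the crt0 listing earlier), and exit runs anything that was registered with atexit, which is one of the undefined symbols in the nm output above. A minimal sketch with invented function names:

```c
#include <stdio.h>
#include <stdlib.h>

/* Runs after main has returned (or exit has been called). */
static void say_goodbye(void) {
    puts("main has returned, cleaning up");
}

/* Registers say_goodbye to run at normal program exit.
 * Returns 0 on success, like atexit itself. */
int register_goodbye(void) {
    return atexit(say_goodbye);
}
```

If you call register_goodbye() from main, the message is printed after main returns, without main ever calling say_goodbye itself.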

# Wrapping up

So that's about the extent of my knowledge on how your code gets turned into a running program. I know I missed some bits out, oversimplified some things and I was probably wrong in places. If you can correct me on any point, or have anything illuminating about how non-x86 or non-ELF systems do the above tasks, I would love to have a discussion about it in the comments :)