Dude, where's my main?

So I was writing a debugger. You know the sort of thing: breakpoints, stepping, checking the value of variables. It was going wonderfully until I tried to debug my debugger with my debugger. main was just... gone. It would run, but trying to set a breakpoint on it crashed the program. It took me weeks to figure out why. This is that story.

Introduction

The program we're going to use to demonstrate the debugger is a classic C "Hello, world!":

// programs/hello.c
#include <stdio.h>

int main(void) {
  printf("Hello, world!\n");
  return 0;
}

Let's compile this program with debug information and do a few basic operations in the debugger. The debugger is written in Rust, so we run it through Rust's build tool, cargo:

$ gcc -g programs/hello.c -o programs/hello
$ cargo run -- programs/hello
> symbol main
0x401c60 main
> register rip
rip 0x401b30
> break main
> cont
> register rip
rip 0x401c60
> cont
Hello, world!

Our "Hello, world!" program is initially in a stopped state at some address that isn't main, so we set a breakpoint on main, continue, and then verify that we are at the first instruction of main. Then we continue the program and we see it print out "Hello, world!" as expected. Neat!

This got me thinking: maybe I could debug my debugger in my debugger. That would be awesome.

$ cargo run -- target/debug/rust-debugger
> symbol main
0xd2f90 main
> register rip
rip 0x7fb95aa0c100
> break main
libc err: Input/output error

Oh. What happened?

Understanding the problem

The error message we get comes from a piece of code in my debugger that takes any error code we get from a libc function and runs it through strerror. So which libc function did it come from?

ptrace

The only libc function that gets called when setting a breakpoint is ptrace. ptrace is the system call that debuggers on Linux use to control the processes they're debugging. With it you can read and write memory, read and write registers, advance by one instruction, and loads more.

When you set a breakpoint, the debugger writes the byte 0xCC to the address where you want execution to pause. 0xCC is the x86 int3 instruction, which raises a breakpoint trap intended for debuggers. It stops the debuggee and returns control to the debugger.
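To make the byte patching concrete, here's a minimal sketch of it in Rust. ptrace's PEEKTEXT and POKETEXT requests work a word at a time, so the sketch swaps only the lowest byte of a 64-bit word and remembers the original so it can be put back later. The function names are made up for illustration, and the real debugger code may well look different.

```rust
/// The one-byte x86 `int3` opcode.
const INT3: u64 = 0xCC;

/// Replace the lowest-addressed byte of a little-endian word with `int3`.
/// Returns the patched word plus the original byte so it can be restored.
fn insert_breakpoint(word: u64) -> (u64, u8) {
    let original_byte = (word & 0xFF) as u8;
    let patched = (word & !0xFF) | INT3;
    (patched, original_byte)
}

/// Put the saved byte back, removing the breakpoint.
fn remove_breakpoint(patched: u64, original_byte: u8) -> u64 {
    (patched & !0xFF) | u64::from(original_byte)
}

fn main() {
    // Hypothetical word read from the debuggee at the breakpoint address,
    // e.g. with PTRACE_PEEKTEXT. In memory the bytes are
    // 48 83 ec 18 8a 05 14 e5; x86-64 is little-endian, so the first
    // instruction byte (0x48) is the lowest byte of the word.
    let word: u64 = 0xe514058a18ec8348;

    let (patched, saved) = insert_breakpoint(word);
    println!("patched:  {:#018x}", patched);
    println!("restored: {:#018x}", remove_breakpoint(patched, saved));
}
```

The patched word is what would be written back with PTRACE_POKETEXT. If the address isn't mapped in the debuggee, that write is the step that fails.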

The error string "Input/output error" corresponds to the error code EIO. Checking the man page for ptrace and looking for EIO tells us ptrace will return EIO when:

Request is invalid, or an attempt was made to read from or write to an invalid area in the tracer's or the tracee's memory, or there was a word-alignment violation, or an invalid signal was specified during a restart request.

We can rule out the request being invalid, as it worked fine in our other program. Same for word-alignment violation. We're not doing anything funky with signals, so that leaves us with a read or write to an invalid area in the debuggee's memory.
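As an aside, here's a rough sketch of how an error code turns into that message. It uses Rust's standard library rather than calling strerror directly, so it's a stand-in for what my debugger does, not a copy of it. The value 5 for EIO is the Linux one, and describe_errno is a made-up name.

```rust
use std::io;

/// EIO's numeric value on Linux.
const EIO: i32 = 5;

/// Turn a raw errno value into a human-readable message.
fn describe_errno(code: i32) -> String {
    io::Error::from_raw_os_error(code).to_string()
}

fn main() {
    // Rust appends the numeric code to the text it gets from the C library.
    println!("libc err: {}", describe_errno(EIO));
}
```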

Let's look again at the output we got in our debugger:

> symbol main
0xd2f90 main
> register rip
rip 0x7fb95aa0c100

There's a huge difference between the address of the main symbol and the value of the instruction pointer at the start of the program's execution. The difference was much smaller when debugging our "Hello, world!" program, so where does the address for main come from?

Debug info

There are a couple of ways to get the address of a function in a compiled program. The method I went with is to use the debugging info compilers provide you with when you compile with the -g flag.

This debug info is in a format called DWARF. DWARF is a big and complex beast, and with good reason! It has to describe where functions have been inlined, which file each function was defined in, where functions start and end, where to find every local variable in every function, and a whole lot more.

We're only interested in where functions begin. DWARF debug info is in a tree-like structure, much like code is, and each type of node is identified with a "tag." The tag that represents a function definition is DW_TAG_subprogram:

$ readelf --debug-dump programs/hello
[...]
 <1><2a>: Abbrev Number: 2 (DW_TAG_subprogram)
    <2b>   DW_AT_low_pc      : 0x401c60
    <33>   DW_AT_high_pc     : 0x11
    <39>   DW_AT_name        : main
[...]

The value of DW_AT_low_pc looks familiar. It's the address we found main at when we ran this program in our debugger, and that's because this is exactly where we get it from. Let's check for main in our debugger:

$ readelf --debug-dump target/debug/rust-debugger
[...]
 <2><804db>: Abbrev Number: 16 (DW_TAG_subprogram)
    <804dc>   DW_AT_low_pc      : 0xd2f90
    <804e4>   DW_AT_high_pc     : 0x2ef
    <804ee>   DW_AT_name        : main
[...]

That address matches up as well, except this time it's wrong. What's different about rust-debugger for this to not be where main ends up?

I was stumped by this for a good while before noticing a difference in the output of readelf -h:

$ readelf -h programs/hello
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 03 00 00 00 00 00 00 00 00
  Class:               ELF64
  Data:                2's complement, little endian
  Version:             1 (current)
  OS/ABI:              UNIX - GNU
  ABI Version:         0
  Type:                EXEC (Executable file)
  Machine:             Advanced Micro Devices X86-64
  Version:             0x1
  Entry point address: 0x401b30
$ readelf -h target/debug/rust-debugger
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:               ELF64
  Data:                2's complement, little endian
  Version:             1 (current)
  OS/ABI:              UNIX - System V
  ABI Version:         0
  Type:                DYN (Shared object file)
  Machine:             Advanced Micro Devices X86-64
  Version:             0x1
  Entry point address: 0x93160

What caught my attention was the Type field. In programs/hello it's a plain old executable, but in target/debug/rust-debugger it's a shared object. Aren't shared objects the type for libraries? Why would an executable be reporting itself as a shared object?

It turns out that this is how "position-independent executables" look. They're executables that have been compiled in such a way that they can live in any part of the address space. This is achieved by making sure there are no absolute memory references. Instead, all memory references are done by making them relative to the current instruction pointer.
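A debugger can make the same check that readelf -h does. Below is a small sketch that reads the e_type field straight out of the ELF header. It assumes a little-endian file, and the ElfType enum and elf_type function are names I've invented for the example.

```rust
/// Object file types from the ELF header's `e_type` field.
#[derive(Debug, PartialEq)]
enum ElfType {
    /// ET_EXEC (2): executable linked to run at a fixed address.
    Exec,
    /// ET_DYN (3): shared object, which includes position-independent executables.
    Dyn,
    /// Anything else (relocatable files, core dumps, ...).
    Other(u16),
}

/// Read `e_type` from the start of a little-endian ELF file.
/// Returns `None` if the bytes don't look like a little-endian ELF header.
fn elf_type(header: &[u8]) -> Option<ElfType> {
    // e_ident takes up the first 16 bytes and starts with the magic number.
    if header.len() < 18 || !header.starts_with(b"\x7fELF") {
        return None;
    }
    // e_ident[5] is the data encoding; 1 means little-endian.
    if header[5] != 1 {
        return None;
    }
    // e_type is the 16-bit field immediately after e_ident.
    let e_type = u16::from_le_bytes([header[16], header[17]]);
    Some(match e_type {
        2 => ElfType::Exec,
        3 => ElfType::Dyn,
        other => ElfType::Other(other),
    })
}

fn main() {
    // Defaults to the path used in this post; pass another as the first argument.
    let path = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "target/debug/rust-debugger".to_string());
    match std::fs::read(&path) {
        Ok(bytes) => println!("{}: {:?}", path, elf_type(&bytes)),
        Err(err) => eprintln!("could not read {}: {}", path, err),
    }
}
```

If the result is Dyn, the addresses in the file can't be used as they are. They are offsets that need shifting by wherever the kernel put the binary.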

This triggered an old memory: is this somehow related to "address space layout randomisation" (ASLR)? I've done some reverse engineering challenges in my time, and know that one of the ways operating systems try to protect programs from attackers is to load them into a random part of the address space. How could I confirm this?

$ cargo run -- target/debug/rust-debugger
> register rip
rip 0x7f11d4997100
$ cargo run -- target/debug/rust-debugger
> register rip
rip 0x7f810cd16100
$ cargo run -- target/debug/rust-debugger
> register rip
rip 0x7fc08a2b8100

Well, that's good enough for me. It looks like it's only random in a portion of the address space. Still, it changes each time. Can we disable this?

It turns out we can! Linux has the concept of "personalities" for processes. From what I can see in the man page, this seems to be some sort of compatibility layer for running programs meant for non-Linux systems. Fortunately for us, it also lets us disable ASLR. Children inherit their parent's personality, so if we do the equivalent of the following C snippet in our Rust code:

#include <sys/personality.h>

int main(void) {
  /* Disable ASLR for this process; children it spawns inherit the setting. */
  personality(ADDR_NO_RANDOMIZE);
  return 0;
}

Our child should be loaded at a non-random address. Let's try it!

$ cargo run -- target/debug/rust-debugger
> register rip
rip 0x7ffff7fd3100
$ cargo run -- target/debug/rust-debugger
> register rip
rip 0x7ffff7fd3100
$ cargo run -- target/debug/rust-debugger
> register rip
rip 0x7ffff7fd3100

The randomization is gone, but our process is still loaded at an address that doesn't line up with the addresses in the binary. This isn't the answer to our problem.

Finding the solution

My search eventually led me to this patch to gdb from 2009. It's a patch introducing support for position-independent executables. My thinking was that gdb works on these binaries, so there must be something it does to correct the symbol addresses.

Combing through the patch, I found a function called svr4_relocate_main_executable. It has a huge comment above it that explains some mathematics you can do when the entrypoint of the running process is not the same as the entrypoint given in the binary on disk. Promising.

However, the comment also explains that this mathematics doesn't work if your binary has an "interp section."

$ readelf -S target/debug/rust-debugger | grep interp
  [ 1] .interp           PROGBITS         00000000000002e0  000002e0

Rats. I spent some time trying to get it to work despite this, but I got nowhere. I would either get the same error as before, or no error but the breakpoint would never get hit. I assume I was modifying a valid but arbitrary part of memory.

A little more searching later and I stumbled on this Stack Overflow answer. At the very end of it the user mentions how GDB knows where the binary has been loaded:

This begs the question: how does GDB discover where ld-linux itself has been loaded. The kernel tells it via the auxiliary vector, AT_BASE entry.

Auxiliary vector

I'd never heard the term "auxiliary vector" before but at this point I was happy to follow any lead.

When the operating system loads a program from disk into memory and prepares to execute it, it passes in some key pieces of information. Most people are familiar with the arguments passed in from the command line (argv) and environment variables (envp). These values are found on the stack, just above where the stack begins when the program starts executing.

Just above argv and envp, though, is our newly discovered auxv. We can use the environment variable LD_SHOW_AUXV=1 to print the contents of the auxiliary vector for any program:

$ LD_SHOW_AUXV=1 ls
AT_BASE:              0x7f0790d8f000
AT_FLAGS:             0x0
AT_ENTRY:             0x5641572c6260
AT_PLATFORM:          x86_64
[...]
Cargo.lock  programs/  target/
Cargo.toml  src/

I've trimmed the output for brevity. There's a lot of info in there. The things that are most interesting to us are the AT_BASE and AT_ENTRY values. Let's fire up our debugger with this environment variable and see what it says:

$ LD_SHOW_AUXV=1 cargo run -- target/debug/rust-debugger
AT_BASE:              0x7f80cd45f000
AT_ENTRY:             0x55ba24f62160
> cont
AT_BASE:              0x7ffff7fd1000
AT_ENTRY:             0x5555555d9160

The first auxv values are those of the debugger process, and we have to continue the debuggee process in order to see its auxiliary vector because it starts in a paused state. Armed with this, and the knowledge it won't move around because we've disabled ASLR, we can try setting some breakpoints:

> register rip
rip 0x7ffff7fd3100
> break 5555555d9160
> cont
> register rip
rip 0x5555555d9160

I won't lie, I lost my shit when the breakpoint worked. We're not there yet, though. Are we actually at main? How would we know?

A function of the debugger I haven't mentioned until now is being able to disassemble things. If you run disas with no arguments, it disassembles from the current instruction pointer:

> disas
0x5555555d9160 f30f1efa       endbr64
0x5555555d9164 31ed           xor ebp,ebp
0x5555555d9166 4989d1         mov r9,rdx
0x5555555d9169 5e             pop rsi
0x5555555d916a 4889e2         mov rdx,rsp
0x5555555d916d 4883e4f0       and rsp,0FFFFFFFFFFFFFFF0h
[...]

We can use the objdump command to see if this matches up with our main function:

$ objdump -d target/debug/rust-debugger | less
00000000000b0a60 <main>:
   b0a60:       48 83 ec 18             sub    $0x18,%rsp
   b0a64:       8a 05 14 e5 48 00       mov    0x48e514(%rip),%al
   b0a6a:       48 63 cf                movslq %edi,%rcx
   b0a6d:       48 8d 3d ac f8 ff ff    lea    -0x754(%rip),%rdi
   b0a74:       48 89 74 24 10          mov    %rsi,0x10(%rsp)
   b0a79:       48 89 ce                mov    %rcx,%rsi
   b0a7c:       48 8b 54 24 10          mov    0x10(%rsp),%rdx
   b0a81:       88 44 24 0f             mov    %al,0xf(%rsp)
   b0a85:       e8 d6 6d ff ff          callq  a7860 <_ZN3std2rt10lang_start17h004f6ce3435c8ab1E>
   b0a8a:       48 83 c4 18             add    $0x18,%rsp
   b0a8e:       c3                      retq

What?! They don't match. How can this be? I was so sure.

It took me a few minutes to calm down from the shock before I remembered that programs don't actually start at main. There's a little bit of code that runs before main called _start:

$ objdump -d target/debug/rust-debugger | less
0000000000085160 <_start>:
   85160:       f3 0f 1e fa             endbr64
   85164:       31 ed                   xor    %ebp,%ebp
   85166:       49 89 d1                mov    %rdx,%r9
   85169:       5e                      pop    %rsi
   8516a:       48 89 e2                mov    %rsp,%rdx
   8516d:       48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
[...]

It matches! Yes!

Instruction-pointer-relative addressing only works if the whole executable is shifted by the same amount, so the offset between _start and main is the same in memory as it is on disk. A little bit of mathematics tells us what the address of main should be:

0x5555555d9160 - 0x85160 + 0xb0a60 = 0x555555604a60

Let's try it:

> break 555555604a60
> cont
> register rip
rip 0x555555604a60
> disas
0x555555604a60 4883ec18       sub rsp,18h
0x555555604a64 8a0514e54800   mov al,[rel 555555A92F7Eh]
0x555555604a6a 4863cf         movsxd rcx,edi
0x555555604a6d 488d3dacf8ffff lea rdi,[rel 555555604320h]
0x555555604a74 4889742410     mov [rsp+10h],rsi
0x555555604a79 4889ce         mov rsi,rcx
0x555555604a7c 488b542410     mov rdx,[rsp+10h]
0x555555604a81 8844240f       mov [rsp+0Fh],al
0x555555604a85 e8d66dffff     call 00005555555FB860h
0x555555604a8a 4883c418       add rsp,18h
0x555555604a8e c3             ret

We did it! \o/
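For the record, the arithmetic that got us here is small enough to fit in one function. This is a sketch using the numbers from this post; relocate is a name I've picked for the example, not something from the debugger's source.

```rust
/// Translate an address from the on-disk binary into the running process,
/// given one address that is known in both (here, the entry point).
fn relocate(file_addr: u64, file_entry: u64, runtime_entry: u64) -> u64 {
    // The load bias is how far the whole image was shifted when it was loaded.
    let load_bias = runtime_entry.wrapping_sub(file_entry);
    file_addr.wrapping_add(load_bias)
}

fn main() {
    let file_entry: u64 = 0x85160; // _start according to objdump
    let runtime_entry: u64 = 0x5555555d9160; // AT_ENTRY of the debuggee
    let file_main: u64 = 0xb0a60; // main according to objdump

    // Prints 0x555555604a60, the address we set the breakpoint on.
    println!("{:#x}", relocate(file_main, file_entry, runtime_entry));
}
```

For a non-position-independent binary like programs/hello, the two entry points are equal, the bias is zero, and addresses come out unchanged.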

Wrapping up

I learned a lot on this journey. I learned that Linux processes have "personalities" that modify their runtime environment. I learned that executables can be position-independent just like libraries, and this is part of how address space layout randomisation works.

Last and certainly not least, I learned that Linux tells a program things about its runtime environment through a thing called an "auxiliary vector," which lives on the stack just above the environment variables, and that's how gdb knows where to find main in position-independent executables.

Now that I know all of this, I can go ahead and fix my breakpoint code so that it works with position-independent executables.
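One way a debugger could get at AT_ENTRY without relying on LD_SHOW_AUXV is to read /proc/<pid>/auxv, which Linux exposes as a flat list of key/value pairs ending in AT_NULL. The sketch below parses that format for a 64-bit process. It's an illustration under those assumptions, not necessarily what my fix ended up doing, and the helper names are invented.

```rust
use std::fs;

// Keys from the auxiliary vector; the values are fixed by the Linux ABI.
const AT_NULL: u64 = 0;
const AT_BASE: u64 = 7;
const AT_ENTRY: u64 = 9;

/// Read a native-endian u64 from the first 8 bytes of `bytes`.
fn read_u64(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[..8]);
    u64::from_ne_bytes(buf)
}

/// Look up one key in a raw 64-bit auxiliary vector.
/// Stops at AT_NULL, which marks the end of the vector.
fn auxv_lookup(raw: &[u8], wanted: u64) -> Option<u64> {
    for pair in raw.chunks_exact(16) {
        let key = read_u64(&pair[0..8]);
        let value = read_u64(&pair[8..16]);
        if key == AT_NULL {
            break;
        }
        if key == wanted {
            return Some(value);
        }
    }
    None
}

fn main() {
    // A debugger would read /proc/<pid>/auxv for the debuggee's pid instead.
    match fs::read("/proc/self/auxv") {
        Ok(raw) => {
            println!("AT_BASE:  {:x?}", auxv_lookup(&raw, AT_BASE));
            println!("AT_ENTRY: {:x?}", auxv_lookup(&raw, AT_ENTRY));
        }
        Err(err) => eprintln!("could not read auxv: {}", err),
    }
}
```

Subtracting the entry point recorded in the ELF header from AT_ENTRY gives the amount to add to every symbol address from the debug info.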

Appendix

The code for the debugger is available on GitHub, and if you're interested in the exact state things were in when I wrote this article you can check it out like so:

$ git clone https://github.com/samwho/rust-debugger
$ cd rust-debugger
$ git checkout 00bcdd4772496913d1417feb7f8886d40aed46d5

The commits I made that fixed our debugger for position-independent executables are here and here.