Poking Around Inside Ruby
I've been wanting to start exploring the MRI Ruby source code for quite some time now, but didn't quite know where to start. I wanted to devise a way of figuring out what parts of the Ruby source code were important, so if I were to run a "Hello, world!" program I could see exactly what the Ruby VM had to do to execute that.
Prerequisite Knowledge: It would help if you've used GDB before, but it should be self explanatory if you haven't. If you've never used Ruby this post might give you tips on how to explore language code bases, but other than that it'll probably be uninteresting. It'll also help you if you have a vague idea of what a "symbol table" is in a binary file.
This is going to be quite meandering. I'm basically just going to be picking interesting things in the differences between certain running programs and looking at the C source to figure out what's happening. If that sounds like your cup of tea, let's do it! :)
What I was looking to find out was the path of execution Ruby followed when
interpreting and running a simple puts "Hello, world!"
. I wanted a list of C
functions, in the order they were called, from start to finish.
# ltrace, dtrace, strace
The suite of *trace
tools didn't quite fit my purposes. ltrace
only traces
calls into dynamic libraries (through some really clever symbol stubbing, might
look more into how that works and write a post on it at some point). dtrace
probably would be able to do what I want but I didn't have the patience to learn
it inside out (plus I wanted to check the difference between Ruby 1.9 and Ruby
2.0, which I wouldn't be able to do without modifying the 1.9 source code).
strace
only traces system calls and, while that's useful, I'm specifically
interested in the C calls from inside the Ruby code base.
# gdb!
After a fair bit of Googling, I came across a
blog post
that explained how to do exactly what I wanted using a really simple gdb
script. This is an example of what such a script would look like:
break some_function
command
silent
backtrace 1
continue
end
break some_other_function
command
silent
backtrace 1
continue
end
run
The idea here is to set breakpoints on functions that you're interested in and print out the latest item in the backtrace whenever the breakpoint is hit, then just continue. Easy peasy.
The slight complication is that I'm interested in all of the C functions I can
get my hands on. We can check how many functions Ruby has by using the nm
tool
under Linux:
$ nm `which ruby`
nm: /home/sam/.rbenv/shims/ruby: File format not recognized
Ahh yes. Balls. I'm using rbenv
to manage multiple Ruby installs on one
machine, so the ruby
in my path isn't actually a compiled executable file,
it's a bash script that sets up some environment variables and then calls some
other executable to find which version of Ruby it needs to hit.
Fortunately, the rbenv
developers are good people and offer an rbenv which
command that finds the real executable for us:
$ nm `rbenv which ruby`
Lots and lots of symbols...
Let's apply some filtering. Each line that nm
outputs has the following
format:
<memory address> <symbol type> <symbol name>
We're only interested in functions, which according to the nm
man pages is a
symbol type of "t". So let's try and get a better count:
$ nm `rbenv which ruby` | grep " t " | wc -l
4546
That's using Ruby 2.0.0-p247. There's quite a large difference between 2.x and 1.9.x, check this out:
$ rbenv local 1.9.3-p448
$ nm `rbenv which ruby` | grep " t " | wc -l
3825
Not sure what conclusions we can draw from that, but it's interesting nonetheless.
# Auto generating the gdb script
At this point I'd hope nobody would have any intention of hand-crafting
thousands of lines of gdb
script, so let's hack together something that
generates it for us:
ruby_binary = ARGV.shift
`nm #{ruby_binary} | grep " t " | cut -d " " -f 3`.each_line do |symbol|
puts <<-EOF
break #{symbol.chomp}
commands
silent
backtrace 1
continue
end
EOF
end
Who needs error checking? This beauty will output the necessary script block for each and every function inside the Ruby executable.
# Gluing it all together
First, we have to generate our gdb script. We'll do that separately:
$ ruby gdbscript.rb `rbenv which ruby` > script.gdb
Next, we need to invoke gdb with our script and the necessary Ruby program. I
did it as a one-liner and wrapped it around the time
program to see how long
it would take. It ended up looking like this:
$ time gdb --batch -x script.gdb --args `rbenv which ruby` -e "puts 'hello'"
Now, I had no idea how long this would take. I certainly didn't think it would take this long for 1.9.3:
real 339m14.492s
user 196m39.789s
sys 147m3.939s
Unfortunately, I didn't make a note of how long 2.0 took (I fell asleep). It took all night and a lot of the next day, though. Easily over twice as long as 1.9.3, which completely dashes all hopes of this being a remotely useful way of figuring out what Ruby is doing under the hood. However, we've collected some potentially fun data, so let's soldier on.
# Analysing the output
I ran this 4 times. 2 different Ruby programs over both of the Ruby versions,
2.0 and 1.9.3. The two Ruby programs were the simple puts 'hello'
and, for
comparison purposes, exit 42
. I saved all of the output to files, check out
how big they are (keep in mind this is one function call per line):
$ wc -l *
697049 1.9.3_exit42.out
696583 1.9.3_hello.out
1991463 2.0.0_exit42.out
1990666 2.0.0_hello.out
Firstly, it's a lot of lines regardless. Secondly, look at the difference between Ruby 1.9.3 and Ruby 2.0.0! There's almost 3 times more lines for Ruby 2.0. You generally find that Ruby 2.0 starts up a lot faster than Ruby 1.9, though... I wonder what's going on.
# So what does the output look like?
Here's some example output:
$ head -n 20 rtrace-1.9.3-exit42.out
#0 0x00007ffff7bc5990 in frame_dummy () from /usr/lib/libpthread.so.0
#0 0x00007ffff7bc5910 in register_tm_clones () from /usr/lib/libpthread.so.0
#0 0x0000000000414d20 in frame_dummy ()
#0 0x0000000000414cc0 in register_tm_clones ()
#0 main (argc=3, argv=0x7fffffffe678) at main.c:31
#0 fill_standard_fds () at ruby.c:1842
#0 0x0000000000589100 in fstat ()
#0 0x0000000000589100 in fstat ()
#0 0x0000000000589100 in fstat ()
#0 get_stack (addr=0x7fffffffe548, size=0x7fffffffe550) at thread_pthread.c:487
#0 get_stack (addr=0x7fffffffe448, size=0x7fffffffe450) at thread_pthread.c:487
#0 Init_BareVM () at vm.c:2224
#0 vm_init2 (vm=0x846f20) at vm.c:1670
#0 rb_objspace_alloc () at gc.c:432
#0 Init_native_thread () at thread_pthread.c:436
#0 native_thread_init (th=0x847520) at thread_pthread.c:450
#0 native_cond_initialize (cond=0x8475b0, flags=1) at thread_pthread.c:242
#0 ruby_thread_set_native (th=0x847520) at thread_pthread.c:428
#0 native_mutex_initialize (lock=0x83e6c0 <signal_thread_list_lock>) at thread_pthread.c:219
#0 mutex_debug (msg=0x5b33f4 "init", lock=0x83e6c0 <signal_thread_list_lock>) at thread_pthread.c:178
The first few bits have memory addresses rather than symbol names and I'm not 100% sure why that is. You don't see it in the rest of the file, though, so I assume it's some voodoo that's done during the bootstrapping of the process. Explanations welcome!
# Working with the data
Let's do something obvious to begin with: which C functions get called the most:
$ cat rtrace-1.9.3-exit42.out | grep "#0 " | cut -d " " -f 3 | sort | uniq -c | sort -nr | head
43031 parser_nextc
22583 us_ascii_mbc_enc_len
22529 parser_tokadd
21272 vm_malloc_fixup
21197 vm_malloc_prepare
20378 insn_len
19841 vm_xmalloc
16910 rb_enc_dummy_p
15822 parser_tokadd_mbchar
14751 compile_data_alloc
The file name should be self explanatory. I'm grepping for lines beginning with "#0 " because those are the backtrace lines (there are a few lines in there that aren't backtrace lines, when threads die and whatnot). Then I'm grabbing the 3rd space separated field, sorting it, counting it, sorting it again and grabbing the first 10.
No real surprises... Parsing/lexing/memory/encoding stuff. Let's compare it against Ruby 2.0:
$ cat rtrace-2.0.0-exit42.out | grep "#0 " | cut -d " " -f 3 | sort | uniq -c | sort -nr | head
122836 parser_nextc
72578 markable_object_p
67072 GET_THREAD
64400 parser_tokadd
62980 mbc_enc_len
61260 gc_mark_ptr
56964 gc_mark
52222 insn_len
43518 parser_tokadd_mbchar
37379 compile_data_alloc
Mooooostly the same... Some differences, though. I compiled both versions of Ruby specially to include maximum debugging information, which is why we're seeing MACRO-style calls in here too. Let's dig around some of the more interesting looking differences.
# Spelunking in a C code base
If you've not really rummaged around inside a C code base before, it seems really, really scary. There are tools that can help, though!
I'm a vim user, and no self-respecting vim user that does any C code leaves the
warm confines of their bed without having "cscope" installed. I use
this plugin, which makes some really
nice mappings into useful cscope commands. For example, if I want to find the
definition of a symbol (which is what I do most often when exploring a new code
base), I can type <leader>fg
and it will open a split window with potential
definitions of that symbol.
It does other really cool stuff, too, like finding functions that call a
specific function, or finding files that #include
the current file and so on.
Typing :help cscope
will tell you way more than I can.
If you're not using vim I can't really help you a whole lot, sorry. The Emacs wiki has a page on cscope and emacs which may help. There's also this sublime text plugin that seems to do the trick.
# To GET_THREAD or not to GET_THREAD
Interestingly, GET_THREAD
is never called in the 1.9 examples. It did get
changed a fair bit between 1.9 and 2.0, though, look at the 1.9 version:
#define GET_THREAD() ruby_current_thread
2.0 version:
static inline rb_thread_t *
GET_THREAD(void)
{
rb_thread_t *th = ruby_current_thread;
#if OPT_CALL_CFUNC_WITHOUT_FRAME
if (UNLIKELY(th->passed_ci != 0)) {
void vm_call_cfunc_push_frame(rb_thread_t *th);
vm_call_cfunc_push_frame(th);
}
#endif
return th;
}
Without the OPT_CALL_CFUNC_WITHOUT_FRAME
symbol defined, because the Ruby 2.0
version gets inlined, the two version of this do exactly the same thing.
From what I can tell from the patch notes, this is part of an optimisation that is disabled by default that delays the creation of a "frame" (stack frame? not sure) for C function calls from Ruby. I tried to dig into it but my unfamiliarity with the Ruby code base is holding me back a bit.
I did check to make sure I had macro information compiled into my 1.9 version
and it is there, there are other macros listed in my outputs, so this doesn't
seem like me fucking up... Bit weird that 2.0 makes such heavy use of
GET_THREAD
and 1.9 doesn't. Anyone got any ideas? :)
# UNLIKELY?
In the Ruby 2.0 GET_THREAD
snippet from above there's an intriguing looking
macro called UNLIKELY
. There's a corresponding LIKELY
macro as well, and
they expand to the following in both 1.9 and 2.0:
#define LIKELY(x) (__builtin_expect((x), 1))
#define UNLIKELY(x) (__builtin_expect((x), 0))
__builtin_expect
is a GCC specific thing. The documentation says the
following:
You may use
__builtin_expect
to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.
So, branch prediction information. I tried to write a small, demonstrative example of when using this builtin made a difference to the generated assembly code but every time the output was exactly the same regardless of whether I used likely or unlikely. According to this post you have to write something suitably non-trivial in order to make sure the compiler doesn't just optimise your code without the need for the prediction data.
However, even in the blog post's example code, compiling with and without the
use of __builtin_expect
only changes 1 line of the generated assembly.
Without __builtin_expect
:
cmpl $0, %eax
With __builtin_expect
movslq %eax, %rax
cmpq $0, %rax
The only thing that change is doing is converting between a 32bit signed integer
to a 64bit signed integer (because __builtin_expect
returns a long). movslq
is a command that moves a value between a 32bit register and a 64bit register,
doing a sign extension if it
needs to.
The above difference only appears with -O0
specified, any other optimisation
level just returns assembler that is exactly the same. I used the following
commands to generate the assembler I compared against:
$ gcc -O3 -S -DEXPECT_RESULT=1 builtin_expect_test.c -o expect.s
$ gcc -O3 -S -DDONT_EXPECT builtin_expect_test.c -o dont_expect.s
$ diff expect.s dont_expect.s
No difference. It didn't even matter when changing what the expected result was
supposed to be. Which sort of questions some of guy's conclusions, but maybe
there are differences between the versions of GCC we were using or perhaps I did
something wrong? Either way, I can't seem to find an example that clearly shows
what __builtin_expect
is doing with the information you're giving it.
# GC in 2.0 but not in 1.9
Waaay up the article, when we took a look at the 10 most called functions for each ruby version, you may notice that there's nothing said about garbage collection for 1.9, but the 2.0 list has 3 GC related functions in it! Why is this?
According to a slide deck by Joe Damato, the Ruby garbage collector runs when Ruby has run out of space on its own heap to allocate objects. Seeing as we're not really doing any object allocations (unless using literals counts (it probably does)), my guess is that loading the standard library is where the problem lies.
Has the standard library significantly increased in size? I'm using David
Wheeler's sloccount
tool to find out. Results:
1.9:
$ sloccount lib/
ruby: 80731 (100.00%)
2.0:
$ sloccount lib/
ruby: 104673 (100.00%)
That's a fairly large difference, maybe it would account for the extra GC calls? It's really hard to know for sure. I'm not sure if it would account for so many more lines generated for the Ruby 2.0 runs over the 1.9 runs...
To be completely fair for 1.9, it does do GC stuff, it's just not in the top 10. If we look at the top 20 functions called for 1.9:
$ cat rtrace-1.9.3-exit42.out | grep "#0 " | cut -d " " -f 3 | sort | uniq -c | sort -nr | head -n 20
43031 parser_nextc
22583 us_ascii_mbc_enc_len
22529 parser_tokadd
21272 vm_malloc_fixup
21197 vm_malloc_prepare
20378 insn_len
19841 vm_xmalloc
16910 rb_enc_dummy_p
15822 parser_tokadd_mbchar
14751 compile_data_alloc
13382 gc_mark
13009 enc_set_index
11493 verify_list
9784 mbc_enc_len
9473 search_nonascii
8563 ADD_ELEM
8133 check_int
7597 coderange_scan
7592 ruby_sip_hash24
7525 yylex
We can see gc_mark
in there, but not a whole lot else. Without digging into
the inner workings of Ruby's GC and the differences between 1.9 and 2.0 (which
is quite a lot, I hear) I won't be able to know why this is the case. If anyone
feels like giving an explanation or linking to something that explains this, I
would be happy to read it :)
# Wrapping up
The data is really interesting and there's probably a tonne more stuff you could do with it. If anyone wants a copy of the data, I'll happily try and get it to you. It's a few hundred megabytes and this blog is in a GitHub repo and hosted on Heroku, so I'll avoid committing it directly unless I absolutely have to!
So yeah. This is part of what my spare time has been spent doing recently, woo! It's a lot of fun, I totally recommend just dicking around with this kind of stuff. I always find I run into something interesting in one way or another.