Memory Allocation
samwho keyboard logo

Memory Allocation

One thing that all programs on your computer have in common is a need for memory. Programs need to be loaded from your hard drive into memory before they can be run. While running, the majority of what programs do is load values from memory, do some computation on them, and then store the result back in memory.

In this post I'm going to introduce you to the basics of memory allocation. Allocators exist because it's not enough to have memory available, you need to use it effectively. We will visually explore how simple allocators work. We'll see some of the problems that they try to solve, and some of the techniques used to solve them. At the end of this post, you should know everything you need to know to write your own allocator.

# malloc and free

To understand the job of a memory allocator, it's essential to understand how programs request and return memory. malloc and free are functions that were first introduced in a recognisable form in UNIX v7 in 1979(!). Let's take a look at a short C program demonstrating their use.

Woah, hold on. I've never written any C code before. Will I still be able to follow along?

If you have beginner-level familiarity with another language, e.g. JavaScript, Python, or C#, you should have no problem following along. You don't need to understand every word, as long as you get the overall idea. This is the only C code in the article, I promise.

#include <stdlib.h>

int main() {
  void *ptr = malloc(4);
  free(ptr);
  return 0;
}

In the above program we ask for 4 bytes of memory by calling malloc(4), we store the value returned in a variable called ptr, then we indicate that we're done with the memory by calling free(ptr).

These two functions are how almost all programs manage the memory they use. Even when you're not writing C, the code that is executing your Java, Python, Ruby, JavaScript, and so on make use of malloc and free.

# What is memory?

The smallest unit of memory that allocators work with is called a "byte." A byte can store any number between 0 and 255. You can think of memory as being a long sequence of bytes. We're going to represent this sequence as a grid of squares, with each square representing a byte of memory.

In the C code from before, malloc(4) allocates 4 bytes of memory. We're going to represent memory that has been allocated as darker squares.

Then free(ptr) tells the allocator we're done with that memory. It is returned back to the pool of available memory.

Here's what 4 malloc calls followed by 4 free calls looks like. You'll notice there's now a slider. Dragging the slider to the right advances time forward, and dragging it left rewinds. You can also click anywhere on the grid and then use the arrow keys on your keyboard, or you can use the left and right buttons. The ticks along the slider represent calls to malloc and free.

Wait a sec... What is malloc actually returning as a value? What does it mean to "give" memory to a program?

What malloc returns is called a "pointer" or a "memory address." It's a number that identifies a byte in memory. We typically write addresses in a form called "hexadecimal." Hexadecimal numbers are written with a 0x prefix to distinguish them from decimal numbers. Move the slider below to see a comparison between decimal numbers and hexadecimal numbers.

0 == 0x0

Here's our familiar grid of memory. Each byte is annotated with its address in hexadecimal form. For space reasons, I've omitted the 0x prefix.

The examples we use in this article pretend that your computer only has a very small amount of memory, but in real life you have billions of bytes to work with. Real addresses are much larger than what we're using here, but the idea is exactly the same. Memory addresses are numbers that refer to a specific byte in memory.

# The simplest malloc

The "hello world" of malloc implementations would hand out blocks of memory by keeping track of where the previous block ended and starting the next block right after. Below we represent where the next block should start with a grey square.

You'll notice no memory is freed. If we're only keeping track of where the next block should start, and we don't know where previous blocks start or end, free doesn't have enough information to do anything. So it doesn't. This is called a "memory leak" because, once allocated, the memory can never be used again.

Believe it or not, this isn't a completely useless implementation. For programs that use a known amount of memory, this can be a very efficient strategy. It's extremely fast and extremely simple. As a general-purpose memory allocator, though, we can't get away with having no free implementation.

# The simplest general-purpose malloc

In order to free memory, we need to keep better track of memory. We can do this by saving the address and size of all allocations, and the address and size of blocks of free memory. We'll call these an "allocation list" and a "free list" respectively.

We're representing free list entries as 2 grey squares linked together with a line. You can imagine this entry being represented in code as address=0 and size=32. When our program starts, all of memory is marked as free. When malloc is called, we loop through our free list until we find a block large enough to accommodate it. When we find one, we save the address and size of the allocation in our allocation list, and shrink the free list entry accordingly.

Where do we save allocations and free list entries? Aren't we pretending our computer only has 32 bytes of memory?

You caught me. One of the benefits of being a memory allocator is that you're in charge of memory. You could store your allocation/free list in a reserved area that's just for you. Or you could store it inline, in a few bytes immediately preceding each allocation. For now, assume we have reserved some unseen memory for ourselves and we're using it to store our allocation and free lists.

So what about free? Because we've saved the address and size of the allocation in our allocation list, we can search that list and move the allocation back in to the free list. Without the size information, we wouldn't be able to do this.

Our free list now has 2 entries. This might look harmless, but actually represents a significant problem. Let's see that problem in action.

We allocated 8 blocks of memory, each 4 bytes in size. Then we freed them all, resulting in 8 free list entries. The problem we have now is that if we tried to do a malloc(8), there are no items in our free list that can hold 8 bytes and the malloc(8) will fail.

To solve this, we need to do a bit more work. When we free memory, we should make sure that if the block we return to the free list is next to any other free blocks, we combine them together. This is called "coalescing."

Much better.

# Fragmentation

A perfectly coalesced free list doesn't solve all of our problems. The following example shows a longer sequence of allocations. Have a look at the state memory is in at the end.

We end this sequence with 6 of our 32 bytes free, but they're split into 2 blocks of 3 bytes. If we had to service a malloc(6), while we have enough free memory in theory, we wouldn't be able to. This is called "fragmentation."

Couldn't we rearrange the memory to get a block of 6 contiguous bytes? Some sort of defragmentation process?

Sadly not. Remember earlier we talked about how the return value of malloc is the address of a byte in memory? Moving allocations won't change the pointers we have already returned from malloc. We would change the value those pointers are pointed at, effectively breaking them. This is one of the downsides of the malloc/free API.

If we can't move allocations after creating them, we need to be more careful about where we put them to begin with.

One way to combat fragmentation is, confusingly, to overallocate. If we always allocate a minimum of 4 bytes, even when the request is for 1 byte, watch what happens. This is the exact same sequence of allocations as above.

Now we can service a malloc(6). It's worth keeping in mind that this is just one example. Programs will call malloc and free in very different patterns depending on what they do, which makes it challenging to design an allocator that always performs well.

After the first malloc, the start of the free list seems to fall out of sync with allocated memory. Is that a bug in the visualisation?

No, that's a side-effect of overallocating. The visualisation shows "true" memory use, whereas the free list is updated from the allocator's perspective. So when the first malloc happens, 1 byte of memory is allocated but the free list entry is moved forward 4 bytes. We trade some wasted space in return for less fragmentation.

It's worth noting that this unused space that results from overallocation is another form of fragmentation. It's memory that cannot be used until the allocation that created it is freed. As a result, we wouldn't want to go too wild with overallocation. If our program only ever allocated 1 byte at a time, for example, we'd be wasting 75% of all memory.

Another way to combat fragmentation is to segment memory into a space for small allocations and a space for big ones. In this next visualisation we start with two free lists. The lighter grey one is for allocations 3 bytes or smaller, and the darker grey one is for allocations 4 bytes or larger. Again, this is the exact same sequence of allocations as before.

Nice! This also reduces fragmentation. If we're strictly only allowing allocations of 3 bytes or less in the first segment, though, then we can't service that malloc(6). The trade-off here is that reserving a segment of memory for smaller allocations gives you less memory to work with for bigger ones.

Hey, the first allocation in the dark grey free list is 3 bytes! You said this was for allocations 4 bytes and up. What gives?

Got me again. This implementation I've written will put small allocations in the dark grey space when the light grey space is full. It will overallocate when it does this, otherwise we'd end up with avoidable fragmentation in the dark grey space thanks to small allocations.

Allocators that split memory up based on the size of allocation are called "slab allocators." In practice they have many more size classes than the 2 in our example.

# A quick malloc puzzle

What happens if you malloc(0)? Have a think about this before playing with the slider below.

This is using our free list implementation that mandates a minimum size of 4 bytes for allocations. All memory gets allocated, but none is actually used. Do you think this is correct behaviour?

It turns out that what happens when you malloc(0) differs between implementations. Some of them behave as above, allocating space they probably didn't have to. Others will return what's called a "null pointer", a special pointer that will crash your program if you try to read or write the memory it points to. Others pick one specific location in memory and return that same location for all calls to malloc(0), regardless how many times it is called.

Moral of the story? Don't malloc(0).

# Inline bookkeeping

Remember earlier on when you asked about where allocation list and free list information gets stored, and I gave an unsatisfying answer about how it's stored in some other area of memory we've reserved for ourselves?

Yes...

This isn't the only way to do it. Lots of allocators store information right next to the blocks of memory they relate to. Have a look at this.

What we have here is memory with no allocations, but free list information stored inline in that memory. Each block of memory, free or used, gets 3 additional bytes of bookkeeping information. If address is the address of the first byte of the allocation, here's the layout of a block:

  1. address + 0 is the size of the block
  2. address + 1 is whether the block is free (1) or used (2)
  3. address + 2 is where the usable memory starts
  4. address + 2 + size -- the size of the block again

So in this above example, the byte at 0x0 is storing the value 29. This means it's a block containing 29 bytes of memory. The value 1 at 0x1 indicates that the block is free memory.

We store the size twice? Isn't that wasteful?

It seems wasteful at first, but it is necessary if we want to do any form of coalescing. Let's take a look at an example.

Here we've allocated 4 bytes of memory. To do this, our malloc implementation starts at the beginning of memory and checks to see if the block there is used. It knows that at address + 1 it will find either a 1 or a 2. If it finds a 1, it can check the value at address for how big the block is. If it is big enough, it can allocate into it. If it's not big enough, it knows it can add the value it finds in address to address to get to the start of the next block of memory.

This has resulted in the creation of a used block (notice the 2 stored in the 2nd byte), and it has pushed start of the free block forward by 7 bytes. Let's do the same again and allocate another 4 bytes.

Next, let's free our first malloc(4). The implementation of free is where storing information inline starts to shine. In our previous allocators, we had to search the allocation list to know the size of the block being freed. Now we know we'll find it at address. What's better than that is that for this free, we don't even need to know how big the allocation is. We can just set address + 1 to 1!

How great is that? Simple, fast.

What if we wanted to free the 2nd block of used memory? We know that we want to coalesce to avoid fragmentation, but how do we do that? This is where the seemingly wasteful bookkeeping comes into play.

When we coalesce, we check to see the state of the blocks immediately before and immediately after the block we're freeing. We know that we can get to the next block by adding the value at address to address, but how do we get to the previous block? We take the value at address - 1 and subtract that from address. Without this duplicated size information at the end of the block, it would be impossible to find the previous block and impossible to coalesce properly.

Allocators that store bookkeeping information like this alongside allocations are called "boundary tag allocators."

What's stopping a program from modifying the bookkeeping information? Wouldn't that completely break memory?

Surprisingly, nothing truly prevents this. We rely heavily, as an industry, on the correctness of code. You might have heard of "buffer overrun" or "use after free" bugs before. These are when a program modifies memory past the end of an allocated block, or accidentally uses a block of memory after freeing it. These are indeed catastrophic. They can result in your program immediately crashing, they can result in your program crashing in several minutes, hours, or days time. They can even result in hackers using the bug to gain access to systems they shouldn't have access to.

We're seeing a rise in popularity of "memory safe" languages, for example Rust. These languages invest a lot in making sure it's not possible to make these types of mistake in the first place. Exactly how they do that is outside of the scope of this article, but if this interests you I highly recommend giving Rust a try.

You might have also realised that calling free on a pointer that's in the middle of a block of memory could also have disastrous consequences. Depending on what values are in memory, the allocator could be tricked into thinking it's freeing something but what it's really doing is modifying memory it shouldn't be.

To get around this, some allocators inject "magic" values as part of the bookkeeping information. They store, say, 0x55 at address + 2. This would waste an extra byte of memory per allocation, but would allow them to know when a mistake has been made. To reduce the impact of this, allocators often disable this behaviour by default and allow you to enable it only when you're debugging.

# Playground

If you're keen to take your new found knowledge and try your hand at writing your own allocators, you can click here to go to my allocator playground. You'll be able to write JavaScript code that implements the malloc/free API and visualise how it works!

# Conclusion

We've covered a lot in this post, and if it has left you yearning for more you won't be disappointed. I've specifically avoided the topics of virtual memory, brk vs mmap, the role of CPU caches, and the endless tricks real malloc implementations pull out of their sleeves. There's no shortage of information about memory allocators on the Internet, and if you've read this far you should be well-placed to dive in to it.

Join the discussion on Hacker News!

# Acknowledgments

Special thanks to the following people:

powered by buttondown