07/27/23

Queueing

The least sexy aspect of building Event-Driven Architectures

9 Min Read

Queues are everywhere. We queue at bars, in restaurants, for elevators, and at the bank. When you loaded this web page, the request to fetch it interacted with dozens of different queues on its way from your machine to the server this page is hosted on. Queues are a fundamental part of life, and a fundamental part of computing.

In this post we're going to explore queueing in the context of HTTP requests. We'll start simple and gradually build up to more complex queueing strategies by exposing the weaknesses of the simpler ones.

# Why do we need queues?

Let's start at the beginning: one client and one server. For this to work, I'm going to need your help. You're going to play a central role in this post by being the client.

Click the circle button to send a request to the server.

When you clicked, a request travelled from the button to the server. When the request arrived at the server, the server began processing it. The request shrinks in a clock-like fashion until it has disappeared, at which point it is considered served. If you click fast enough you will notice that the server can only process one request at a time, and requests that arrive while the server is busy get dropped. Give it a try if you haven't already.

If you're finding it difficult to click fast enough to see dropped requests, you can slow down the animation speed using the control below. You can also speed the animations up if they feel too slow.

These dropped requests are a problem. If these are HTTP requests, dropping them would mean showing an error page to the client. This could cause the client to leave your site, which could be a lost signup or a lost sale. We want to avoid dropping requests if we can.

One of the ways we could tackle this problem is to introduce a queue. The next example puts a queue between you and the server. Send a few requests and see what happens.

New requests are temporarily highlighted to make it easy to see where they end up in the queue.

Now when a request arrives at the server and the server is busy, the request is queued. The server takes requests off the queue in the order they were added, processing them at a steady rate.

# FIFO

The queue we used in the previous example is what's called a first-in first-out, or FIFO, queue. As the name suggests, requests are processed in the order they were added.

Queues aren't a silver bullet. They help absorb bursts of requests, but they can still get full. If a request arrives and the queue is full, we still drop that request.
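If it helps to see the mechanics in code, here's a minimal sketch of a bounded FIFO queue with tail drop, written in Python. The class and method names are mine for illustration; they aren't from any particular library or from the demo above.

```python
from collections import deque

class BoundedFifo:
    """A FIFO queue with a fixed capacity and tail drop (illustrative sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = deque()

    def enqueue(self, request) -> bool:
        # Tail drop: if the queue is already full, the new request is rejected.
        if len(self.items) >= self.capacity:
            return False
        self.items.append(request)
        return True

    def dequeue(self):
        # Serve the request that has been waiting the longest.
        return self.items.popleft() if self.items else None
```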

Even worse, we have introduced a new problem. Without the queue, requests that cannot be served get dropped almost immediately. This is bad, but at least you learn quickly that things aren't working. Now, with the queue, the request spends time waiting before being served. We've traded fast failure for slow success, and sometimes slow failure.

In the next example, I'm going to need you to play the part of a client with high standards. If a request takes longer than 3 seconds to be served, you consider it failed and give up on it. Practically, this would be an action like closing the browser tab. We will represent this by turning the request hollow.

Things start off okay, but you will see that if you fill up the queue, requests will hit that 3-second timeout. When a request gets to the front of the queue after timing out, the server wastes time processing it. This then increases the chance that the request behind it will also time out, creating a cascade of failures.

If you send requests faster than they can be processed, you fall into a state where the server is only processing timed-out requests. No useful work is getting done. This highlights an important rule for using FIFO queues: keep them short enough that you can process the full backlog in less time than it takes for a request to time out.
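To put a rough number on that rule: the sketch below assumes a made-up service time of 500ms per request and the 3-second client timeout from the example above, and works out the longest backlog that can still be drained before the requests at the back give up.

```python
# Rough sizing rule for a FIFO queue in front of a single server.
# The service time here is made up for illustration.
service_time_s = 0.5    # time to process one request
client_timeout_s = 3.0  # how long a client waits before giving up

# A request with n requests ahead of it waits roughly n * service_time_s
# before being served, so the queue should never grow beyond this length.
max_useful_queue_length = int(client_timeout_s // service_time_s)
print(max_useful_queue_length)  # 6
```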

# LIFO

One thing we can do to lessen the problem we saw with FIFO queues in the previous section is to use a last-in first-out, or LIFO, queue. In a LIFO queue, the most recently added request is processed first. This is the opposite of FIFO, where the least recently added request is processed first.

Here we see that when the queue gets full, we still drop requests, but the requests that are served are ones that haven't timed out. The work that the server is doing isn't wasted.

The downside of LIFO is that you can have requests stuck in the queue for a very long time. If you tap the button at a steady rate, you'll see that you can keep a set of requests at the back of the queue stuck for as long as you want.
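In code, the only difference from the FIFO sketch earlier is which end of the queue you serve from. Here's the same illustrative bounded queue, served last-in first-out:

```python
from collections import deque

class BoundedLifo:
    """The same bounded queue as before, but served last-in first-out."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = deque()

    def enqueue(self, request) -> bool:
        if len(self.items) >= self.capacity:
            return False  # still tail drop when full
        self.items.append(request)
        return True

    def dequeue(self):
        # The only change from FIFO: pop from the same end we push to,
        # so the freshest request is served first.
        return self.items.pop() if self.items else None
```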

# Priority queues

Something you may want to do is to treat some requests as higher priority than others. Maybe you want to make sure that your paying customers get served before your free-tier ones, or you know that some requests happen in the background and can be dropped without the customer noticing.

One way to achieve this is to use a "priority queue." When a new request arrives, it is added to the queue in a position determined by its priority: requests with a higher priority get added closer to the front, instead of at the end. When the server is ready to serve a request, it takes the highest priority request from the front of the queue.
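If you want to sketch this in code, Python's standard library `heapq` module gives you the core behaviour. The demo only has two priority levels, so the numeric priorities and tie-breaking counter below are a small generalization for illustration, not part of the demo itself.

```python
import heapq
from itertools import count

class PriorityQueue:
    """Lower priority numbers are served first; ties are served FIFO."""

    def __init__(self):
        self.heap = []
        self.counter = count()  # preserves arrival order within a priority

    def enqueue(self, request, priority: int):
        heapq.heappush(self.heap, (priority, next(self.counter), request))

    def dequeue(self):
        # Pops the highest priority (lowest number) request, oldest first
        # among requests of equal priority.
        return heapq.heappop(self.heap)[2] if self.heap else None

queue = PriorityQueue()
queue.enqueue("free-tier request", priority=1)
queue.enqueue("paying customer", priority=0)
print(queue.dequeue())  # "paying customer"
```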

In the following example, we introduce the idea of high priority requests. These are visually distinguished from normal requests by their colour and by the use of horizontal stripes. High priority requests jump the queue, getting placed at the front when they arrive.

Try adding a few requests and then a high priority request. What happens when you try to add a high priority request when the queue is full?

Notice that high priority requests get added to the front of the queue. This ensures that they are served first, and don't wait as long as low priority requests. When the queue is full, however, we still have to drop high priority requests.

# Active queue management

What we'd like to be able to do is push low priority requests out of the queue to make room when a high priority request arrives. This is where "active queue management" (AQM) comes in.

Up until now, we've only dropped requests that arrive when the queue is full. This is called "tail drop," because we're dropping the tail end of the queue. It's a simple strategy, but it has problems. One of them is that high priority requests get dropped just the same as low priority ones.

To get around this problem, we're going to drop requests before the queue gets full. I know, it sounds counterintuitive, but bear with me. Every time a new low priority request arrives, we check how many requests are already in the queue. For every request currently in the queue, we add 20% to the likelihood that the new request will be dropped. So when the queue is empty, the new request has a 0% chance of being dropped. When the queue has 3 requests in it, the new request has a 60% chance of being dropped.

We will only drop high priority requests when the queue is full. Give it a try!

It may take a few goes to see this in action, but you should notice that as the queue fills up, more and more requests get dropped, even though the queue isn't yet full. However, you'll also notice that high priority requests are only dropped when the queue is full.

This is called "random early detection" (RED), and it trades dropping more low priority requests for dropping fewer high priority requests. Best of all, it's cheap and simple to implement, which is why RED is a commonly used AQM algorithm in the wild. Technically speaking, because we're using different drop probabilities for low priority and high priority requests, what we're using here is called "weighted random early detection" (WRED).
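As a rough sketch of the decision we just described: the 20% step per queued request and the "high priority only drops when full" rule come straight from above, while the function name and structure are illustrative.

```python
import random

DROP_PER_QUEUED_REQUEST = 0.20  # +20% drop chance per request already queued

def should_drop(queue_length: int, capacity: int, high_priority: bool) -> bool:
    # Everything is dropped when the queue is full (tail drop).
    if queue_length >= capacity:
        return True
    # High priority requests are never dropped early.
    if high_priority:
        return False
    # Low priority requests are dropped early, with a probability that grows
    # with the queue length: 0% when empty, 20% with one request queued,
    # 60% with three, and so on.
    drop_probability = min(1.0, queue_length * DROP_PER_QUEUED_REQUEST)
    return random.random() < drop_probability
```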

Another problem RED solves is TCP global synchronization, which is beyond the scope of this post but I encourage you to read up on it if this is a topic you're interested in!

# Comparing queues

We've spoken about lots of different queues and queueing strategies, so let's see how they compare to each other.

In this example, we see all four different types of queues we've explored throughout this post. From top to bottom: FIFO, LIFO, priority, and priority with RED. They're all receiving identical requests at the same rate. Let's see how they perform.

TODO: visualise performance.