Power of k random choices

This is one of my favorite results of all times. A really simple neat trick that any developer can use in a variety of situations without really having to understand how it works. Think of the following hypothetical: you have a bunch of bins and a bunch of balls and you want to load up the bins as evenly as possible. If the bins and the balls were of same size, a round-robin strategy would work well. If they were not, you’d expect that just putting a ball in a bin at random would work well, however here we propose a different scheme that will perform better. Pick two bins at random and put the ball in the least loaded bin.

Why this performs better is out of the scope of this post, there are a lot of papers that discuss complex proofs on the topic. Instead lets take an example from the systems domain. You have N servers that accept web requests. Let us assume that there are m requests. Each server is stateless, so you can route each request to any server you want. Here are some options on how to do it: (1) Round-robin, (2) Route to a random server, (3) Route to the least loaded server, (4) Pick k servers at random and route to the least loaded one. The last approach will outperform the first three, especially when:

  1. Requests have different resource demands
  2. The server farm is heterogeneous
  3. The information about which server is least loaded is stale
  4. Any combination of the above

 shows the above strategy works better than simple random choice. Talwar and Wieder of MSR, show in their paper that in case #1 the load gap, meaning the difference between the average and most loaded server, does not depend on the number of requests and thus scales infinitely with the number of servers. Mitzenmacher in his paper, shows how increasing the number k of randomly selected servers can help with stale information about server load.
The proofs in the above papers can be daunting and the result itself may seem counter-intuitive, but it works well.

4 thoughts on “Power of k random choices

  1. Matt Faus

    One big benefit of options 1 and 2 is simplicity. Do the papers suggest any easy way to calculate comparable load figures in a heterogeneous system? Or must you always measure the maximum request/sec for each unique SKU and then extrapolate a percentage based on that?

  2. papercruncher Post author

    Matt, an MSR paper discusses the case of heterogeneous systems and shows that all results still hold — given sufficient number of choices. The author simulates such a system via an uneven sampling distribution. An easy hack to do so is to insert multiple servers in your pool for each more powerful server. For example, if you have 100 1TB SKUs and 100 2TB SKUs then model your system as 300 “virtual” 1TB machines with some of the virtual machines mapping to the same physical machine. Given that the system is now homogeneous, you can apply the same imbalance = (max – avg) metric

  3. Mike K Tung (@mikektung)

    Hey Marios, I didn’t examine the paper, but (4) can’t dominate (3) because they are equivalent in the case that k=n, right? For estimating which server is least loaded, nginx has a pretty good option in the latest build of the upstream module called ‘least_conn’ that works reasonably well.

  4. papercruncher Post author

    Hi Mike,

    The case where k=n can actually lead to herding when the server information is stale. The Mitzenmacher shows that and I’ve “proven” the same thing with simulations. I’m not a fan of the least connections algorithm especially for the case where not all calls carry the same weight. For example, if some requests take a long time to process and others don’t, the nginx algorithm fails spectacularly (unless of course you are over provisioned). The reason for that is because not every connection weighs the same so the information nginx is acting upon is not accurate.

    In real life, this algorithm is shown to work best when k << n

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s