The power of two random choices

2 years ago (brooker.co.za)

The underlying mathematical driver of the power of two random choices is a pretty amazing fact about the "n balls into n bins" random experiment.

If the balls are thrown independently and uniformly at random, the most loaded bin will have theta(log n / loglog n) balls in it with high probability.

If you instead pick two bins uniformly and independently at random for each ball and throw the ball into whichever bin has fewer balls in it so far, then the most loaded bin will have theta(loglog n) balls in it, i.e. a massive asymptotic improvement.
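A quick way to build that intuition is to simulate both placement rules. This is just a sketch in Python (the bin count is arbitrary), not anything from the article:

    import random

    def max_load(n, choices):
        """Throw n balls into n bins; for each ball, sample `choices` bins
        uniformly at random and place the ball in the least-loaded one."""
        bins = [0] * n
        for _ in range(n):
            candidates = [random.randrange(n) for _ in range(choices)]
            target = min(candidates, key=lambda i: bins[i])
            bins[target] += 1
        return max(bins)

    n = 1_000_000
    print("1 choice :", max_load(n, 1))   # grows like log n / loglog n
    print("2 choices:", max_load(n, 2))   # grows like loglog n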

  • The balls-into-bins problem comes up often in distributed systems engineering, and so it's super important to have the right intuition. I think a lot of people come into the space assuming that random balls-into-bins is much more even than it really is (and so underestimating the value of better-than-random placement).

    Another interesting example is hash-based partitioning of data. It doesn't spread data out nearly as evenly as one might expect! (There's a small sketch of this after this thread.)

    (I wrote a blog post about that too: https://brooker.co.za/blog/2018/01/01/balls-into-bins.html)

    • I have found it helpful to remember that both the law of large numbers and the central limit theorem depend on the independent and identically distributed (i.i.d.) assumption.

      As an example, many computing systems are Markovian, which means memoryless, with an exponential distribution.

      So the second part of i.i.d. doesn't hold. But it may be close enough for your needs.

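To illustrate the hash-partitioning point above, here's a small sketch (synthetic keys and an arbitrary partition count, not the blog's experiment). Even with a well-behaved hash, the fullest partition typically ends up well above the average:

    import hashlib
    from collections import Counter

    def partition(key: str, partitions: int) -> int:
        # Stable hash so runs are reproducible (unlike Python's built-in hash()).
        digest = hashlib.md5(key.encode()).hexdigest()
        return int(digest, 16) % partitions

    PARTITIONS = 100
    KEYS = 10_000
    counts = Counter(partition(f"key-{i}", PARTITIONS) for i in range(KEYS))

    print("average per partition:", KEYS / PARTITIONS)
    print("fullest partition    :", max(counts.values()))
    print("emptiest partition   :", min(counts.values()))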

The most interesting tidbit, I think, is cut off on the right side of the plot -- eventually, with enough delay, 1 random choice is the winner, for exactly the same reason that 3 random choices wins for small delay and 2 random choices wins for medium delay.

I kinda wonder now if the crossover points between best, 3, 2, and 1 are equidistant on a log-scale plot. If I squint a little, I could maybe see 4, 16, and 64 being the approximate crossover points, especially if there were a little less noise.

  • Yes, it's super interesting when systems move into the regime where stale information becomes worse than no information at all. Part of what makes best-of-2 interesting is how long it resists falling into this pit, but it does eventually happen (as, I suspect, it must with any distributed placement/load balancing algorithm).
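One way to play with that regime is a deliberately crude toy model (server count, utilization, and the service model below are arbitrary assumptions, not the article's simulation): route requests with best-of-k, but base each choice on a load snapshot that is only refreshed every `staleness` requests.

    import random

    def simulate(servers=50, requests=50_000, k=2, staleness=1, utilization=0.8):
        """Return the worst queue depth seen while routing `requests` with
        best-of-k, using load information refreshed every `staleness` requests."""
        drain = 1.0 / (servers * utilization)   # work each server completes per arrival
        load = [0.0] * servers
        snapshot = list(load)
        worst = 0.0
        for t in range(requests):
            if t % staleness == 0:
                snapshot = list(load)           # refresh the (possibly stale) view
            candidates = random.sample(range(servers), k)
            target = min(candidates, key=lambda i: snapshot[i])
            load[target] += 1.0                 # each request adds one unit of work
            load = [max(0.0, x - drain) for x in load]
            worst = max(worst, max(load))
        return worst

    for staleness in (1, 100, 1_000, 10_000):
        results = {k: round(simulate(k=k, staleness=staleness), 1) for k in (1, 2, 3)}
        print(f"staleness={staleness:>6}: {results}")

In this toy model, fresh snapshots favor larger k, while very stale snapshots herd requests onto whichever servers looked quiet at refresh time and start to favor k=1, mirroring the crossover described above.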

The most interesting question this has always raised for me is whether it implies you should always try two doctors/dentists/therapists/etc. before making a choice. Seems like it would be just as powerful “in real life”.

  • There's a whole area of math, "optimal stopping", which aims at exactly this question: how many dentists should you try before you pick your favorite? (A quick simulation of the classic stopping rule is sketched after this thread.)

    For example: https://en.wikipedia.org/wiki/Secretary_problem

    • That problem is always sort of intuitively misleading. The success criterion is defined as picking the single best option from the field. But that's not how real life works. You don't have to pick the single best option for a job or spouse or dentist or whatever; you'll likely be satisfied with anything in the top ten or quintile or somewhere around there. It's interesting mathematically, but it's seldom really applicable to real life.

  • Well, this strategy would only help you find ones that are not overloaded? And it probably only works if you can change choices over time?

    That said, sampling, in general, works surprisingly well in all things.

  • That reminds me of a great book -- "Algorithms to Live By" -- which examines precisely this kind of question. Useful, educational, and entertaining = highly recommended.
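For the optimal-stopping sub-thread above, the classic answer to the secretary problem is the 37% rule: look at roughly n/e candidates without committing, then take the first one better than everything seen so far. A quick sketch to check the success rate empirically (the candidate count is arbitrary):

    import math
    import random

    def secretary_trial(n):
        """One trial: candidates arrive in random order, scores are their ranks.
        Skip the first n/e, then accept the first candidate better than all of
        those. Return True if we ended up with the overall best candidate."""
        candidates = list(range(n))
        random.shuffle(candidates)
        cutoff = int(n / math.e)
        best_seen = max(candidates[:cutoff], default=-1)
        for score in candidates[cutoff:]:
            if score > best_seen:
                return score == n - 1
        return candidates[-1] == n - 1   # forced to settle for the last one

    trials = 100_000
    wins = sum(secretary_trial(100) for _ in range(trials))
    print(f"picked the overall best in {wins / trials:.1%} of trials")  # ~37%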

> Using stale data for load balancing leads to a herd behavior, where requests will herd toward a previously quiet host for much longer than it takes to make that host very busy indeed.

This phenomenon is common with cars using traffic-aware navigation, too.

> dedicated load balancers are often undesirable. They add cost, latency, complexity, and may introduce a single point of failure.

Not my experience at all. Centralised load balancers are simple, easy to maintain, and just work.

My rule of thumb is to always always always avoid distributed computations as much as possible. Trying to avoid a “single point of failure” has, in my experience, always led to more complex failure scenarios, making your system more likely to fail in hard-to-fix ways.

I used this method for a cache eviction/replacement scheme. For our use case it was better than traditional LRU.
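A sketch of what such a scheme can look like (hypothetical class and method names, not the commenter's code): when the cache is full, sample two resident entries and evict whichever was used least recently, which approximates LRU without maintaining a recency list.

    import random
    import time

    class TwoChoiceCache:
        """Approximate-LRU cache: instead of tracking exact recency order,
        evict the older of two randomly sampled entries when full.
        (Assumes capacity >= 2.)"""

        def __init__(self, capacity):
            self.capacity = capacity
            self.store = {}          # key -> value
            self.last_used = {}      # key -> timestamp of last access

        def get(self, key):
            if key in self.store:
                self.last_used[key] = time.monotonic()
                return self.store[key]
            return None

        def put(self, key, value):
            if key not in self.store and len(self.store) >= self.capacity:
                # Power of two choices: sample two keys, evict the staler one.
                a, b = random.sample(list(self.store), 2)
                victim = a if self.last_used[a] <= self.last_used[b] else b
                del self.store[victim]
                del self.last_used[victim]
            self.store[key] = value
            self.last_used[key] = time.monotonic()

Sampling more than two candidates pushes the behavior closer to exact LRU; Redis's approximated LRU eviction works on the same sampling idea.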