The power of two random choices

2 years ago (brooker.co.za)

The underlying mathematical driver of the power of two random choices is a pretty amazing fact about the "n balls into n bins" random experiment.

If the balls are thrown independently and uniformly at random, the most loaded bin will have theta(log n / loglog n) balls in it with high probability.

If you instead pick two bins uniformly and independently at random for each ball and throw the ball into whichever bin has fewer balls in it so far, then the most loaded bin will have theta(loglog n) balls in it, i.e. a massive asymptotic improvement.
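A quick way to build that intuition is to simulate both placement rules. This is just a sketch in Python (the bin count is arbitrary), not anything from the article:

    import random

    def max_load(n, choices):
        """Throw n balls into n bins; for each ball, sample `choices` bins
        uniformly at random and place the ball in the least-loaded one."""
        bins = [0] * n
        for _ in range(n):
            candidates = [random.randrange(n) for _ in range(choices)]
            target = min(candidates, key=lambda i: bins[i])
            bins[target] += 1
        return max(bins)

    n = 1_000_000
    print("1 choice :", max_load(n, 1))   # grows like log n / loglog n
    print("2 choices:", max_load(n, 2))   # grows like loglog n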

  • The balls-into-bins problem comes up often in distributed systems engineering, and so it's super important to have the right intuition. I think a lot of people come into the space assuming that random balls-into-bins is much more even than it really is (and so underestimating the value of better-than-random placement).

    Another interesting example is hash-based partitioning of data. It doesn't spread data out nearly as evenly as one might expect! (There's a small sketch of this after this thread.)

    (I wrote a blog post about that too: https://brooker.co.za/blog/2018/01/01/balls-into-bins.html)

    • I have found it helpful to remember that both the law of large numbers and the central limit theorem depend on the independent and identically distributed (i.i.d.) assumption.

      As an example, many computing systems are Markovian, which means memoryless, with an exponential distribution.

      So the second part of i.i.d. doesn't hold. But it may be close enough for your needs.

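To illustrate the hash-partitioning point above, here's a small sketch (synthetic keys and an arbitrary partition count, not the blog's experiment). Even with a well-behaved hash, the fullest partition typically ends up well above the average:

    import hashlib
    from collections import Counter

    def partition(key: str, partitions: int) -> int:
        # Stable hash so runs are reproducible (unlike Python's built-in hash()).
        digest = hashlib.md5(key.encode()).hexdigest()
        return int(digest, 16) % partitions

    PARTITIONS = 100
    KEYS = 10_000
    counts = Counter(partition(f"key-{i}", PARTITIONS) for i in range(KEYS))

    print("average per partition:", KEYS / PARTITIONS)
    print("fullest partition    :", max(counts.values()))
    print("emptiest partition   :", min(counts.values()))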

The most interesting tidbit, I think, is cut off on the right side of the plot -- eventually, with enough delay, 1 random choice is the winner, for exactly the same reason that 3 random choices wins for small delay and 2 random choices wins for medium delay.

I kinda wonder now if the crossover points between best, 3, 2, and 1 are equidistant on a log-scale plot. If I squint a little, I could maybe see 4, 16, and 64 being the approximate crossover points, especially if there were a little less noise.

  • Yes, it's super interesting when systems move into the regime where stale information becomes worse than no information at all. Part of what makes best-of-2 interesting is how long it resists falling into this pit, but it does eventually happen (as, I suspect, it must with any distributed placement/load balancing algorithm).
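One way to play with that regime is a deliberately crude toy model (server count, utilization, and the service model below are arbitrary assumptions, not the article's simulation): route requests with best-of-k, but base each choice on a load snapshot that is only refreshed every `staleness` requests.

    import random

    def simulate(servers=50, requests=50_000, k=2, staleness=1, utilization=0.8):
        """Return the worst queue depth seen while routing `requests` with
        best-of-k, using load information refreshed every `staleness` requests."""
        drain = 1.0 / (servers * utilization)   # work each server completes per arrival
        load = [0.0] * servers
        snapshot = list(load)
        worst = 0.0
        for t in range(requests):
            if t % staleness == 0:
                snapshot = list(load)           # refresh the (possibly stale) view
            candidates = random.sample(range(servers), k)
            target = min(candidates, key=lambda i: snapshot[i])
            load[target] += 1.0                 # each request adds one unit of work
            load = [max(0.0, x - drain) for x in load]
            worst = max(worst, max(load))
        return worst

    for staleness in (1, 100, 1_000, 10_000):
        results = {k: round(simulate(k=k, staleness=staleness), 1) for k in (1, 2, 3)}
        print(f"staleness={staleness:>6}: {results}")

In this toy model, fresh snapshots favor larger k, while very stale snapshots herd requests onto whichever servers looked quiet at refresh time and start to favor k=1, mirroring the crossover described above.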

The most interesting question this has always raised for me is whether it implies you should always try two doctors/dentists/therapists/etc. before making a choice. Seems like it would be just as powerful “in real life”.

  • There's a whole area of math, "optimal stopping", which aims at exactly this question: how many dentists should you try before you pick your favorite? (A quick simulation of the classic stopping rule is sketched after this thread.)

    For example: https://en.wikipedia.org/wiki/Secretary_problem

    • That problem is always sort of intuitively misleading. The success criterion is defined as picking the single best option from the field. But that's not how real life works. You don't have to pick the single best option for a job or spouse or dentist or whatever; you'll likely be satisfied with anything in the top ten or quintile or somewhere around there. It's interesting mathematically, but it's seldom really applicable to real life.

  • Well, this strategy would only help you find ones that are not overloaded? And it probably only works if you can change choices over time?

    That said, sampling, in general, works surprisingly well in all things.

  • That reminds me of a great book -- "Algorithms to Live By" -- which examines precisely this kind of question. Useful, educational, and entertaining = highly recommended.
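For the optimal-stopping sub-thread above, the classic answer to the secretary problem is the 37% rule: look at roughly n/e candidates without committing, then take the first one better than everything seen so far. A quick sketch to check the success rate empirically (the candidate count is arbitrary):

    import math
    import random

    def secretary_trial(n):
        """One trial: candidates arrive in random order, scores are their ranks.
        Skip the first n/e, then accept the first candidate better than all of
        those. Return True if we ended up with the overall best candidate."""
        candidates = list(range(n))
        random.shuffle(candidates)
        cutoff = int(n / math.e)
        best_seen = max(candidates[:cutoff], default=-1)
        for score in candidates[cutoff:]:
            if score > best_seen:
                return score == n - 1
        return candidates[-1] == n - 1   # forced to settle for the last one

    trials = 100_000
    wins = sum(secretary_trial(100) for _ in range(trials))
    print(f"picked the overall best in {wins / trials:.1%} of trials")  # ~37%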

> Using stale data for load balancing leads to a herd behavior, where requests will herd toward a previously quiet host for much longer than it takes to make that host very busy indeed.

This phenomenon is common with cars using traffic-aware navigation, too.

> dedicated load balancers are often undesirable. They add cost, latency, complexity, and may introduce a single point of failure.

Not my experience at all. Centralised load balancers are simple, easy to maintain, and just work.

My rule of thumb is to always always always avoid distributed computations as much as possible. Trying to avoid a “single point of failure” has, in my experience, always led to more complex failure scenarios, making your system more likely to fail in hard-to-fix ways.

I used this method for a cache eviction/replacement scheme. For our use case it was better than traditional LRU.
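A sketch of what such a scheme can look like (hypothetical class and method names, not the commenter's code): when the cache is full, sample two resident entries and evict whichever was used least recently, which approximates LRU without maintaining a recency list.

    import random
    import time

    class TwoChoiceCache:
        """Approximate-LRU cache: instead of tracking exact recency order,
        evict the older of two randomly sampled entries when full.
        (Assumes capacity >= 2.)"""

        def __init__(self, capacity):
            self.capacity = capacity
            self.store = {}          # key -> value
            self.last_used = {}      # key -> timestamp of last access

        def get(self, key):
            if key in self.store:
                self.last_used[key] = time.monotonic()
                return self.store[key]
            return None

        def put(self, key, value):
            if key not in self.store and len(self.store) >= self.capacity:
                # Power of two choices: sample two keys, evict the staler one.
                a, b = random.sample(list(self.store), 2)
                victim = a if self.last_used[a] <= self.last_used[b] else b
                del self.store[victim]
                del self.last_used[victim]
            self.store[key] = value
            self.last_used[key] = time.monotonic()

Sampling more than two candidates pushes the behavior closer to exact LRU; Redis's approximated LRU eviction works on the same sampling idea.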