← Back to context

Comment by atq2119

2 years ago

The underlying mathematical driver of the power of two random choices is a pretty amazing fact about the "n balls into n bins" random experiment.

If the balls are thrown independently and uniformly at random, the most loaded bin will have theta(log n) balls in it (almost certainly).

If you instead pick two bins uniformly and independently at random for each ball and throw the ball into the bin which has fewer balls in it so far, then the most loaded bin will have theta(loglog n) balls in it, i.e a massive asymptotic improvement.

The balls-into-bins problem comes up often in distribute systems engineering, and so it's super important to have the right intuition. I think a lot of people come in to the space assuming that random balls-into-bins is much more even than it really is (and so underestimating the value of better-than-random placement).

Another interesting example is hash-based partitioning of data. It doesn't spread data out nearly as evenly as one might expect!

(I wrote a blog post about that too: https://brooker.co.za/blog/2018/01/01/balls-into-bins.html)

  • I have found remembering that both the law of large numbers and the central limit theorem are dependent on the independent and identically distributed assumption.

    As an example, many computing systems are Markovian, which is memoryless with an exponential distribution.

    So the second part of I.I.D. doesn't hold. But may be close enough for your needs.

    • However, in many real systems the aggregate behavior follows the asymptotic trend of the central limit theorem even though the underlying distributions may not fit the strict requirements.

      1 reply →