Comment by atq2119
2 years ago
The underlying mathematical driver of the power of two random choices is a pretty amazing fact about the "n balls into n bins" random experiment.
If the balls are thrown independently and uniformly at random, the most loaded bin will have theta(log n) balls in it (almost certainly).
If you instead pick two bins uniformly and independently at random for each ball and throw the ball into the bin which has fewer balls in it so far, then the most loaded bin will have theta(loglog n) balls in it, i.e a massive asymptotic improvement.
The balls-into-bins problem comes up often in distribute systems engineering, and so it's super important to have the right intuition. I think a lot of people come in to the space assuming that random balls-into-bins is much more even than it really is (and so underestimating the value of better-than-random placement).
Another interesting example is hash-based partitioning of data. It doesn't spread data out nearly as evenly as one might expect!
(I wrote a blog post about that too: https://brooker.co.za/blog/2018/01/01/balls-into-bins.html)
I have found remembering that both the law of large numbers and the central limit theorem are dependent on the independent and identically distributed assumption.
As an example, many computing systems are Markovian, which is memoryless with an exponential distribution.
So the second part of I.I.D. doesn't hold. But may be close enough for your needs.
However, in many real systems the aggregate behavior follows the asymptotic trend of the central limit theorem even though the underlying distributions may not fit the strict requirements.
1 reply →
[flagged]
On the very page you linked it is explained what the theta means. Ironic.
No need to be snarky.
I simply didn't know the notation and that's why I wanted to know if that was a mistake or intentional.
4 replies →
Big Theta describes a both and upper and lower asymptotic bound.
TIL; Thanks!
6 replies →