Comment by magicalhippo

2 years ago

Maybe I misunderstood something, I admit I glossed over parts of the article, but the explanations seemed a bit over-complicated.

In the balls and bucket scenario, ideally you'd want to scan all the buckets and put the ball in the least filled one.

By picking two random buckets and putting the ball in the least filled of the two, you approximate the ideal algorithm.

The chance of picking two relatively filled buckets is inversely proportional to how "bad" it is. If there are just two buckets which have more balls than the others then it's relatively bad to pick those two, but chances are low. And vice versa.

Still, hadn't thought about this before, interesting trick indeed!

10 comments

magicalhippo

mjb 2 years ago

Two random choices has a few significant advantages over "find the least filled one":

- It's O(1), rather than O(buckets).

- It's robust to stale data and concurrency. One way systems try to avoid the O(buckets) work is by caching, or by scanning in parallel, either of which cause stale load estimates. "Pick the best" tends to lead to significant over-filling http://www.eecs.harvard.edu/~michaelm/postscripts/handbook20... or https://brooker.co.za/blog/2012/01/17/two-random.html).

- Best-of-k (rather than best-of-two) provides a parameter `k` that allows the system to tradeoff between optimality (higher k is more optimal), and robustness to stale data and concurrent accesses (lower k is more robust). Mitzenmacher's core observation is that the step up from k=1 to k=2 is a massive improvement in optimality with only a small loss in robustness (and much more than most people would expect).

potamic 2 years ago

They talked about doing this for AWS lambda at "gigantic scale". I assume at some point scanning all the buckets becomes infeasible.

movpasd 2 years ago

I think it comes down to complexity. Scanning is O(N) where N is the number of buckets, two random choices is O(1) (if bucket lookup is O(1), e.g. a hash map).

magicalhippo 2 years ago

Yes exactly. You approximate the full algorithm while not paying the overhead.

DylanSp 2 years ago

It felt over-complicated to me, too. Your explanation is significantly more concise and is a lot clearer to me

xvector 2 years ago

Do you have to pick two random buckets or just one random bucket and one adjacent bucket?

Feels like picking an adjacent bucket would work just as well and increase locality.

rjmunro 2 years ago

In a followup post he explains why actually almost the opposite is true - split your buckets into 2 groups, pick a random A and B from each group and if it's a draw, always load A first. This means buckets in the B group will likely be 1 less loaded than servers in the A group.
https://medium.com/@mihsathe/load-balancing-a-very-counterin...
magicalhippo 2 years ago
I was thinking about that, but I haven't had time to really dig into it. My gut feeling is that it won't be sufficient, but it might be a bad hunch.
- xvector 2 years ago
  
  I graphed it: https://rentry.co/nkrc6
  There's a difference, but it's minimal. Only the initial choice has to be random. As long as you have more than one bin to pick from, you are good regardless of your strategy (random vs adjacent).
  
  1 reply →