Comment by digaozao
2 years ago
I liked the approach; I did not know about it. But what are the advantages over always taking the least occupied server/bin? I have seen good improvements with nginx load balancing when changing its configuration to least_conn, which chooses the server with the fewest connections.
> what are the advantages over always taking the least occupied server/bin
The other replies cover the algorithmic reasons. Sure, but pick-two is even better for uptime reasons.
When serving at scale, quite often the "least occupied" server is the one with an error that causes it to have the least load/connections/whatever.
By starting with two picked at random, you choose among probably-healthy servers, and then take the one with the least load.
> good improvements with nginx load balancing when changing its configuration to least_conn, which chooses the server with the fewest connections.
nginx doesn't know why your app is failing to keep connections open long enough to serve them; least_conn between two random picks results in far fewer 100% "outages" when a single box, or a subset of boxes among 10s or 1000s, is bad.
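For concreteness, here is a minimal sketch of the pick-two selection described above, in Java. The Backend type and its activeConns counter are hypothetical stand-ins for whatever the balancer tracks per upstream; the point is just that each decision samples two distinct servers at random and forwards to the less loaded of the pair.

    import java.util.List;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical upstream that tracks its own in-flight connection count.
    class Backend {
        final String addr;
        final AtomicInteger activeConns = new AtomicInteger();
        Backend(String addr) { this.addr = addr; }
    }

    public class PickTwo {
        // Sample two distinct backends at random and return the one with fewer
        // active connections. Assumes the pool holds at least two backends.
        static Backend pickTwo(List<Backend> pool) {
            ThreadLocalRandom rnd = ThreadLocalRandom.current();
            int i = rnd.nextInt(pool.size());
            int j = rnd.nextInt(pool.size() - 1);
            if (j >= i) j++; // shift so the second pick differs from the first
            Backend a = pool.get(i), b = pool.get(j);
            return b.activeConns.get() < a.activeConns.get() ? b : a;
        }

        public static void main(String[] args) {
            List<Backend> pool = List.of(
                new Backend("10.0.0.1"), new Backend("10.0.0.2"), new Backend("10.0.0.3"));
            System.out.println("chose " + pickTwo(pool).addr);
        }
    }

Because each decision reads only two counters, it stays O(1) per request and needs no globally consistent view of the pool, so multiple balancer instances can run it independently.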
Wouldn't that be better solved with health checking and removing the poorly performing server from the pool?
Not my field, but I'm guessing that when you have thousands of servers it can be quite the overhead to keep track of the load of each one for every request.
The proposed algorithm only needs to consider two, and from what I can gather should work in parallel with multiple load balancers without coordination.
It’s more costly to determine the server with the absolute fewest connections, since you have to keep track of the ordering of the servers by how many connections they have.
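For contrast, a sketch of the naive "absolute least connections" pick: scan the whole pool on every request, which is O(N) per decision. The Upstream record and the connection counts below are made up for illustration.

    import java.util.Comparator;
    import java.util.List;

    public class LeastConnScan {
        record Upstream(String addr, int activeConns) {}

        // Naive least-connections: examine every backend on every request, O(N).
        static Upstream leastConnections(List<Upstream> pool) {
            return pool.stream()
                       .min(Comparator.comparingInt(Upstream::activeConns))
                       .orElseThrow();
        }

        public static void main(String[] args) {
            List<Upstream> pool = List.of(
                new Upstream("10.0.0.1", 7),
                new Upstream("10.0.0.2", 3),
                new Upstream("10.0.0.3", 5));
            System.out.println(leastConnections(pool).addr()); // prints 10.0.0.2
        }
    }

Maintaining a sorted structure instead (see the tree example further down) trades the per-request scan for bookkeeping on every connection open and close.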
See my reply here: https://news.ycombinator.com/item?id=37175685
Short version: it's robust to staleness and concurrency, which is important in distributed settings ("find the best" is hard to do when "best" is a moving target).
In large enough systems, it becomes prohibitively expensive to maintain consensus on the state of all members.
Sure, with 10 or so hosts it’s trivial to select the least busy one, but at the scale of 1000s of hosts you begin to run into a ton of overhead just to decide where to forward a request.
As many others have said, O(N) for each request vs O(1) while avoiding hot spots.
O(log n) vs O(1)
Is there a way to keep a list with changing values sorted in real time in order to use a log n search algorithm?
Yes, a binary search tree that is dynamically balanced, for example a red-black or AVL tree.
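As a sketch of that idea: Java's TreeSet is a red-black tree, so it can keep servers ordered by connection count, with each update done as a remove plus reinsert, both O(log n). The Server type and addresses below are illustrative only.

    import java.util.Comparator;
    import java.util.TreeSet;

    class Server {
        final String addr;
        int conns;
        Server(String addr) { this.addr = addr; }
    }

    public class LeastConnTree {
        // Order by connection count; break ties by address so distinct servers
        // with equal counts can coexist in the set.
        private final TreeSet<Server> byLoad = new TreeSet<>(
            Comparator.comparingInt((Server s) -> s.conns).thenComparing(s -> s.addr));

        void add(Server s) { byLoad.add(s); }

        Server leastLoaded() { return byLoad.first(); } // leftmost node

        // Remove under the old count, mutate, reinsert under the new count.
        void updateConns(Server s, int delta) {
            byLoad.remove(s);
            s.conns += delta;
            byLoad.add(s);
        }

        public static void main(String[] args) {
            LeastConnTree lb = new LeastConnTree();
            Server a = new Server("10.0.0.1"), b = new Server("10.0.0.2");
            lb.add(a); lb.add(b);
            lb.updateConns(a, +1); // a now has one in-flight connection
            System.out.println(lb.leastLoaded().addr); // prints 10.0.0.2
        }
    }

The tie-break on address matters: without it, two servers with equal connection counts would collide in the set.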
It's log n either way.