Comment by mjb

2 years ago

Round robin can work very well when the variance of task sizes is small, but degrades quickly when the variance of task sizes becomes large (i.e. there are some tasks that keep servers busy for minutes, and some for milliseconds).
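To make that concrete, here is a rough simulation sketch, not a model of any real load balancer. It assumes tasks arrive at a fixed rate, each server works through its own FIFO queue, and both workloads have roughly the same mean task size. The function names, task sizes, and the 1% share of large tasks are all made up for illustration.

```python
import random

def mean_wait_round_robin(sizes, n_servers, interarrival):
    """Mean time tasks spend queued when task i is sent to server i % n_servers."""
    free_at = [0.0] * n_servers      # time at which each server clears its backlog
    total_wait = 0.0
    for i, size in enumerate(sizes):
        arrival = i * interarrival   # assumption: tasks arrive at a fixed rate
        server = i % n_servers       # round robin ignores how busy the server is
        start = max(arrival, free_at[server])
        total_wait += start - arrival
        free_at[server] = start + size
    return total_wait / len(sizes)

def make_workloads(n_tasks, seed=0):
    """Two hypothetical workloads with a mean task size of about 1.0 each."""
    rng = random.Random(seed)
    uniform = [1.0] * n_tasks        # zero variance
    # Mostly small tasks, with roughly 1% that are about 100x larger.
    mixed = [50.5 if rng.random() < 0.01 else 0.5 for _ in range(n_tasks)]
    return uniform, mixed

if __name__ == "__main__":
    uniform, mixed = make_workloads(20_000)
    for name, sizes in (("uniform", uniform), ("mixed", mixed)):
        wait = mean_wait_round_robin(sizes, n_servers=4, interarrival=0.5)
        print(f"{name}: mean wait {wait:.2f}")
```

With the uniform workload every server is idle again before its next task arrives, so nothing queues. With the mixed workload, each large task leaves a line of small tasks stuck behind it on the same server while the other servers sit partly idle.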

In distributed systems with large numbers of producers (like one of the systems Mihir has worked on, AWS Lambda), round robin essentially degrades into random placement as the number of producers increases.
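One way to see this effect is the sketch below. It assumes each producer keeps its own private round-robin cursor with no shared state, and that requests from the producers interleave at random. The function names and the "distinct servers per window" metric are illustrative choices, not taken from any particular system.

```python
import random

def distinct_servers_per_window(n_producers, n_servers, n_tasks, seed=0):
    """Average number of distinct servers hit by each run of n_servers consecutive tasks."""
    rng = random.Random(seed)
    # Assumption: every producer starts its private cursor at a random offset.
    cursors = [rng.randrange(n_servers) for _ in range(n_producers)]
    placements = []
    for _ in range(n_tasks):
        p = rng.randrange(n_producers)          # the next task comes from a random producer
        placements.append(cursors[p])           # that producer uses its own next server
        cursors[p] = (cursors[p] + 1) % n_servers
    windows = [placements[i:i + n_servers]
               for i in range(0, n_tasks - n_servers + 1, n_servers)]
    return sum(len(set(w)) for w in windows) / len(windows)

def expected_distinct_if_random(n_servers):
    """Expected distinct servers when n_servers tasks are placed uniformly at random."""
    return n_servers * (1 - (1 - 1 / n_servers) ** n_servers)

if __name__ == "__main__":
    for producers in (1, 10, 1000):
        score = distinct_servers_per_window(producers, n_servers=10, n_tasks=10_000)
        print(f"{producers} producers: {score:.2f} distinct servers per 10 tasks")
    print(f"random placement: {expected_distinct_if_random(10):.2f}")
```

A single producer touches every server exactly once per window, which is the even spread round robin is supposed to give. As the producer count grows, consecutive tasks come from unrelated cursors, and the figure drifts toward the value expected from purely random placement.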

Yeah, order-of-magnitude differences in execution times get messy behind load balancers.

We had an API that was a kitchen sink of different tools, including reporting. Eventually we split reporting off into its own service/LB group, because calls returning 100 bytes of text could get stuck behind requests that were assembling a few megabytes of data.