Comment by dragonwriter
14 days ago
> The only time exponential backoff is useful is if the failure is due to a rate limit and you specifically need a mechanism to reduce the rate at which you are attempting to use it.
Exponential backoff is applicable to any failure where the time it has so far gone unresolved is the primary piece of available data on how lilong it is likely to take before being resolved, which is a very common situation, which is why it is a good default for most situations where you don’t have a better knowable-in-advance information at hand and the probability distribution ofn time to resolve, and where delays aren't super costly (though knowledge of when delays become costly can be used to set a cap on exponential backoff, too.)
> and where delays aren't super costly
This is key.
If delays aren't costly, sure, any algorithm is fine.
But if you want your service up as soon as possible, why spend pointless minutes calling sleep() when you could be getting things working sooner?
> If delays aren't costly, sure, any algorithm is fine.
Not true, delays aren't the only costs. Compute, network, and developer time digging through logs all cost money, and hammering a service fruitlessly when it is down adds to all of those.
(It also doesn't help that “system is overloaded” may sometimes be communicated clearly, but also in many systems can manifest in... just about any other kind of error, too.)