Comment by abigail95

2 days ago

Who is trying to achieve zero downtime? Facebook has degraded service regularly; it's just close enough to 99.9% availability that nobody cares.

If loading my messages times out, I just move on to something else and come back a few minutes later.

Surely they have metrics measuring that and have decided it isn't worth the engineering effort to improve.

One of the interesting things that came out of Google's SRE (Site Reliability Engineering) practice is that they deliberately add outages if they don't have enough. They learned years ago that if you build a service that promises 99% uptime and actually delivers 99.99%, other teams in the company will unintentionally come to depend on that 99.99%. So they chaos-monkey it to ensure that the inevitable failures aren't catastrophic.
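To put numbers on the gap: the downtime an availability target permits over a window is what the SRE literature calls an "error budget". The sketch below computes it for a 30-day window and applies a hypothetical rule for deciding when a service is "too reliable" and should get a planned outage. The function names and the 50%-of-budget threshold are illustrative assumptions, not Google's actual policy.

```python
# Minimal sketch of error-budget arithmetic. The 30-day window and the
# "inject an outage if under half the budget was used" rule are assumptions.

MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(slo: float, window_minutes: int = MINUTES_PER_30_DAYS) -> float:
    """Downtime permitted by an availability SLO over the window (the error budget)."""
    return (1.0 - slo) * window_minutes


def should_schedule_planned_outage(
    slo: float,
    observed_downtime_minutes: float,
    window_minutes: int = MINUTES_PER_30_DAYS,
    min_budget_used: float = 0.5,  # hypothetical threshold
) -> bool:
    """Hypothetical policy: if far less of the budget was spent than promised,
    schedule a planned outage so dependents don't learn to rely on the surplus."""
    budget = allowed_downtime_minutes(slo, window_minutes)
    return observed_downtime_minutes < min_budget_used * budget


if __name__ == "__main__":
    for slo in (0.99, 0.999, 0.9999):
        # roughly 432, 43.2 and 4.32 minutes respectively
        print(slo, round(allowed_downtime_minutes(slo), 2))

    # Promising 99% but only being down ~4.32 minutes (i.e. delivering 99.99%)
    # leaves most of the budget unspent, so the policy says to burn some.
    print(should_schedule_planned_outage(0.99, 4.32))
```

A 99% promise allows about 7.2 hours of downtime a month, while 99.99% allows under 5 minutes, which is why teams that silently get the latter build assumptions that break badly when the former is what they actually receive.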