Comment by deathanatos

4 years ago

> 2) What's the benefit of multiple AZs if the SLA of a single AZ is greater than your intended availability goals? (Have you checked your provider's single AZ SLA?)

… my provider's single-AZ SLA is lower than my company's intended availability goals.

(IMO our goals are nuts, too, but it is what it is.)

Our provider, in the worst case (a VM using a managed hard disk), has an SLA of 95% per month. I … think. Their SLA page uses incorrect units on the top-line items; the examples in the legalese (examples are normative, right?) use a unit of % / mo…
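For a sense of scale, here's that arithmetic as a quick Python sketch. It assumes a 30-day month (a real SLA defines its own measurement window), and `allowed_downtime` is just a name I made up for illustration:

```python
from datetime import timedelta

def allowed_downtime(sla_percent: float, period: timedelta) -> timedelta:
    """Downtime permitted within `period` before an SLA of `sla_percent` is breached."""
    return period * (1 - sla_percent / 100)

month = timedelta(days=30)  # assumption: a 30-day measurement window
for sla in (95.0, 99.9, 99.99):
    print(f"{sla}% per month allows {allowed_downtime(sla, month)} of downtime")
```

At 95%, that's 36 hours of downtime a month before the SLA is even breached.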

You're also assuming that a provider (a) typically meets its SLAs, and (b) honors them when it doesn't. IME, (a) is highly service-dependent, with some services being just stellar at it, and (b) is usually "they will if you can prove to them, with your own metrics, that they had an outage, and push for a credit."

You're also assuming (c) that the service doesn't fail in a way that's impactful but not covered by the SLA. (E.g., I once had a cloud provider whose SLA was defined as "the APIs should return 2xx", and during the outage the APIs always returned "2xx, I'm processing your request". You then polled the API and got "2xx, your request is pending". Nothing was happening, because they were having an outage, but that outage could continue indefinitely without impacting the SLA! That was a fun support call…)
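To make the 2xx-but-stuck failure mode concrete, here's a minimal sketch contrasting what an SLA like that measures with what the customer experiences. The `Poll` record, the `"pending"`/`"succeeded"` states, and the 10-minute deadline are all hypothetical; the real API's response shape isn't something I can reproduce here:

```python
from dataclasses import dataclass

@dataclass
class Poll:
    elapsed_s: float  # seconds since the request was submitted
    http_status: int  # the only thing the SLA in this story looked at
    state: str        # hypothetical body field: "pending", "succeeded", or "failed"

def sla_sees_success(polls: list[Poll]) -> bool:
    """Provider's view: every call returned a 2xx status."""
    return all(200 <= p.http_status < 300 for p in polls)

def customer_sees_success(polls: list[Poll], deadline_s: float) -> bool:
    """Customer's view: the operation actually succeeded before the deadline."""
    return any(p.state == "succeeded" and p.elapsed_s <= deadline_s for p in polls)

# The outage: 2xx on every poll for an hour, but the request never leaves "pending".
stuck = [Poll(t, 202, "pending") for t in range(0, 3600, 60)]
print(sla_sees_success(stuck))                       # True
print(customer_sees_success(stuck, deadline_s=600))  # False
```

If you want your own metrics to hold up in that support call, it's the second check, not the status code, that they need to record.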

There's also (d): AZs are a myth; I've seen multiple global outages, e.g., when something like the global authentication service falls over and takes basically every other service with it, because nothing can authenticate. What's even better is the provider then listing those services as "up" / not in an outage, because technically it's not that service that's down, it's just the authentication service. 'Cause God forbid they'd have to give out that credit. But the provider calling a service "up" while it fails 100% of the requests sent its way is just rich, from the customer's view.
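Back to the multi-AZ question in the quote: the usual back-of-the-envelope math assumes AZ failures are independent. This rough sketch is my own illustration with made-up inputs (the 95% worst-case figure above treated as a per-AZ availability, and a hypothetical 99.9% for a shared global auth service). It shows why one shared dependency caps the result no matter how many AZs you add:

```python
def multi_az(p_az: float, n: int) -> float:
    """Availability of n AZs when any one AZ suffices and AZ failures are independent."""
    return 1 - (1 - p_az) ** n

def with_shared_dependency(p_az: float, n: int, p_shared: float) -> float:
    """Same setup, but every AZ also needs one shared global service (e.g. auth).
    Assumes the shared service fails independently of the AZs."""
    return multi_az(p_az, n) * p_shared

P_AZ = 0.95     # worst-case single-AZ SLA from above, treated as availability
P_AUTH = 0.999  # hypothetical availability of a global auth service
for n in (1, 2, 3, 5):
    print(n, round(multi_az(P_AZ, n), 6), round(with_shared_dependency(P_AZ, n, P_AUTH), 6))
```

The second column climbs toward 1 as you add AZs; the third can never exceed `P_AUTH`. A global outage is exactly the case the independence assumption leaves out.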
