5 9's is like 7 minutes a year. They are breaking SLAs and impacting services people depend on
Tbh though this is sort of all the other companies fault, "everyone" uses aws and cf and so others follow. now not only are all your chicks in one basket, so is everyone elses. When the basket inevitably falls into a lake....
Providers need to be more aware of their global impact in outages, and customers need to be more diverse in their spread.
These kinds of outages continue to happen and continue to impact 50+% of the internet, yes, they know they have that power, but they dont treat changes as such, so no, they arent aware. Awareness would imply more care in operations like code changes and deployments.
Outages happen, code changes occur; but you can do a lot to prevent these things on a large scale, and they simply dont.
Where is the A/B deployment, preventing a full outage? What about internally, where was the validation before the change, was the testing run against a prodlike environment or something that once resembled prod but hasnt forever?
They could absolutely mitigate impacting the entire global infra in multiple ways, and havent, despite their many outages.
I do think this is tenable as long as these services are reliable. Even though there have been some outages I would argue that they’re incredibly reliable at this point. If though this ever changes the costs to move to a competitor won’t be as simple as pushing a repository elsewhere, especially for AWS. I think that’s where some of the potential danger lies.
> and judging by the HN post age, we're now past minute 60 of this incident.
Huh? It's been back up during most of this time. It was up and then briefly went back down again but it's been up for a while now. Total downtime was closer to 30 minutes
5 9's is like 7 minutes a year. They are breaking SLAs and impacting services people depend on
Tbh though this is sort of all the other companies fault, "everyone" uses aws and cf and so others follow. now not only are all your chicks in one basket, so is everyone elses. When the basket inevitably falls into a lake....
Providers need to be more aware of their global impact in outages, and customers need to be more diverse in their spread.
99.999% availability is around 5 minutes or so of downtime per year.
> Providers need to be more aware of their global impact in outages
So you think the problem is they aren't "aware"?
These kinds of outages continue to happen and continue to impact 50+% of the internet, yes, they know they have that power, but they dont treat changes as such, so no, they arent aware. Awareness would imply more care in operations like code changes and deployments.
Outages happen, code changes occur; but you can do a lot to prevent these things on a large scale, and they simply dont.
Where is the A/B deployment, preventing a full outage? What about internally, where was the validation before the change, was the testing run against a prodlike environment or something that once resembled prod but hasnt forever?
They could absolutely mitigate impacting the entire global infra in multiple ways, and havent, despite their many outages.
1 reply →
I do think this is tenable as long as these services are reliable. Even though there have been some outages I would argue that they’re incredibly reliable at this point. If though this ever changes the costs to move to a competitor won’t be as simple as pushing a repository elsewhere, especially for AWS. I think that’s where some of the potential danger lies.
> 30 minutes of downtime
> this is tenable as long as these services are reliable
do you hear yourself, this is supposed to be a distributed CDN. imagine if HTTP had 30 minutes of downtime a year.
and judging by the HN post age, we're now past minute 60 of this incident.
> and judging by the HN post age, we're now past minute 60 of this incident.
Huh? It's been back up during most of this time. It was up and then briefly went back down again but it's been up for a while now. Total downtime was closer to 30 minutes
3 replies →
> especially for AWS
CF can be just as difficult if not more to migrate off of especially when using things like durable objects