Comment by weird-eye-issue

3 months ago

Oh no, we had 30 minutes of downtime this year :(

12 comments

weird-eye-issue

5 9's is like 7 minutes a year. They are breaking SLAs and impacting services people depend on

Tbh though, this is sort of all the other companies' fault: "everyone" uses AWS and CF, and so others follow. Now not only are all your chicks in one basket, so are everyone else's. When the basket inevitably falls into a lake...

Providers need to be more aware of their global impact in outages, and customers need to be more diverse in their spread.

world2vec 3 months ago

99.999% availability is around 5 minutes or so of downtime per year.
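
For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes a 365.25-day year, and the function names are made up for illustration, not taken from any SLA tooling.

```python
# Minutes in an average year (365.25 days, so leap years are averaged in).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_budget_minutes(availability_percent: float) -> float:
    """Allowed downtime per year, in minutes, for an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_from_downtime(downtime_minutes: float) -> float:
    """Availability percentage implied by a yearly downtime total."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_budget_minutes(target):.2f} min/year")

print(f"30 min/year -> {availability_from_downtime(30):.4f}%")
```

Under that assumption, five 9's works out to roughly 5.26 minutes a year, and 30 minutes a year is about 99.994%, which lands between four and five 9's.
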
weird-eye-issue 3 months ago
> Providers need to be more aware of their global impact in outages
So you think the problem is they aren't "aware"?
- CableNinja 3 months ago
  
  These kinds of outages continue to happen and continue to impact 50+% of the internet. Yes, they know they have that power, but they don't treat changes accordingly, so no, they aren't aware. Awareness would imply more care in operations like code changes and deployments.
  Outages happen and code changes occur, but you can do a lot to prevent these things on a large scale, and they simply don't.
  Where is the A/B deployment to prevent a full outage? And internally, where was the validation before the change? Was the testing run against a prod-like environment, or against something that once resembled prod but hasn't for a long time?
  They could absolutely limit the impact on the entire global infra in multiple ways, and haven't, despite their many outages.
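
To make the staged-deployment idea concrete, here is a rough sketch of a rollout that stops and rolls back as soon as a stage looks unhealthy. Everything in it is hypothetical: `staged_rollout`, the host names, and the `deploy`/`healthy`/`rollback` callbacks are placeholders, and nothing here reflects how any provider actually ships changes.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[Sequence[str]],
    deploy: Callable[[str], object],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], object],
) -> bool:
    """Deploy one stage at a time; undo everything if a stage fails its checks."""
    touched = []
    for stage in stages:
        for host in stage:
            deploy(host)
            touched.append(host)
        # Gate: every host in this stage must pass before the next stage starts.
        if not all(healthy(host) for host in stage):
            for host in reversed(touched):
                rollback(host)
            return False
    return True


# Toy demo with made-up hosts: the change breaks every host it lands on.
live = set()
stages = [["canary-1"], ["eu-1", "eu-2"], ["us-1", "us-2", "ap-1"]]
ok = staged_rollout(
    stages,
    deploy=live.add,
    healthy=lambda host: False,
    rollback=live.discard,
)
print(ok, live)
```

In the toy run, the bad change only ever reaches the single canary host before it is rolled back, so the later, larger stages never see it. A real system would need real health signals and bake time between stages, which this sketch leaves out.
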

pell 3 months ago

I do think this is tenable as long as these services are reliable. Even though there have been some outages, I would argue that they’re incredibly reliable at this point. If this ever changes, though, moving to a competitor won’t be as simple as pushing a repository elsewhere, especially for AWS. I think that’s where some of the potential danger lies.

swyx 3 months ago
> 30 minutes of downtime
> this is tenable as long as these services are reliable
Do you hear yourself? This is supposed to be a distributed CDN. Imagine if HTTP had 30 minutes of downtime a year.
And judging by the HN post age, we're now past minute 60 of this incident.
- weird-eye-issue 3 months ago
  
  > and judging by the HN post age, we're now past minute 60 of this incident.
  Huh? It's been back up during most of this time. It was up and then briefly went back down again, but it's been up for a while now. Total downtime was closer to 30 minutes.
weird-eye-issue 3 months ago

> especially for AWS
CF can be just as difficult to migrate off of, if not more so, especially when using things like Durable Objects.