Comment by rco8786
5 hours ago
Is it me or has there been a very noticeable uptick in large scale infra-level outages lately? AWS, Cloudflare, etc have all been way under whatever SLA they publish.
Coincidentally, large tech companies have been conducting mass layoffs and claim they're going to rely on AI much more to replace junior developers.
And they are offshoring roles to lower quality devs.
That does seem to be a coincidence, as the recent outages making headlines (including this one according to early reports) have been associated with huge traffic spikes. It seems DDoS attacks are reaching a new level.
Maybe a laid-off engineer is bored and started orchestrating DDoS campaigns in their newly-found free time.
AWS's most recent blow-up was not a DDoS
For me the only silver lining to all these cloud outages is now we know that their published SLA times mean absolutely nothing. The number of 9's used to at least give an indication of intent of reliability, now they are twisted to whatever metric the company wants to represent and don't actually represent guaranteed uptime anywhere.
So true. AWS for example gives only platform credits in the event of an outage. Basically no recourse or insurance.
Doesn’t everyone do that? I’ve never worked for a place that the base policy wasn’t credits. You might have special contract language stating otherwise, but for almost everyone, it’s credits.
Some of the other commenters here have posited a "vibe code theory". As the amount of vibe code in production increases, so does the number of bugs and, therefore, the number of outages.
None of the recent major outages were traced down to "vibe coding" or anything of the sort. They appear to be the kind of misconfigurations and networking fuckups that existed since the Internet became more complex than 3 routers.
The "vibe thinking" trend where people stop using their brain and rely on whatever random output the LLM tells them is harder to diagnose, but it's certainly there and at least as bad as vibe coding.
How likely are we to know when a "misconfiguration or networking fuckup" is due to someone asking ChatGPT how to do the task?
>misconfigurations and networking fuckups that existed since the Internet became more complex than 3 routers.
Yet there has been an uptick in frequency of outages only in the recent few months. Correlation correlation.
Why assume that these misconfigs are not the result of someone asking AI how to do them?
Wasn't the recent AWS outage a race condition that's existed since before vibe coding was a thing?
Speaking of "vibe-coding", I wonder how much their own outage is affecting their ability to vibe-code their way out of it.. :-)
The openai login page says:
> Some of the other commenters here have posited a "vibe code theory". As the amount of vibe code in production increases, so does the number of bugs and, therefore, the number of outages.
Likely this coupled with the mass brain damage caused by never-ending COVID re-infections.
Since vaccines don't prevent transmission, and each re-infection increases the chances of long COVID complications, the only real protection right now is wearing a proper respirator everywhere you go, and basically nobody is doing that anymore.
Are you being hyperbolic? It's clearly not this, and very likely not GP's proposal either.
The theory I’ve heard is holiday deploy freezes coupled with Q4 goals create pressure to get things in quickly and early. It’s all been in the last month or so which does line up.
What's different about this Q4 vs the last 20 years of Q4s?
The obvious answer is to cancel holidays.
My theory is a state-sponsored actor targeting some of these services, but maybe that's just too 'tinfoil hat' of me, who knows.
There are usually very comprehensive post mortems for these events, and none have suggested that at all
This only amplifies the often-repeated propaganda about the "very powerful" enemies of democracy, who in fact are very fragile dictatorships. There's enough incompetence at tech companies to f up their own stuff.
My theory is DNS.
If it's any guidance, US cyber risk insurance (which covers among other things disruptions due to supplier outages) has continuously dropped in price since Q1 2023, by a few percent per year.
If you excuse the sloppy plot manually transcribed from market index data: https://i.xkqr.org/cyberinsurancecost.png
Unless you're making that determination statistically, it's probably pareidolia. See here: https://behavioralscientist.org/yates-expect-unexpected-why-...
I suspect the number of outages is the same, but the number of sites putting all of their eggs into these two baskets has grown considerably.
Don't forget Azure Front Door / half of Azure.
Yeah, but that's just standard for Azure.
Any chance our friend Vladimir is behind this?
GCP was down recently as well
It's you. Everything goes down once in a while.
it definitely feels like it.