Comment by rco8786
3 months ago
Is it me or has there been a very noticeable uptick in large scale infra-level outages lately? AWS, Cloudflare, etc have all been way under whatever SLA they publish.
Coincidentally, large tech companies have been conducting mass layoffs and claim they're going to rely on AI much more to replace junior developers.
And they are offshoring roles to lower quality devs.
Interestingly, ChatGPT was unavailable due to the same Cloudflare outage.
Imagine vibe coding something in production, it breaks half the internet, then you can't vibe code it back because it broke the LLM providers. A real catch-22 for the modern age!
By similar thinking, you could blame large tech companies if they hired too many juniors.
Juniors, at least, have the capacity to learn.
That does seem to be a coincidence, as the recent outages making headlines (including this one, according to early reports) have been associated with huge traffic spikes. It seems DDoS attacks are reaching a new level.
AWS's most recent blow-up was not a DDoS
Maybe a laid-off engineer is bored and started orchestrating DDoS campaigns in their newly-found free time.
For me the only silver lining to all these cloud outages is that now we know their published SLA numbers mean absolutely nothing. The number of 9's used to at least signal an intent of reliability; now they are twisted to fit whatever metric the company wants to present, and they don't represent guaranteed uptime anywhere.
So true. AWS for example gives only platform credits in the event of an outage. Basically no recourse or insurance.
Doesn’t everyone do that? I’ve never worked for a place where the base policy wasn’t credits. You might have special contract language stating otherwise, but for almost everyone, it’s credits.
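For a sense of what the "number of 9's" promises on paper, here is a minimal sketch that converts an availability percentage into the downtime it permits. The function name and the 30-day billing period are assumptions made for illustration; real SLAs define their own measurement windows and exclusions.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 30.0) -> float:
    """Minutes of downtime permitted in the period at the given availability.

    Hypothetical helper: the 30-day default period is an assumption,
    not taken from any provider's SLA.
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The permitted downtime is the fraction of the period NOT covered by the SLA.
    return total_minutes * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct}% -> {minutes:.2f} minutes per 30 days")
```

Under these assumptions, "three nines" allows roughly 43 minutes of downtime per 30 days, so a single multi-hour outage exceeds it many times over.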
Some of the other commenters here have posited a "vibe code theory". As the amount of vibe code in production increases, so does the number of bugs and, therefore, the number of outages.
None of the recent major outages were traced to "vibe coding" or anything of the sort. They appear to be the kind of misconfigurations and networking fuckups that have existed since the Internet became more complex than 3 routers.
The "vibe thinking" trend where people stop using their brain and rely on whatever random output the LLM tells them is harder to diagnose, but it's certainly there and at least as bad as vibe coding.
How likely are we to know when a "misconfiguration or networking fuckup" is due to someone asking ChatGPT how to do the task?
>misconfigurations and networking fuckups that have existed since the Internet became more complex than 3 routers.
Yet there has been an uptick in frequency of outages only in the recent few months. Correlation correlation.
Why assume that these misconfigs are not the result of someone asking AI how to do them?
Wasn't the recent AWS outage caused by a race condition that's existed since before vibe coding was a thing?
Speaking of "vibe-coding", I wonder how much their own outage is affecting their ability to vibe-code their way out of it.. :-)
The openai login page says:
> Some of the other commenters here have posited a "vibe code theory". As the amount of vibe code in production increases, so does the number of bugs and, therefore, the number of outages.
Likely this coupled with the mass brain damage caused by never-ending COVID re-infections.
Since vaccines don't prevent transmission, and each re-infection increases the chances of long COVID complications, the only real protection right now is wearing a proper respirator everywhere you go, and basically nobody is doing that anymore.
Are you being hyperbolic? It's clearly not this, and very likely not GP's proposal either.
I have become dumber without having contracted covid or other respiratory diseases (which could have been covid). 2020s have been the era of fascism, war and communities getting torn, which does not really help with stress levels and intellectual performance.
The theory I’ve heard is that holiday deploy freezes coupled with Q4 goals create pressure to get things in quickly and early. It’s all been in the last month or so, which does line up.
What's different about this Q4 vs the last 20 years of Q4s?
The obvious answer is to cancel holidays.
My theory is a state-sponsored actor targeting some of these services, but maybe that's just too 'tinfoil hat' of me, who knows.
There are usually very comprehensive post mortems for these events, and none have suggested that at all
This only amplifies the often-repeated propaganda about the "very powerful" enemies of democracy, who in fact are very fragile dictatorships. There's enough incompetence at tech companies to f up their own stuff.
My theory is DNS.
Somewhere, at a floating desk behind a wall of lava lamps, in a nyancatified ghostty terminal with 32 different shader plugins installed:
You're absolutely right! I shouldn't have force pushed that change to master. Let me try and roll it back. * Confrobulating* Oh no! Cloudflare appears to be down and I cannot revert the change. Why don't you go make a cup of coffee until that comes back. This code is production ready, it's probably just a blip.
If it's any guidance, US cyber risk insurance (which covers, among other things, disruptions due to supplier outages) has continuously dropped in price since Q1 2023, by a few percent per year.
If you excuse the sloppy plot manually transcribed from market index data: https://i.xkqr.org/cyberinsurancecost.png
Don't forget Azure Front Door / half of Azure.
Yeah, but that's just standard for Azure.
I suspect the number of outages is the same, but the number of sites putting all of their eggs into these two baskets has grown considerably.
Unless you're making that determination statistically, it's probably pareidolia. See here: https://behavioralscientist.org/yates-expect-unexpected-why-...
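One rough way to make that determination statistically is to ask how often a cluster of outages would show up by chance alone. The sketch below assumes outages are independent and arrive at a constant average rate (a Poisson model), which is a simplification; the rate of one major outage per month is a made-up illustrative figure, not measured data.

```python
import math


def prob_at_least(k: int, expected: float) -> float:
    """P(X >= k) for X ~ Poisson(expected).

    Hypothetical helper: 'expected' is the average number of outages
    per window under the assumed constant-rate model.
    """
    if expected < 0:
        raise ValueError("expected must be non-negative")
    if k <= 0:
        return 1.0
    # Sum the probability mass for 0..k-1 events, then take the complement.
    cdf = sum(
        math.exp(-expected) * expected**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - cdf


if __name__ == "__main__":
    # Illustrative assumption: on average 1 headline outage per month.
    p = prob_at_least(3, 1.0)
    print(f"P(3 or more outages in a month) = {p:.3f}")
```

With that assumed rate, a month with three or more outages comes up about 8% of the time, so roughly once a year, without anything having changed underneath.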
It's you. Everything goes down once in a while.
GCP was down recently as well
Well AWS runs on Cloudflare...so thanks Cloudflare team!
Don’t forget that Azure was down two weeks ago as well.
Any chance our friend Vladimir is behind this?
it definitely feels like it.