
Comment by lordofgibbons

4 hours ago

How did we get to a place where either Cloudflare or AWS having an outage means a large part of the web going down? This centralization is very worrying.

Because no one cares enough, including users.

Oddly this centralization allows a complete deferral of blame without you even doing anything: if you’re down, that’s bad. But if you’re down, Spotify is down, social media is down… then “the internet is broken” and you don’t look so bad.

It also reduces your incentive to change: if “the internet is down,” people will put down their device and do something else. Even if your website is up, they’ll assume it isn’t.

I’m not saying this is a good thing but I’m simply being realistic about why we ended up where we are.

  • As a user I do care, because I waste so much time on Cloudflare's "prove you are human" blocking page (why do I have to prove it over and over again?), and I frequently run into websites blocking me entirely because of some bad IP blacklist used along with Cloudflare.

    • Unfortunately the internet sucks in 2025.

      If you have a site with valuable content, the LLM crawlers hound you to no end. CF is basically a protection racket at this point for many sites. It doesn't even stop the more determined ones, but it keeps some away.

      7 replies →

    • I just realized: why don't they have some "definitely human" third-party cookie that caches your humanness for 24h or so? I'm sure there's a reason (I've heard third-party cookies are less respected now), but can someone chime in on why this wouldn't work and save a ton of compute?

      7 replies →

    • I hate it just as much (and the challenge time seems to be getting longer; 10s lately for me, what the hell?)

      But we can all say thank you to all the AI crawlers who hammer websites with impossible traffic.

      1 reply →

  • Users have no options because... everything has been centralized. So it doesn't matter if users care or not.

    Users are never a consideration today anyway.

    • There absolutely are options, but we aren't using them because nobody cares enough about these downsides. bsky is up, and with Mastodon you even have a choice between tons of servers, or setting up your own. Yet nobody cares enough about the occasional outage to switch. It's such a minor inconvenience that it won't move the needle one bit. If people actually cared, businesses would lose customers and correct the issue.

    • It is a trade-off between convenience and freedom. Netflix vs buying your movies. Spotify vs MP3s. Most tech products have alternatives, but you need to be flexible and adjust your expectations. Most people are not willing to do that.

      3 replies →

  • 100% this. While in my professional capacity I'm all in for reliability and redundancy, as an individual I quite like these situations when it's obvious that I won't be getting any work done and it's out of my control, so I can go run some errands, read a book, or just finish early.

  • Who cares if a couple of websites are down a day or even two?

    As long as HN is up and running, everything is going to be O.K.!

    • Wealthy, investment-bloated software companies will be fine.

      Smaller companies that provide real-world services or goods for a much more meagre living, and that rely on some of the services sold to them by said software companies, will be impacted much more severely.

      Losing a day or two of sales to someone who relies on making sales every day can be a growing hardship.

      This doesn’t just impact developers. It’s exactly this kind of myopic thinking that leads to scenarios like mass outages.

      3 replies →

  • > But if you’re down, Spotify is down, social media is down… then “the internet is broken” and you don’t look so bad.

    In my direct experience, this isn't true if you're running something even vaguely mission-critical for your customers. Your customer's workers just know that they can't do their job for the day, and your customer's management just knows that the solution they shepherded through their organization is failing.

    • It's really quite funny: many of the ACTUALLY vital systems to running the world as we know it run on very different software. Cloudflare appears to have a much higher percentage of non-vital systems running on it than, say, something like Akamai.

      If Akamai went down, I have a feeling you'd see a whole lot more real-life chaos.

  • > if “the internet is down” people will put down their device and do something else

    In this case, the internet should be down more often.

    • If the internet being down is what it takes to get you to put it down once in a while, I think that's probably the problem.

  • Which "user" are you referring to? Cloudflare users or end product users?

    End product users have no power, they can complain to support and maybe get a free month of service, but the 0.1% of customers that do that aren't going to turn the tide and have anything change.

    Engineering teams using these services also get "covered" by them - they can finger point and say "everyone else was down too."

  • Many people care, but none of them can (sufficiently) change the underlying incentive structure to effect the necessary changes.

  • This is essentially the entire IT excuse for going to anything cloud. I see IT engineers all the time justifying it because the downtime stops being their problem and they stop being the ones to blame for it. There's zero personal responsibility for trying to preserve service, because it isn't "their problem" anymore. Anyone who thinks the cloud makes service more reliable is absolutely kidding themselves, because everyone who made the decision to go that way already knows it isn't true; it just won't be their problem to fix.

    If anyone in the industry actually cared about reliability and took a personal stake in their system being up, everyone would be back on-prem.

    • Reliability is not even how the cloud got sold to the C-suite. Good God, when my last company started putting things on Azure back in 2015, stuff would break weekly, usually on Monday mornings.

      No, the value proposition was always about saving money, turning CapEx into OpEx. Direct quote from my former CEO maybe 9 years ago: We are getting out of the business of buying servers.

      Cloud engineering involves architecting for unexpected events: retry patterns, availability zones, multi-region failover, that sort of thing.

      Now - does it all add up to cost savings? I could not tell you. I have seen some case studies, but I also have been around long enough to take those with a big grain of salt.

      11 replies →

    • I mean in the end it's about making a trade off that makes sense for your business.

      If the business can live with a couple of hours downtime per year when "cloud" is down, and they think they can ship faster / have less crew / (insert perceived benefit), then I don't know why that is a problem.

  • More like "don't have a choice". It's not like customers are going to defect to the competition, because before you could even switch, the service would be back up.

    Frankly it's a blessing, always being able to blame the cloud that management forced the company to migrate to in order to be "cheaper" (which half of the time turns out to be false anyway).

  • > It also reduces your incentive to change, if “the internet is down” people will put down their device and do something else. Even if your web site is up they’ll assume it isn’t.

    I agree. When people talk about the enshittification of the internet, Cloudflare plays a significant role.

  • > Because no one cares enough, including users.

    When have users been asked about anything?

  • But Spotify was not down. One social media site was down.

    This:

    > if you’re down, that’s bad. But if you’re down, Spotify is down, social media is down… then “the internet is broken” and you don’t look so bad.

    is just marketing. If you are down with some other websites it is still bad.

    • Admittedly, when I wrote that I was thinking about the recent AWS outage. Anecdotally, I asked friends and family about their experience and they assumed the internet was down. Almost everything at my work runs on Google Cloud, so we were still running, but we observed a notable dip in traffic during the outage all the same.

      > it is still bad

      No doubt. But there's a calculation to make: is it bad enough to spend the extra money on mitigations, to hire extra devops folks to manage it all… and in the majority of end-user-facing cases the answer is no, it isn't.

      1 reply →

    • > If you are down with some other websites it is still bad.

      In some cases, absolutely. For the vast majority, it really, really doesn't matter.

      (Source: my personal website is down and nobody cares, including me)

  • > Because no one cares enough, including users.

    this is like a bad motivational-speaker talk: heavy exhortations with a dramatic lack of actual reasoning.

    Systems are difficult, people. It is the "incentives" of parties and lock-in by tech design and vendors, not a lack of individual effort.

  • Eh? It's because they are offering a service too good to refuse.

    The internet these days is fucking dangerous and murderous as hell. We need Cloudflare just to keep services up under the deluge of AI data scrapers and other garbage.

Many reasons, but DDoS protection has massive network effects. The more customers you have (and therefore the more bandwidth you provision), the easier it is to hold up against a DDoS, as a DDoS usually targets just one customer.

So there are massive economies of scale. A small CDN with (say) 10,000 customers and 10 Mbit/s provisioned per customer can handle a 100 Gbit/s DDoS (way too simplistic, but hopefully you get the idea) - way too small.

If you have the same traffic provisioned on average per customer and have 1 million customers, you can handle a DDoS 100x the size.
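A rough sanity check of that arithmetic in Python (my own illustration, using the simplistic numbers above):

    # Back-of-envelope: aggregate bandwidth a CDN can throw at a DDoS,
    # assuming provisioning scales with customer count and the attack
    # targets a single customer.
    MBIT_PER_CUSTOMER = 10  # provisioned per customer, as in the example

    def absorbable_gbits(customers: int) -> float:
        """Total provisioned bandwidth (Gbit/s) available to soak up an attack."""
        return customers * MBIT_PER_CUSTOMER / 1000

    print(absorbable_gbits(10_000))     # small CDN:   100 Gbit/s
    print(absorbable_gbits(1_000_000))  # big CDN:  10,000 Gbit/s (100x)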

The only way to compete with this is to massively overprovision bandwidth per customer (which is expensive, as those customers won't pay more just so you can have more redundancy despite being smaller).

In a way (like many things in infrastructure), CDNs are natural monopolies. The bigger you get -> the more bandwidth and PoPs you can have -> the more attractive you are to customers (and this repeats over and over).

It was probably very astute of Cloudflare to realise that offering such a generous free plan was a key step in this.

  • Your argument is technically flawed.

    In a CDN, customers consume bandwidth; they do not contribute it. If Cloudflare adds 1 million free customers, they do not magically acquire 1 million extra pipes to the internet backbone. They acquire 1 million new liabilities that require more infrastructure investment.

    All you are doing is echoing their pitch book. Of course they want to skim their share of the pie.

    • > In a CDN, customers consume bandwidth; they do not contribute it

      They contribute money which buys infrastructure.

      > If Cloudflare adds 1 million free customers,

      Is the free tier really made up of customers? Regardless, most of them are so small that it doesn't cost Cloudflare much anyway. The infrastructure is already there. It's worth it to them for the goodwill it generates, which leads to future paying customers. It probably also gives them visibility into what is good vs bad traffic.

      1 million small sites could very well cost less to cloudflare than 1 big site.

    • You're missing the economies of scale part.

      OP is saying it's cheaper overall for a 10 million customer company to add infrastructure for 1 million more than it is for a 10,000 customer company to add infrastructure for 1000 more people.

      If you're looking at this as a "share of the pie", it's probably not going to make sense. The industry is not zero sum.

    • I imagine every single customer is provisioned based on some expected peak traffic, and that's what they base their capital investment in bandwidth on.

      However, most customers are rarely at their peak. Assuming attacks on different customers are uncorrelated, that leaves tremendous spare capacity that is frequently doing nothing and can be used to eat DDoS attacks. Cloudflare advertises this spare capacity as "DDoS protection."

      I suppose in theory it might be possible to massively optimise utilisation of your links, but that would come at the cost of DDoS protection and might not improve your margin very meaningfully, especially if customers care a lot about being online.
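      As a toy illustration of that headroom argument (the numbers are mine, purely illustrative):

          # Toy model: capacity is provisioned for each customer's peak, but at
          # any instant most customers sit far below peak. The gap is "free"
          # DDoS headroom, as long as attacks on customers are uncorrelated.
          customers = 100_000
          peak_mbit = 10      # provisioned per customer
          typical_mbit = 1    # actually used most of the time

          provisioned = customers * peak_mbit / 1000   # Gbit/s
          in_use = customers * typical_mbit / 1000     # Gbit/s
          print(provisioned - in_use)                  # 900 Gbit/s of idle headroom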

  • And how many companies want to also be able to build out their own CDN?

    Not every company can be an expert at everything.

    But perhaps many of us could buy a different CDN than the major players if we want to reduce the likelihood of mass outages like this though.

  • In my opinion, DDoS is possible only because there is no network protocol for a host to control traffic filtering at upstream providers (deny traffic from certain subnets or countries). If there were, everybody would prefer to write their own systems rather than rely on a harmful monopoly.

    • The recent Azure DDoS used 500k botnet IPs. These will have been widely distributed across subnets and countries, so your blocking approach would not have been an effective mitigation.

      Identifying and dynamically blocking the 500k offending IPs would certainly be possible technically -- 500k /32s is not a hard filtering problem -- but I seriously question the operational ability of internet providers to perform such granular blocking in real-time against dynamic targets.

      I also have concerns that automated blocking protocols would be widely abused by bad actors who are able to engineer their way into the network at a carrier level (i.e. certain governments).

      3 replies →

    • What traffic would you request the upstream providers to block if getting hit by Aisuru? Considering the botnet consists of residential routers, those are the same networks your users will be originating from. Sure, in best case, if your site is very regional, you can just block all traffic outside your country - but most services don't have this luxury.

      Blocking individual IP addresses? Sure, but consider that before your service detects enough anomalous traffic from one particular IP and is able to send the upstream block request, your service will already be down from the aggregate traffic. Even a "slow" DDoS with <10 packets per second per source is enough to saturate your 10 Gbps link if the attacker has a million machines to originate traffic from.
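      The arithmetic holds; a quick check in Python (assuming ~1,500-byte packets, my assumption):

          # 1M sources at 10 packets/s each vs a 10 Gbit/s link.
          sources = 1_000_000
          pps_per_source = 10
          packet_bytes = 1_500  # assumed full-size packets

          aggregate_gbits = sources * pps_per_source * packet_bytes * 8 / 1e9
          print(aggregate_gbits)  # ~120 Gbit/s, an order of magnitude over the link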

      6 replies →

    • > there is no network protocol for a host to control traffic filtering on upstream providers (deny traffic from certain subnets or countries).

      There is no network protocol per se, but there are commercial solutions like Fortinet that can block countries, IIRC. Note that it's only IP-range based, so it's not worth much.

      3 replies →

Yeah, I went to HN after the third web page didn't work. I am not just worried about the single point of failure, I am much more worried about this centralization eventually shaping the future standards of the web and making it de facto impossible to self-host anything.

Well, that and the fact that when 99% of traffic goes through a central party, that central party becomes very interesting for authoritarian governments to apply sweeping censorship rules to.

  • It is already nearly impossible, or very expensive, in my country to get a public IP address (even IPv6) which you could host on. The world is moving heavily towards being centrally dependent on these big cloud providers.

    • What part of the world has any IPv6 limitations? In the USA, an ISP will give you a /48 from their /32 if you have any colo arrangement, without even a blink. That gives you 2^16 networks with an essentially infinite number of hosts on each network. Zero additional charge.
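      For the curious, the subnet math behind that claim:

          print(2 ** (48 - 32))  # 65,536 possible /48 delegations in the ISP's /32
          print(2 ** (64 - 48))  # 65,536 standard /64 networks inside your /48
          print(2 ** 64)         # host addresses per /64: effectively infinite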

  • > eventually shaping the future standards of the web and making it de facto impossible to self-host anything

    Eventually?

Another one that worries me is Let's Encrypt.

It is not as bad as Cloudflare or AWS because certificates will not expire the instant there is an outage, but consider that:

- It serves about 2/3 of all websites

- TLS is becoming more and more critical over time. If certificates fail, the web may as well be down

- Certificate lifetimes are becoming shorter and shorter: now 90 days, but Let's Encrypt is considering 6 days, and 47 days is planned as the industry-wide maximum

- An outage is one thing, but should a compromise happen, that would be even more catastrophic

Let's Encrypt is a good guy now, but remember that Google used to be a good guy in the 2000s too!

  • Agree, I’ve thought about this one too. The history of SSL/TLS certs is pretty hacky anyway, in my opinion. The main problem they solve really should have been solved at the network layer, with ubiquitous IPsec and key distribution via DNS: most users just blindly trust whatever root CAs ship with their browser or OS, and the ecosystem has been full of implementation and operational issues.

    Let’s Encrypt is great at making the existing system less painful, and there are a few alternatives like ZeroSSL, but all of this automation is basically a pile of workarounds on top of a fundamentally inappropriate design.
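    One practical mitigation for the single-CA risk is teaching your ACME tooling to fall back to an alternate CA. A minimal sketch using certbot's --server flag (a sketch only: directory URLs, account registration, and EAB requirements vary by CA; ZeroSSL, for instance, requires External Account Binding credentials):

        # Sketch: try Let's Encrypt first, fall back to an alternate ACME CA.
        import subprocess

        ACME_DIRECTORIES = [
            "https://acme-v02.api.letsencrypt.org/directory",  # Let's Encrypt
            "https://acme.zerossl.com/v2/DV90",                # ZeroSSL (needs EAB flags)
        ]

        def issue_cert(domain: str) -> bool:
            """Attempt issuance against each CA in order; stop at first success."""
            for directory in ACME_DIRECTORIES:
                result = subprocess.run([
                    "certbot", "certonly", "--standalone", "--non-interactive",
                    "--agree-tos", "-d", domain, "--server", directory,
                ])
                if result.returncode == 0:
                    return True
            return False

        issue_cert("example.com")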

    • There's not really a way around the initial trust problem with consumer-oriented certs though. Your approach could reduce the number of initially trusted parties down to one, I think, but not any further.

  • Google was always a for-profit operation. Let's Encrypt/ISRG could still go rotten, but there are fewer incentives for them to do so as a non-profit.

Mostly since the AWS craze started a decade ago, developers have moved away from dedicated servers (which are actually cheaper, go figure), and that is causing all this mess.

It's genuinely insane that many companies design a great number of fallbacks... at the software level, but almost no thought is given to the hardware/infrastructure level. Common sense dictates that you should never host everything on a single provider.

  • I tried as hard as I could to stay self hosted (and my backend is, still), but getting constant DDoS attacks and not having the time to deal with fighting them 2-3x a month was what ultimately forced me to Cloudflare. It's still worse than before even with their layers of protection, and now I get to watch my site be down a while, with no ability to switch DNS to point back to my own proxy layer, since CF is down :/

    • This is wild. Was your website somehow controversial? I've been running many different websites for 30+ years now, and have never been the target of a DDoS. The closest I've seen was when one website had a blind time-based SQL injection vulnerability and the attacker was abusing it; all the SLEEP() calls injected into the database brought the server to a crawl. But that was just one attacker from a handful of IPs, hardly what I would call a DDoS.

      5 replies →

  • With the state of constant attack from AI scrapers and DDoS bots, you pretty much need a CDN from someone now if you run a serious business service. The poor guys with single on-prem boxes serving static HTML can /maybe/ weather some of this storm alone, but not everything.

  • I self hosted on one of the company’s servers back in the late 90s. Hard drive crashes (and a hack once, through an Apache bug) had our services (http, pop, smtp, nfs, smb, etc ) down for at least 2-3 days (full reinstall, reconfiguration, etc).

    Then, with regular VPSs I also had systems down for 1-2 days. Just last week the company that hosts NextCloud for us was down the whole weekend (from Friday evening) and we couldn’t get their attention until Monday.

    So far these huge outages that last 2-5 hours are still lower impact for me, and require me to take less action.

    • Solving an issue for a few, and creating issues for millions, perhaps including those same few. It is easier to sleep at night though, for the few.

  • I like the idea of having my own rack in a data center somewhere (or sharing the rack, whatever) but even a tiny cost is still more than free. And even then, that data center will also have outages, with none of the benefits of a Cloudflare Pages, GitHub Pages, etc.

  • > developers have gone away from Dedicated servers (which are actually cheaper, go figure)

    It depends on how you calculate your cost. If you only include the physical infrastructure, a dedicated server is cheaper. But with a dedicated server you lose a lot of flexibility. Need more resources? Just scale up your EC2 instance; with a dedicated server there is a lot more work involved.

    Do you want a 'production-ready' database? With AWS you can just click a few buttons and have an RDS instance ready to use. To roll out your own PG installation you need someone with a lot of knowledge (how to configure replication? backups? updates? ...).

    So if you include salaries in the calculation, the result changes a lot. And even if you already have some experts on your payroll, by putting them to work deploying a PG instance you won't be able to use them to build other things that may generate more value for your business than the premium you pay to AWS.

  • Cloud hosters are that hardware fallback. They started out offering better redundancy and scaling than your homemade breadbox, but it seems they lost something along the way, and now we have this.

  • Maintenance cost is the main issue for on-prem infra. Nowadays, add things like DDoS protection and/or scraping protection, which can require a dedicated team, or force your company to rely on some library or open-source project that is not guaranteed to be maintained forever (unless you support them, which I believe in)... Yeah, I can understand why companies shift off of on-prem nowadays.

  • ... dedis are cheaper if you are right-sized. If you are wrong-sized, they just plain crash, and you may or may not be able to afford the upgrade.

    I was at SoftLayer before I was at AWS, and what catalyzed the move was the time I needed to add another hard drive to a system and somehow they screwed it up. I couldn't put in a trouble ticket to get it fixed because my database record in their trouble-ticket system was corrupted. The next day I moved my stuff to AWS, and the day after that they had a top sales guy talk to me to try to get me to stay, but it was too late.

They're using Cloudflare for multicloud, but still have Cloudflare as a single point of failure. Should make a Cloudflare for Cloudflare to solve this.

  • Like the infamous "smiling through the pain" meme:

    "I added a load-balancer to improve system reliability" (happy)

    "Load balancer crashed" (smiling-through-the-pain)

    • Reliability has a very weird curve, frankly.

      Technically, a multi-node cluster with failover (or full-on active-active) will have far higher uptime than a single node.

      Practically, getting the multi-node cluster (for any non-trivial workload) to work right, reliably, and fail over in every case is far more work and far more code (that can have more bugs), and even if you do everything right and test what you can, unexpected stuff can still kill it. Like recently: we had an uncorrectable memory error which just happened to hit the ceph daemon in exactly the right way that one of the OSDs misbehaved and bogged down the entire cluster.

  • You jest, but this actually does exist. Multiple CDNs sell multi-CDN load balancing (divide traffic between 2+ CDNs per variously-complicated specifications, with failover) as a value add feature, and IIRC there is at least one company for which this is the marquee feature. It's also relatively doable in-house as these things go.

  • Failover to Akamai.

    • As someone who has worked for a CDN for over a decade, this is what most big customers do. Under normal circumstances, they send portions of traffic to different CDNs, usually based on cost (and or performance in various regions). When an issue happens, they will pull traffic from the problem CDN.

      Of course, if a big incident happens for a big CDN, there might not be enough latent capacity in the other CDNs to take all the traffic. CDNs are a cutthroat business, with small margins, so there usually isn’t a TON of unused capacity laying around.

  • If there’s clearly a single point of failure shouldn’t it be called a single cloud pretending to be “multicloud”?

This might sound crazy as a software engineer, but I actually like the occasional "snow day" where everything goes down. It's healthy for us to all disconnect from the internet for a bit. The centralization unintentionally helps facilitate that. At least, that's my glass half full perspective.

  • I can understand that sentiment. Just don't lose sight of the impact it can have on every day people. My wife and I own a small theatre and we sell tickets through Eventbrite. It's not my full time job but it is hers. Eventbrite sent out an email this morning letting us know that they are impacted by the outage. Our event page appears to be working but I do wonder if it's impacting ticket sales for this weekend's shows.

    So while those of us in tech might like a "snow day", there are millions of small businesses and people trying to go about their day-to-day lives who get cut off because of someone else's fuck-ups when this happens.

    • Absolutely solid point; there are a couple of apps I use daily for productivity, chores, even alarm scheduling, where on the free versions the ads wouldn't load so I couldn't use them (though some of them have been updated already). Made me realize we're kind of like cyborgs, relying on technology integrated so deeply into our lives that all it takes is an EMP blast, like a monopolistic service going down, to bring -us- down until we take a breath and learn how to walk again. Wild time.

  • If the internet was just social media, SaaS productivity suites, and AI slop, sure...

    But there are systems that depend on Cloudflare, directly or not, and when they go down it can have a serious impact on somebody's livelihood.

  • I'm guessing you're employed and your salary is guaranteed regardless. Would you have the same outlook if you were the self-employed founder of an online business and every minute of outage was costing you money?

    • What are you paying in order to be down?

      Even if you were making a million a minute, typically it still didn't cost you a thing, nor have you lost anything.

      You're not making as much, sure, but that's neither a cost nor a loss.

      2 replies →

It's not only centralization in the sense that your website will be down if they are down; it is also a centralized MITM proxy. If you transfer sensitive data like chats over Cloudflare-"protected" endpoints, you also allow CF to transparently read and analyze it in plain text. It must be very easy for state agencies to spy on the internet nowadays; they would just ask CF to redirect traffic to them.

How did we get to a place where Cloudflare being down means we see an outage page, but on that page it tells us explicitly that the host we're trying to connect to is up, and it's just a Cloudflare problem.

If it can tell us that the host is up, surely it can just bypass itself to route traffic.

  • "... surely it can just ..."

    Congratulations, you've successfully completed Management Training 101.

Now that network effects and data lock-in have taken root, downtime is not as big of a concern as it was in the 2000s.

  • What does this even mean? Because people have locked in their data, they’re ok with downtime? I can’t imagine a world where this is true.

    • It costs a lot of money to move, you don't know if the alternative will be any better, and if it affects a lot of companies then it's nobody's fault. "Nobody ever got fired for buying Cloudflare/AWS" as they say.

    • it's not just that; it's the creation of a sort of status symbol, or at least a symbol of normality.

      there was a point (maybe still) where not having a Netflix subscription was seen as 'strange'.

      if that's the case in your social circles -- and these kinds of social things bother you -- you're not going to cancel the subscription over bad service until doing so becomes a socially accepted norm.

    • It's just that customers are more understanding when they see their Netflix not working either; otherwise they just think you're less professional. Try talking to customers after an outage and you will see.

  • except, y'know, where people's lives and livelihoods depend on access to information or on being able to do things at an exact time. AWS and Cloudflare are disqualifying themselves from hospitals and the military and whatnot.

    • For example, Cloudflare employees make money on promises to mitigate such attacks, but then can’t guarantee they will, and take all their customers down at once. It’s a shared pain model.

Because DDoS is a fact of life (and even if you aren't targeted by DDoS, the bot traffic probing you to see if you can be made part of the botnet is enough to take down a cheap $5 VPS). So we have to ask - why? Personally, I don't accept the hand-wavy explanation that botnets are "just a bunch of hacked IoT devices". No, your smart lightbulb isn't taking down Reddit. I slightly believe the secondary explanation that it's a bunch of hacked home routers. We know that home routers are full of things like suspicious oopsie definitely-not-government backdoors.

Because it's better to have a really convenient and cheap service that works 99% of the time than a resilient one that is more expensive or more cumbersome to use.

It's like github vs whatever else you can do with git that is truly decentralized. The centralization has such massive benefits that I'm very happy to pay the price of "when it's down I can't work".

Most developers don't care to know how the underlying infrastructure works (or why) and so they take whatever the public consensus is re: infra as a statement of fact (for the better part of the last 15 years or so that was "just use the cloud"). A shocking amount of technical decisions are socially, not technically enforced.

Because bots are a real thing.

And it’s hard to protect against DDoS without something like Cloudflare.

Look at the posts here.

Even the meager HN "hug of death" will take things down.

A lot of products use AWS because "we could build redundancy and multi-region if we need it" and then never build it.

  • I think some of the issues in the last outage actually affected multiple regions. IIRC internally some critical infrastructure for AWS depends on us-east-1 or at least it failed in a way that didn't allow failover.

I would be less worried if Cloudflare and AWS weren't involved in many more things than simply running DNS.

AWS - someone touches DynamoDB and it kills the DNS.

Cloudflare - someone touches functionality completely unrelated to DNS hosting and proxying and, naturally, it kills the DNS.

There is this critical infrastructure that just becomes one small part of a wider product offering, worked on by many hands, and this critical infrastructure gets taken down by what is essentially a side-effect.

It's a strong argument to move to providers that just do one thing and do it well.

The same reason we have centralization across the economy. Economies of scale are how you make a big business successful, and once you are on top it's hard to dislodge you.

This topic is raised every time there is an outage at Cloudflare, and the truth of the matter is that they offer an incredible service and there is no big enough competitor to challenge them. By definition their services are so good BECAUSE their adoption rate is so high.

It's very frustrating of course, and it's the nature of the beast.

IMO, centralization is inevitable because the fundamental forces drive things in that direction. Clouds are useful for a variety of reasons (technical, time to market, economic), so developers want to use them. But clouds are expensive to build and operate, so there are only a few organizations with the budget and competency to do it well. So, as the market matures you end up with 3 to 5 major cloud operators per region, with another handful of smaller specialists. And that’s just the way it works. Fighting against that is to completely swim upstream with every market force in opposition.

This was always the case. There was always a "us-east" in some capacity, under Equinix, etc. Except it used to be the only "zone," which is why the internet is still so brittle despite having multiple zones. People need to build out support for different zones. Old habits die hard, I guess.

Compliance. If you wanna sell your SaaS to a big corpo, their compliance teams will feel you know what you're doing if they read AWS or Cloudflare on your architecture, even if you do not quite know what you're doing.

Well the centralisation without rapid recovery and practices that provide substantial resiliency… that would be worrying.

But I dare say the folks at these organisations take these matters incredibly seriously and the centralisation problem is largely one of risk efficiency.

I think there is no excuse, however, not to have multi-region state and pilot-light architectures, just in case.

> How did we get to a place where either Cloudflare or AWS having an outage means a large part of the web going down?

As always, in the name of "security". When are we going to learn that anything done, either by the government or by a corporation, in the name of security is always bad for the average person?

Because they are great services, are generally pretty easy to get started with, and usually work as expected, which has led to broad adoption.

Currently at the public library and I can't use the customer inventory terminals to search for books. They're just a web browser interface to the public facing website, and it's hosted behind CF. Bananas.

Except businesses love it.

A lot (and I mean a lot) of people in IT like centralization specifically because it’s hard to blame people for doing something that everyone else is doing.

  • And HN users love it too. I've had people on this site say how great it is that their system routes 30% of traffic on the internet.

    I'd be horrified. That's not the internet or computing industries I grew up with, or started working in.

    But as long as the SPY keeps hitting > 10% returns each year, everyone's happy.

  • "No one gets fired for buying IBM!"

    • "No one gets fired for buying Microsoft" "No one gets fired for buying AWS" "No one gets fired for buying Cloudflare"

      Perhaps the most graceful death of a tech company is that sentiment? Before some perception shift?

We take the idea of the internet always being on for granted. Most people don’t understand the stack and assume that when sites go down it’s isolated, and although I agree with you, it’s just as much complacency and lack of oversight and enforcement delays in bureaucracy as it is centralization. But I guess that’s kind of the umbrella to those things… lol

Don't forget the CrowdStrike outage: one company had a bug that brought down almost everything. Who would have thought there were so many single points of failure across the entire Internet?

It's because single points of traffic concentration are the most surveillable architecture, so FVEY et al economically reward with one hand those companies who would build the architecture they want to surveil with the other hand.

Don't think there is anything wrong with a centralised service being down; you just make a conscious decision about whether you want that and can afford it.

People not being ready for Cloudflare/[insert hyperscaler] to possibly be down is the only fault.

The technical term for it is a man-in-the-middle. It's better to call it what it is; that way you aren't fooled into thinking it's not, because it is.

And all of these outages are happening not long after most of these companies dismissed a large number of experienced staff while moving jobs offshore to save on labor costs.

People use Cloudflare because it's a "free" way for most sites to not get exploited (WAF) or DDoSed (CDN/proxy) regularly. A DDoS can cost quite a bit more than a day of downtime; even just a thundering herd of legitimate users can explode an egress bill.

It sucks that there's not more competition in this space, but Cloudflare isn't widely used for no reason.

AWS also solves real problems people have. Maintaining infrastructure is expensive as is hardware service and maintenance. Redundancy is even harder and more expensive. You can run a fairly inexpensive and performant system on AWS for years for the cost of a single co-located server.

because efficiency trumps redundancy in the short term, which is all that matters in a super competitive environment.

When there is an accident on the interstate we should blame the centralization of traffic and advocate for no more highways.

Very worrying indeed.

Is avoiding single point of failure in anyone’s playbook? ¯\_(ツ)_/¯

  • We only care about it when it's time to complain about the work of individual people.

    Companies can always do as they please and people will rationalize anything.

Re: Cloudflare it is because developers actively pushed "just use Cloudflare" again and again and again.

It has been dead to me since the SSL cache vulnerability thing and the arrogance with which senior people expected others to solve their problems.

But consider how many people still do stupid things like use the default CDN offered by some third party library, or use google fonts directly; people are lazy and don't care.

It's not really. People are just very bad at putting the things around them into perspective.

Your power is provided by a power utility company. They usually serve an entire state, if not more than one (there are smaller ones too). That's "centralization" in that it's one company, and if they "go down", so do a lot of businesses. But actually it's not "centralized", in that 1) there are actually many different companies across the country/world, and 2) each company "decentralizes" most of its infrastructure to prevent massive outages.

And yes, power utilities have outages. But usually they are limited in scope and short-lived. They're so limited that most people don't notice when they happen, unless it's a giant weather system. Then if it's a (rare) large enough impact, people will say "we need to reform the power grid!". But later when they've calmed down, they realize that would be difficult to do without making things worse, and this event isn't common.

Large internet service providers like AWS, Cloudflare, etc, are basically internet utilities. Yes they are large, like power utilities. Yes they have outages, like power utilities. But the fact that a lot of the country uses them, isn't any worse than a lot of the country using a particular power company. And unlike the power companies, we're not really that dependent on internet service providers. You can't really change your power company; you can change an internet service provider.

Power didn't used to be as reliable as it is. Everything we have is incredibly new and modern. And as time has passed, we have learned how to deal with failures. Safety and reliability has increased throughout critical industries as we have learned to adapt to failures. But that doesn't mean there won't be failures, or that we can avoid them all.

We also have the freedom to architect our technology to work around outages. All the outages you have heard about recently could be worked around, if the people who built on them had tried:

- CDN goes down? Most people don't absolutely need a CDN. Point your DNS at your origins until the CDN comes back; a sketch of this follows the list. (And obviously, your DNS provider shouldn't be the same as your CDN...)

- The control plane goes down on dynamic cloud APIs? Enable a "limp mode" that persists existing infrastructure to serve your core needs. You should be able to service most (if not all) of your business needs without constantly calling a control plane.

- An AZ or region goes down? Use your disaster recovery plan: deploy infrastructure-as-code into another region or AZ. Destroy it when the az/region comes back.
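A hedged sketch of that first workaround: flipping an A record from the CDN back to the origin through a generic DNS provider's REST API. The endpoint, token, and record shape here are hypothetical placeholders, not any real provider's API:

    # Hypothetical DNS failover: repoint www from the CDN to the origin.
    # Adapt the endpoint and payload to your actual DNS provider's API
    # (which, per the above, should not be your CDN).
    import requests

    API = "https://dns.example-provider.com/v1"  # hypothetical endpoint
    HEADERS = {"Authorization": "Bearer <token>"}

    ORIGIN_IP = "203.0.113.10"  # your origin server (TEST-NET example address)

    def point_at_origin(zone_id: str, record_id: str) -> None:
        """Swap the record's target to the origin and drop the TTL."""
        resp = requests.patch(
            f"{API}/zones/{zone_id}/records/{record_id}",
            headers=HEADERS,
            json={"type": "A", "content": ORIGIN_IP, "ttl": 60},
        )
        resp.raise_for_status()

    point_at_origin("zone-123", "rec-456")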

...and all of that just to avoid a few hours of downtime per year? It's likely cheaper to just take the downtime. But that doesn't stop people from piling on when things go wrong, questioning whether the existence of a utility is a good idea.

Because "Cloudflare protection" blah blah, until Cloudflare is down itself, and then you are back to "who watches the watchmen".

Hacking software or hardware is so old school.

The target these days is the user.

The make-believe worm.

5 mins. of thought to figure out why these services exist?

Dialogue about mitigations/solutions? Alternative services? High availability strategies?

Nah! It's free to complain.

Me personally, I'd say those companies do a phenomenal job by being a de facto backbone of the modern web. Also Cloudflare, in particular, gives me a lot of things for free.