- SSL/TLS: You will likely lose your Cloudflare-provided SSL certificate. Your site will only work if your origin server has its own valid certificate.
- Security & Performance: You will lose the performance benefits (caching, minification, global edge network) and security protections (DDoS mitigation, WAF) that Cloudflare provides.
- This will also expose your origin servers' IP addresses. Anyone can find permanent logs of the public IP addresses used by even obscure domain names, so potential adversaries don't necessarily have to be paying attention at exactly the right time to find them.
If anyone needs the internet to work again (or needs to get into your Cloudflare dashboard to generate API keys): if you have Cloudflare WARP installed, turning it on appears to fix otherwise broken sites. Maybe using 1.1.1.1 does too, but flipping the toggle was faster. Some parts of sites are still down, even after tunneling in to CF.
A colleague of mine just came bursting through my office door in a panic, thinking he brought our site down since this happened just as he made some changes to our Cloudflare config. He was pretty relieved to see this post.
You joke and I think it's funny, but as a junior engineer I would be quite proud if some small change I made was able to take down the mighty Cloudflare.
It's also what caused the global Azure Front Door outage two weeks ago - https://aka.ms/air/YKYN-BWZ
"A specific sequence of customer configuration changes, performed across two different control plane build versions, resulted in incompatible customer configuration metadata being generated. These customer configuration changes themselves were valid and non-malicious – however they produced metadata that, when deployed to edge site servers, exposed a latent bug in the data plane. This incompatibility triggered a crash during asynchronous processing within the data plane service. This defect escaped detection due to a gap in our pre-production validation, since not all features are validated across different control plane build versions."
> May 12, we began a software deployment that introduced a bug that could be triggered by a specific customer configuration under specific circumstances.
I'd love to know more about what those specific circumstances were!
I'm pretty sure I crashed Gmail using something weird in its filters. It was a few years ago. Every time I did something specific (I don't remember what), it would freeze and then display a 502 error for a while.
What's funny is that as I get older, this feeling of relief turns more into a feeling of dread. The nice thing about problems that you cause is that you have considerable autonomy to fix them. When Cloudflare goes down, you're sitting and waiting for a third party to fix something.
The problem is, I still get the wrong end of the stick when AWS or CF go down! Management doesn't care, understandably. They just want the money to keep coming in. It's hard to convince them that this is a pretty big problem. The only thing that will calm them down a bit is to tell them Twitter is also down. If that doesn't get them, I say ChatGPT is also down. Now NOBODY will get any work done! lol.
When I'm debugging something, I'm not usually looking for the solution to the problem; I'm looking for sufficient evidence that I didn't cause the problem. Once I have that, the velocity at which I work slows down.
Maybe this isn’t great, but I get a hint of that feeling when I’m on an airplane and hear a baby crying. For a number of years, if I heard a baby crying, it was probably my baby and I had to deal with it. But now my kids are past that phase, so when I hear the crying, after that initial jolt of panic I realize that it isn’t my problem, and that does give me the warm fuzzies. Even though I do feel bad for the baby and their parents.
I woke up getting bombarded by messages from multiple clients about sites not working. I shat my pants because I'd changed the config just yesterday. When I saw the status message "Cloudflare down" I was so relieved.
Good that he worked it out so quickly. I recently spent a day debugging email problems on Railway PaaS, because they silently closed an SMTP port without telling anyone.
You missed a great opportunity to dead-pan him with something like "No, Bob, not just our site, you brought down the entire Internet, look at this post!"
> In short, a latent bug in a service underpinning our bot mitigation capability started to crash after a routine configuration change we made. That cascaded into a broad degradation to our network and other services. This was not an attack.
It still astounds me that the big dogs do not phase config rollouts. Code is data, configs are data, they are one and the same. It was the same issue with the giant CrowdStrike outage last year: they were rawdogging configs globally, a bad config made it out there, and everything went kaboom.
You NEED to phase config rollouts like you phase code rollouts.
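(For the curious, here is a minimal sketch of what a phased config rollout could look like, under the same assumptions people use for phased code rollouts: push to a small slice of hosts, let it bake, check health, then widen. `pushConfig` and `errorRate` are hypothetical stand-ins for whatever deployment and metrics tooling you actually run; this is an illustration, not anyone's real system.)

```typescript
// Minimal sketch of a phased config rollout: push to a small slice of hosts,
// watch error rates, and only widen the blast radius if health checks pass.
// `pushConfig` and `errorRate` are hypothetical stand-ins for your own tooling.
type Host = { name: string };

async function pushConfig(host: Host, config: object): Promise<void> {
  // e.g. POST the config to the host's management endpoint
}

async function errorRate(hosts: Host[]): Promise<number> {
  // e.g. query your metrics system for the 5xx rate on these hosts
  return 0;
}

async function phasedRollout(hosts: Host[], config: object): Promise<void> {
  const phases = [0.01, 0.05, 0.25, 1.0]; // 1% canary first, then widen
  let done = 0;
  for (const fraction of phases) {
    const target = Math.ceil(hosts.length * fraction);
    const batch = hosts.slice(done, target);
    await Promise.all(batch.map((h) => pushConfig(h, config)));
    done = target;

    // Bake time: let the new config take real traffic before judging it.
    await new Promise((resolve) => setTimeout(resolve, 10 * 60 * 1000));

    if ((await errorRate(hosts.slice(0, done))) > 0.01) {
      throw new Error(`aborting rollout at ${fraction * 100}%: error rate elevated`);
    }
  }
}
```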
The big dogs absolutely do phase config rollouts as a general rule.
There are still two weaknesses:
1) Some configs are inherently global and cannot be phased. There's only one place to set them. E.g. if you run a webapp, this would be configs for the load balancer as opposed to configs for each webserver
2) Some configs have a cascading effect -- even though a config is applied to 1% of servers, it affects the other servers they interact with, and a bad thing spreads across the entire network
I think it's uncharitable to jump to the conclusion that just because there was a config-based outage they don't do phased config rollouts. And even more uncharitable to compare them to CrowdStrike.
In a company I am no longer with, I argued much the same when we rolled out "global CI/CD" on IaC. You made one change, committed and pushed, and wham, it was on 40+ server clusters globally. I hated it. The principal was enamored with it, "cattle not pets" and all that, but the result was that things slowed down considerably because anyone working with it became terrified of making big changes.
Because adversaries adapt quickly, they have a system that deploys their counter-adversary bits quickly without phasing - no matter whether they call them code or configs. See also: Crowdstrike.
Configuration changes are dangerous for CF it seems, and knocked down $NET almost 4% today. I wonder what the industry wide impact is for each of these outages?
>Configuration changes are dangerous for CF it seems, and knocked down $NET almost 4% today. I wonder what the industry wide impact is for each of these outages?
This is becoming the "new normal." It seems like every few months, there's another "outage" that takes down vast swathes of internet properties, since they're all dependent on a few platforms and those platforms are, clearly, poorly run.
This isn't rocket surgery here. Strong change management, QA processes and active business continuity planning/infrastructure would likely have caught this (or not), as is clear from other large platforms that we don't even think about because outages are so rare.
Like airline reservations systems[0], credit card authorization systems from VISA/MasterCard, American Express, etc.
Those systems (and others) have outages in the "once a decade" or even much, much, longer ranges. Are the folks over at SABRE and American Express that much smarter and better than Cloudflare/AWS/Google Cloud/etc.? No. Not even close. What they are is careful as they know their business is dependent on making sure their customers can use their services anytime/anywhere, without issue.
It amazes me the level of "Stockholm Syndrome"[1] expressed by many posting to this thread, expressing relief that it wasn't "an attack" and essentially blaming themselves for not having the right tools (API keys, etc.) to recover from the gross incompetence of, this time at least, Cloudflare.
I don't doubt that I'll get lots of push back from folks claiming, "it's hard to do things at scale," and/or "there are way too many moving parts," and the like.
Other organizations like the ones I mention above don't screw their customers every 4-6 months with (clearly) insufficiently tested configuration and infrastructure changes.
Yet many here seem to think that's fine, even though such outages are often crushing to their businesses. But if the customers of these huge providers don't demand better, they'll only get worse. And that's not (at least in my experience) a very deep or profound idea.
Pretty much everything is down (checking from the Netherlands). The Cloudflare dashboard itself is experiencing an outage as well.
Not-so-funny thing is that the Betterstack dashboard is down but our status page hosted by Betterstack is up, and we can't access the dashboard to create an incident and let our customers know what's going on.
Yep, that's also my experience. Except HN, which does not use *** Cloudflare because it knows it is not necessary. I just wrote a blog titled "Do Not Put Your Site Behind Cloudflare if You Don't Need To" [1].
Yes, I never understand this obsession with centralized services like Cloudflare. To be fair though, if our tiny blogs only get a hundred or so visitors monthly anyway, does it matter if they have an outage for a day?
1. DDoS protection is not the only thing anymore. I use Cloudflare because of vast numbers of AI bots from thousands of ASNs around the world crawling my CI servers (bloated Java VMs on very undersized hosts) and bringing them down (granted, I threw Cloudflare onto my static sites as well, which was not really necessary, I just liked their analytics UX)
2. the XKCD comic is misinterpreted there; that little block is small because it's a "small open source project run by one person", and Cloudflare is the opposite of that
3. edit: also Cloudflare is awesome if you are migrating hosts. Did a migration this past month; you point Cloudflare to the new servers and it's instant DNS propagation (since you didn't propagate anything :) )
It’s that time of the year again where we all realize that relying on AWS and Cloudflare to this degree is pretty dangerous but then again it’s difficult to switch at this point.
If there is a slight positive note to all this, then it is that these outages are so large that customers usually seem to be quite understanding.
Unless you're, say, at the airport trying to file a luggage claim … or at the pharmacy trying to get your prescription. I think as a community we have a responsibility to do better than this.
> If there is a slight positive note to all this, then it is that these outages are so large that customers usually seem to be quite understanding.
Which only shows that chasing five 9s is worthless for almost all web products. The idea is that by relying on AWS or Cloudflare you can push your uptime numbers up to that standard, but these companies themselves are having such frequent outages that customers themselves don't expect that kind of reliability from web products.
If I choose AWS/cloudflare and we're down with half of the internet, then I don't even need to explain it to my boss' bosses, because there will be an article in the mainstream media.
If I choose something else, we're down, and our competitors aren't, then my overlords will start asking a lot of questions.
Happy to hear anyone's suggestions about where else to go or what else to do with regard to protecting against large-scale volumetric DDoS attacks. Pretty much every CDN provider nowadays has stacked up enough capacity to tank these kinds of attacks; good luck trying to combat them yourself these days.
Not saying not to do this to get through, but just as an observation, it’s also the sort of thing that can make these issues a nightmare to remediate, since the outage can actually draw more traffic just as things are warming up, from customers desperate to get through.
I'm already logged in on the cloudflare dashboard and trying to disable the CF proxy, but getting "404 | Either this page does not exist, or you do not have permission to access it" when trying to access the DNS configuration page.
Maybe that's precisely what Cloudflare did and now their status page is down because it's receiving an unusual amount of traffic that the VPS can't handle.
Could always just use a status page that updates itself. For my side project Total Real Returns [1], if you scroll down and look at the page footer, I have a live status/uptime widget [2] (just an <img> tag, no JS) which links to an externally-hosted status page [3]. Obviously not critical for a side project, but kind of neat, and was fun to build. :)
This is unrelated to the cloudflare incident but thanks a lot for making that page. I keep checking it from time to time and it's basically the main data source for my long term investing.
1- Has GCP also had any outages recently similar to AWS, Azure or CF? If a similar size (14 TB?) DDoS were to hit GCP, would it stand or would it fail?
2- If this DDoS was targeting Fly.io, would it stand? :)
Seems like Workers are less affected, and maybe Betterstack has decided to bypass Cloudflare "stuff" for the status pages (maybe to cut down costs). My site is still up, though some GitHub runners did show it failing at certain points.
When it's back up, do yourself a favour and rent a $5/mo VPS in another country from a provider like OVH or Hetzner and stick your status page on that.
"Yes but what if they go down" - it doesn't matter; having it hosted by someone who can be down for the same reason as your main product/service is a recipe for disaster.
Definitely. Tangentially, I encountered 504 Gateway Timeout errors on cloudflarestatus.com about an hour ago. The error page also disclosed the fact that it's powered by CloudFront (Amazon's CDN).
I don't get why you need such a service for a status page with 99.whatever% uptime. I mean, your status page only has to be up if everything else is down, so maybe 1% uptime is fine.
There's something maliciously satisfying about seeing your own self-hosted stuff working while things behind Cloudflare or AWS are broken. Sure, they have like four more nines than me, but right now I'm sitting pretty.
My (s)crappy personal site was up during the AWS outage, the Azure outage, and now the Cloudflare outage. And I've only had it for 2 months! Maybe I can add a tracker somewhere, might be fun.
How do you deal with DNS? I'm hosting something on a Raspberry Pi at home, and I had recently moved the DNS to Cloudflare. It's quite funny seeing my small personal website being down, although quite satisfying seeing both the browser and host with a green tick while Cloudflare is down.
DNS is actually one of the easiest services to self-host, and it's fairly tolerant of downtime due to caching. If you want redundancy/geographical distribution, Hurricane Electric has a free secondary/slave DNS service [0] where they'll automatically mirror your primary/master DNS server.
I don't have experience with a dynDNS setup like you describe, hosting from (probably) home. But my domains are on a VPS (and a few other places here and there) and DNS is done via my domain reseller's DNS settings pages.
Never had an issue hosting my stuff, but as said - I don't yet have experience hosting something from home with a more dynamic DNS setup.
This is a real problem for some "old-school enterprise" companies that use Oracle, SAP, etc. along with the new AWS/CF based services. They are all waiting around for new apps to come back up while their Oracle suite/SAP are still functioning. There is a lesson here for some of these new companies selling to old-school companies.
I was just able to save a proxied site. Then the dashboard went down again. I didn't even know it was still on. It's really not doing anything for performance because the traffic is quite low.
Is it me or has there been a very noticeable uptick in large scale infra-level outages lately? AWS, Cloudflare, etc have all been way under whatever SLA they publish.
That doesn't seem to be a coincidence, as the recent outages making headlines (including this one, according to early reports) have been associated with huge traffic spikes. It seems DDoS attacks are reaching a new level.
For me the only silver lining to all these cloud outages is that now we know their published SLA times mean absolutely nothing. The number of 9's used to at least give an indication of intent regarding reliability; now they are twisted to whatever metric the company wants to represent and don't actually represent guaranteed uptime anywhere.
Some of the other commenters here have posited a "vibe code theory". As the amount of vibe code in production increases, so does the number of bugs and, therefore, the number of outages.
None of the recent major outages were traced down to "vibe coding" or anything of the sort. They appear to be the kind of misconfigurations and networking fuckups that have existed since the Internet became more complex than 3 routers.
> Some of the other commenters here have posited a "vibe code theory". As the amount of vibe code in production increases, so does the number of bugs and, therefore, the number of outages.
Likely this coupled with the mass brain damage caused by never-ending COVID re-infections.
Since vaccines don't prevent transmission, and each re-infection increases the chances of long COVID complications, the only real protection right now is wearing a proper respirator everywhere you go, and basically nobody is doing that anymore.
The theory I’ve heard is holiday deploy freezes coupled with Q4 goals creates pressure to get things in quickly and early. It’s all been in the last month or so which does line up.
This only amplifies the often-repeated propaganda about the "very powerful" enemies of democracy, who in fact are very fragile dictatorships. There's enough incompetence at tech companies to f up their own stuff.
Somewhere, at a floating desk behind a wall of lava lamps, in a nyancatified ghostty terminal with 32 different shader plugins installed:
You're absolutely right! I shouldn't have force pushed that change to master. Let me try and roll it back. *Confrobulating* Oh no! Cloudflare appears to be down and I cannot revert the change. Why don't you go make a cup of coffee until that comes back. This code is production ready, it's probably just a blip.
If it's any guidance, US cyber risk insurance (which covers, among other things, disruptions due to supplier outages) has continuously dropped in price since Q1 2023, by a handful of percent per year.
Even many non-tech people have begun to associate Internet-wide outages with "AWS must be down", so I imagine many of them searching "is aws down". For Downdetector, a hit is a down report, so it will report AWS impacts even when the culprit is Cloudflare, as in this case.
How did we get to a place where either Cloudflare or AWS having an outage means a large part of the web going down? This centralization is very worrying.
Oddly this centralization allows a complete deferral of blame without you even doing anything: if you’re down, that’s bad. But if you’re down, Spotify is down, social media is down… then “the internet is broken” and you don’t look so bad.
It also reduces your incentive to change, if “the internet is down” people will put down their device and do something else. Even if your web site is up they’ll assume it isn’t.
I’m not saying this is a good thing but I’m simply being realistic about why we ended up where we are.
As a user I do care, because I waste so much time on Cloudflare's "prove you are human" blocking-page (why do I have to prove it over and over again?), and frequently run on websites blocking me entirely based on some bad IP-blacklist used along with Cloudflare.
This is essentially the entire IT excuse for going to anything cloud. I see IT engineers all the time justifying that the downtime stops being their problem and they stop being to blame for it. There's zero personal responsibility in trying to preserve service, because it isn't "their problem" anymore. Anyone who thinks the cloud makes service more reliable is absolutely kidding themselves, because everyone who made the decision to go that way already knows it isn't true, it just won't be their problem to fix it.
If anyone in the industry actually cared about reliability and took personal stake in their system being up, everyone would be back on-prem.
> But if you’re down, Spotify is down, social media is down… then “the internet is broken” and you don’t look so bad.
In my direct experience, this isn't true if you're running something even vaguely mission-critical for your customers. Your customer's workers just know that they can't do their job for the day, and your customer's management just knows that the solution they shepherded through their organization is failing.
100% this. While in my professional capacity I'm all in for reliability and redundancy, as an individual I quite like these situations when it's obvious that I won't be getting any work done and it's out of my control, so I can go run some errands, or read a book, or just finish early.
Which "user" are you referring to? Cloudflare users or end product users?
End product users have no power, they can complain to support and maybe get a free month of service, but the 0.1% of customers that do that aren't going to turn the tide and have anything change.
Engineering teams using these services also get "covered" by them - they can finger point and say "everyone else was down too."
Eh? It's because they are offering a service too good to refuse.
The internet these days is fucking dangerous and murderous as hell. We need Cloudflare just to keep services up due to the deluge of AI data scrapers and other garbage.
More like "don't have choice". It's not like service provider gonna go to competition, because before you switch, it will be back.
Frankly it's a blessing, always being able to blame the cloud that management forced the company to migrate to in order to be "cheaper" (which half of the time turns out to be false anyway).
> It also reduces your incentive to change, if “the internet is down” people will put down their device and do something else. Even if your web site is up they’ll assume it isn’t.
I agree. When people talk about the enshittification of the internet, Cloudflare plays a significant role.
Many reasons, but DDoS protection has massive network effects. The more customers you have (and therefore the more bandwidth you provision), the easier it is to hold up against a DDoS, as a DDoS usually targets just one customer.
So there are massive economies of scale. A small CDN with (say) 10,000 customers and 10 Mbit/s per customer can handle a 100 Gbit/s DDoS (way too simplistic, but hopefully you get the idea) - way too small.
If you have the same traffic provisioned on average per customer and have 1 million customers, you can handle a DDoS 100x the size.
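(Writing out the same back-of-the-envelope as above, deliberately simplistic and using the commenter's illustrative numbers rather than real provisioning figures:)

```typescript
// The commenter's simplistic back-of-the-envelope: absorbable DDoS size scales
// with total provisioned customer bandwidth.
const perCustomerMbit = 10;

const smallCdn = 10_000 * perCustomerMbit;  // 100,000 Mbit/s ≈ 100 Gbit/s
const bigCdn = 1_000_000 * perCustomerMbit; // 10,000,000 Mbit/s ≈ 10 Tbit/s

console.log(`small CDN: ~${smallCdn / 1_000} Gbit/s of headroom`);
console.log(`big CDN:   ~${bigCdn / 1_000} Gbit/s of headroom (${bigCdn / smallCdn}x)`);
```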
Only way to compete with this is to massively overprovision bandwidth per customer (which is expensive, as those customers won't pay more just for you to have more redundancy because you are smaller).
In a way (like many things in infrastructure) CDNs are natural monopolies. The bigger you get -> the more bandwidth and PoP you can have -> more attractive to more customers (this repeats over and over).
It was probably very astute of Cloudflare to realise that offering such a generous free plan was a key step in this.
In a CDN, customers consume bandwidth; they do not contribute it. If Cloudflare adds 1 million free customers, they do not magically acquire 1 million extra pipes to the internet backbone. They acquire 1 million new liabilities that require more infrastructure investment.
All you are doing is echoing their pitch book. Of course they want to skim their share of the pie.
In my opinion, DDoS is possible only because there is no network protocol for a host to control traffic filtering on upstream providers (deny traffic from certain subnets or countries). In that case everybody would prefer to write their own systems rather than rely on a harmful monopoly.
Yeah, I went to HN after the third web page didn't work. I am not just worried about the single point of failure, I am much more worried about this centralization eventually shaping the future standards of the web and making it de facto impossible to self-host anything.
Well that and the fact that when 99% goes through a central party, then that central party will be very interesting for authoritarian governments to apply sweeping censorship rules to.
It is already nearly impossible/very expensive in my country to get a public IP address (even IPv6) which you could host on. The world is heavily moving towards being centrally dependent on these big cloud providers.
It is not as bad as Cloudflare or AWS, because certificates will not expire the instant there is an outage, but consider that:
- It serves about 2/3 of all websites
- TLS is becoming more and more critical over time. If certificates fail, the web may as well be down
- Certificate lifetimes are becoming shorter and shorter: currently 90 days, but Let's Encrypt is now considering 6 days, and a 47-day maximum is being planned industry-wide
- An outage is one thing, but should a compromise happen, that would be even more catastrophic
Let's Encrypt is a good guy now, but remember that Google used to be a good guy in the 2000s too!
(Disclaimer: I am tech lead of Let's Encrypt software engineering)
I'm also concerned about LE being a single point of failure for the internet! I really wish there were other free and open CAs out there. Our goal is to encrypt the web, not to perpetuate ourselves.
That said, I'm not sure the line of reasoning here really holds up? There's a big difference between this three-hour outage and the multi-day outage that would be necessary to prevent certificate renewal, even with 6-day certs. And there's an even bigger difference between this sort of network disruption and the kind of compromise that would be necessary to take LE out permanently.
So while yes, I share your fear about the internet-wide impact of total Let's Encrypt collapse, I don't think that these situations are particularly analogous.
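(A rough way to see the difference: assuming certificates are renewed when roughly one third of their lifetime remains — a common ACME client default, and an assumption here rather than something stated above — the length of CA outage you can absorb shrinks with the lifetime, but even 6-day certs leave a couple of days of slack:)

```typescript
// How long a CA outage can last before certs start expiring, assuming renewal
// is attempted when one third of the lifetime remains (assumption, see above).
function outageBudgetDays(lifetimeDays: number): number {
  const renewAtDaysLeft = lifetimeDays / 3;
  // Worst case: the outage starts right when the first renewal attempt is due.
  return renewAtDaysLeft;
}

for (const lifetime of [90, 47, 6]) {
  console.log(
    `${lifetime}-day certs: ~${outageBudgetDays(lifetime).toFixed(1)} days of CA outage before expiry`
  );
}
```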
Agree, I've thought about this one too. The history of SSL/TLS certs is pretty hacky anyway, in my opinion. The main problem they solve really should have been solved at the network layer, with ubiquitous IPsec and key distribution via DNS; most users just blindly trust whatever root CAs ship with their browser or OS, and the ecosystem has been full of implementation and operational issues.
Let’s Encrypt is great at making the existing system less painful, and there are a few alternatives like ZeroSSL, but all of this automation is basically a pile of workarounds on top of a fundamentally inappropriate design.
Mostly since the AWS craze started a decade ago, developers have moved away from dedicated servers (which are actually cheaper, go figure), which is causing all this mess.
It's genuinely insane that many companies design a great number of fallbacks at the software level, but almost no thought goes into the hardware/infrastructure level. Common sense dictates that you should never host everything on a single provider.
I tried as hard as I could to stay self hosted (and my backend is, still), but getting constant DDoS attacks and not having the time to deal with fighting them 2-3x a month was what ultimately forced me to Cloudflare. It's still worse than before even with their layers of protection, and now I get to watch my site be down a while, with no ability to switch DNS to point back to my own proxy layer, since CF is down :/
With the state of constant attack from AI scrapers and DDoS bots, you pretty much need a CDN from someone now if you have a serious business service. The poor guys with single on-prem boxes serving static HTML can /maybe/ weather some of this storm alone, but not everything.
I self hosted on one of the company’s servers back in the late 90s. Hard drive crashes (and a hack once, through an Apache bug) had our services (http, pop, smtp, nfs, smb, etc ) down for at least 2-3 days (full reinstall, reconfiguration, etc).
Then, with regular VPSs I also had systems down for 1-2 days. Just last week the company that hosts NextCloud for us was down the whole weekend (from Friday evening) and we couldn’t get their attention until Monday.
So far these huge outages that last 2-5 hours are still lower impact for me, and require me to take less action.
I like the idea of having my own rack in a data center somewhere (or sharing the rack, whatever) but even a tiny cost is still more than free. And even then, that data center will also have outages, with none of the benefits of a Cloudflare Pages, GitHub Pages, etc.
> developers have gone away from Dedicated servers (which are actually cheaper, go figure)
It depends on how you calculate your cost. If you only include the physical infrastructure, having a dedicated server is cheaper. But with a dedicated server you lose a lot of flexibility. Need more resources? Just scale up your EC2 instance; with a dedicated server there is a lot more work involved.
Do you want a 'production-ready' database? With AWS you can just click a few buttons and have an RDS instance ready to use. To roll out your own PG installation you need someone with a lot of knowledge (how to configure replication? backups? updates? ...).
So if you include salaries in the calculation, the result changes a lot. And even if you already have some experts on your payroll, by putting them to work deploying a PG instance you won't be able to use them to build other things that may generate more value for your business than the premium you pay to AWS.
Cloud hosters are that hardware fallback. They started by offering better redundancy and scaling than your homemade breadbox. But it seems they lost something along the way, and now we have this.
Maintenance cost is the main issue for on-prem infra. Nowadays add things like DDoS protection and/or scraping protection, which can require a dedicated team, or require your company to rely on some library or open source project that is not guaranteed to be maintained forever (unless you give them support, which I believe in)... Yeah, I can understand why companies shift off of on-prem nowadays.
... dedis are cheaper if you are right-sized. If you are wrong-sized they just plain crash, and you may or may not be able to afford the upgrade.
I was at SoftLayer before I was at AWS, and what catalyzed the move was the time I needed to add another hard drive to a system and somehow they screwed it up. I couldn't put in a trouble ticket to get it fixed because my database record in their trouble ticket system was corrupted. The next day I moved my stuff to AWS, and the day after that they had a top sales guy talk to me to try to get me to stay, but it was too late.
You jest, but this actually does exist. Multiple CDNs sell multi-CDN load balancing (divide traffic between 2+ CDNs per variously-complicated specifications, with failover) as a value add feature, and IIRC there is at least one company for which this is the marquee feature. It's also relatively doable in-house as these things go.
This might sound crazy as a software engineer, but I actually like the occasional "snow day" where everything goes down. It's healthy for us to all disconnect from the internet for a bit. The centralization unintentionally helps facilitate that. At least, that's my glass half full perspective.
I can understand that sentiment. Just don't lose sight of the impact it can have on every day people. My wife and I own a small theatre and we sell tickets through Eventbrite. It's not my full time job but it is hers. Eventbrite sent out an email this morning letting us know that they are impacted by the outage. Our event page appears to be working but I do wonder if it's impacting ticket sales for this weekend's shows.
So while us in tech might like a "snow day", there are millions of small businesses and people trying to go about their day to day lives who get cut off because of someone else's fuck-ups when this happens.
> This might sound crazy as a software engineer, but I actually like the occasional "snow day" where everything goes down
As a software engineer, I get it. As a CTO, I spent this morning triaging with my devops AI (actual Indian) to find some workaround (we found one) while our CEO was doing damage control with customers (non-technical field) who were angry that we were down and they were losing business by the minute.
sometimes I miss not having a direct stake in the success of the business.
I'm guessing you're employed and your salary is guaranteed regardless. Would you have the same outlook if you were the self-employed founder of an online business and every minute of outage was costing you money?
Except, y'know, where people's lives and livelihoods depend on access to information or being able to do things at an exact time. AWS and Cloudflare are disqualifying themselves from hospitals and military and whatnot.
How did we get to a place where Cloudflare being down means we see an outage page, but on that page it tells us explicitly that the host we're trying to connect to is up, and it's just a Cloudflare problem.
If it can tell us that the host is up, surely it can just bypass itself to route traffic.
People use CloudFlare because it's a "free" way for most sites to not get exploited (WAF) or DDoSed (CDN/proxy) regularly. A DDoS can cost quite a bit more than a day of downtime, even just a thundering herd of legitimate users can explode an egress bill.
It sucks there's not more competition in this space but CloudFlare isn't widely used for no reason.
AWS also solves real problems people have. Maintaining infrastructure is expensive as is hardware service and maintenance. Redundancy is even harder and more expensive. You can run a fairly inexpensive and performant system on AWS for years for the cost of a single co-located server.
It's not only centralization in the sense that your website will be down if they are down; it is also a centralized MITM proxy. If you transfer sensitive data like chats over Cloudflare-"protected" endpoints, you also allow CF to transparently read and analyze it in plain text. It must be very easy for state agencies to spy on the internet nowadays; they would just ask CF to redirect traffic to them.
Because it's better to have a really convenient and cheap service that works 99% of the time than a resilient one that is more expensive or more cumbersome to use.
It's like github vs whatever else you can do with git that is truly decentralized. The centralization has such massive benefits that I'm very happy to pay the price of "when it's down I can't work".
Most developers don't care to know how the underlying infrastructure works (or why) and so they take whatever the public consensus is re: infra as a statement of fact (for the better part of the last 15 years or so that was "just use the cloud"). A shocking amount of technical decisions are socially, not technically enforced.
This topic is raised every time there is an outage with Cloudflare, and the truth of the matter is, they offer an incredible service and there is no big enough competition to deal with it. By definition their services are so good BECAUSE their adoption rate is so high.
It's very frustrating of course, and it's the nature of the beast.
Compliance. If you wanna sell your SAAS to big corpo, their compliance teams will feel you know what you're doing if they read AWS or Cloudflare on your architecture, even if you do not quite know what you're doing.
Because DDoS is a fact of life (and even if you aren't targeted by DDoS, the bot traffic probing you to see if you can be made part of the botnet is enough to take down a cheap $5 VPS). So we have to ask - why? Personally, I don't accept the hand-wavy explanation that botnets are "just a bunch of hacked IoT devices". No, your smart lightbulb isn't taking down Reddit. I slightly believe the secondary explanation that it's a bunch of hacked home routers. We know that home routers are full of things like suspicious oopsie definitely-not-government backdoors.
IMO, centralization is inevitable because the fundamental forces drive things in that direction. Clouds are useful for a variety of reasons (technical, time to market, economic), so developers want to use them. But clouds are expensive to build and operate, so there are only a few organizations with the budget and competency to do it well. So, as the market matures you end up with 3 to 5 major cloud operators per region, with another handful of smaller specialists. And that’s just the way it works. Fighting against that is to completely swim upstream with every market force in opposition.
There is this tendency to phrase questions (or statements) as "when did 'we'...".
These decisions are made individually, not centrally. There is no process in place (and most likely there never will be) that can control and dictate what people do if they decide one way of doing things is the best way to do it. Even assuming they understand everything or know of the pitfalls.
Even if you can control individually what you do for the site you operate (or are involved in) you won't have any control on parts of your site (or business) that you rely on where others use AWS or Cloudflare.
I would be less worried if Cloudflare and AWS weren't involved in many more things than simply running DNS.
AWS - someone touches DynamoDB and it kills the DNS.
Cloudflare - someone touches functionality completely unrelated to DNS hosting and proxying and, naturally, it kills the DNS.
There is this critical infrastructure that just becomes one small part of a wider product offering, worked on by many hands, and this critical infrastructure gets taken down by what is essentially a side-effect.
It's a strong argument to move to providers that just do one thing and do it well.
Re: Cloudflare it is because developers actively pushed "just use Cloudflare" again and again and again.
It has been dead to me since the SSL cache vulnerability thing and the arrogance with which senior people expected others to solve their problems.
But consider how many people still do stupid things like use the default CDN offered by some third party library, or use google fonts directly; people are lazy and don't care.
We take the idea of the internet always being on for granted. Most people don’t understand the stack and assume that when sites go down it’s isolated, and although I agree with you, it’s just as much complacency and lack of oversight and enforcement delays in bureaucracy as it is centralization. But I guess that’s kind of the umbrella to those things… lol
Well the centralisation without rapid recovery and practices that provide substantial resiliency… that would be worrying.
But I dare say the folks at these organisations take these matters incredibly seriously and the centralisation problem is largely one of risk efficiency.
I think there is no excuse, however, to not have multi region on state, and pilot light architectures just in case.
A lot (and I mean a lot) of people in IT like centralization specifically because it’s hard to blame people for doing something that everyone else is doing.
This was always the case. There was always a "us-east" in some capacity, under Equinix, etc. Except it used to be the only "zone," which is why the internet is still so brittle despite having multiple zones. People need to build out support for different zones. Old habits die hard, I guess.
> How did we get to a place where either Cloudflare or AWS having an outage means a large part of the web going down?
As always, in the name of "security". When are we going to learn that anything done, either by the government or by a corporation, in the name of security is always bad for the average person?
It's weird to think about, so bear with me. I don't mean this sardonically or misanthropically. But it's "just the internet." It's just the internet. It doesn't REALLY matter from a large enough macro view. It's JUST the internet.
It's because single points of traffic concentration are the most surveillable architecture, so FVEY et al economically reward with one hand those companies who would build the architecture they want to surveil with the other hand.
Currently at the public library and I can't use the customer inventory terminals to search for books. They're just a web browser interface to the public facing website, and it's hosted behind CF. Bananas.
Don't forget the CrowdStrike outage: one company had a bug that brought down almost everything. Who would have thought there are so many single points of failure across the entire Internet.
For most services it's safer to host from behind Cloudflare, and Cloudflare is considered more highly available than a single IaaS or PaaS, at least in my headcanon.
The same reason we have centralization across the economy. Economies of scale are how you make a big business successful, and once you are on top it's hard to dislodge you.
Agreed. More worrying is that the standard practice of separating domain and nameserver administration appears to have been lost to one-stop-shop marketing.
And all of these outages happening not long after most of them dismissed a large amount of experienced staff while moving jobs offshore to save in labor costs.
Short-term economic forces, probably. Centralization is often cheaper in the near term. The cost of designing in single-point failure modes gets paid later.
I think some of the issues in the last outage actually affected multiple regions. IIRC internally some critical infrastructure for AWS depends on us-east-1 or at least it failed in a way that didn't allow failover.
5 mins. of thought to figure out why these services exist?
Dialogue about mitigations/solutions? Alternative services? High availability strategies?
Nah! It's free to complain.
Me personally, I'd say those companies do a phenomenal job by being a de facto backbone of the modern web. Also Cloudflare, in particular, gives me a lot of things for free.
It's not really. People are just very bad at putting the things around them into perspective.
Your power is provided by a power utility company. They usually serve an entire state, if not more than one (there are smaller ones too). That's "centralization" in that it's one company, and if they "go down", so do a lot of businesses. But actually it's not "centralized", in that 1) there are actually many different companies across the country/world, and 2) each company "decentralizes" most of its infrastructure to prevent massive outages.
And yes, power utilities have outages. But usually they are limited in scope and short-lived. They're so limited that most people don't notice when they happen, unless it's a giant weather system. Then if it's a (rare) large enough impact, people will say "we need to reform the power grid!". But later when they've calmed down, they realize that would be difficult to do without making things worse, and this event isn't common.
Large internet service providers like AWS, Cloudflare, etc, are basically internet utilities. Yes they are large, like power utilities. Yes they have outages, like power utilities. But the fact that a lot of the country uses them, isn't any worse than a lot of the country using a particular power company. And unlike the power companies, we're not really that dependent on internet service providers. You can't really change your power company; you can change an internet service provider.
Power didn't used to be as reliable as it is. Everything we have is incredibly new and modern. And as time has passed, we have learned how to deal with failures. Safety and reliability has increased throughout critical industries as we have learned to adapt to failures. But that doesn't mean there won't be failures, or that we can avoid them all.
We also have the freedom to architect our technology to work around outages. All the outages you have heard about recently could be worked around, if the people who built on them had tried:
- CDN goes down? Most people don't absolutely need a CDN. Point your DNS at your origins until the CDN comes back. (And obviously, your DNS provider shouldn't be the same as your CDN...)
- The control plane goes down on dynamic cloud APIs? Enable a "limp mode" that persists existing infrastructure to serve your core needs. You should be able to service most (if not all) of your business needs without constantly calling a control plane.
- An AZ or region goes down? Use your disaster recovery plan: deploy infrastructure-as-code into another region or AZ. Destroy it when the az/region comes back.
...and all of that just to avoid a few hours of downtime per year? It's likely cheaper to just take the downtime. But that doesn't stop people from piling on when things go wrong, questioning whether the existence of a utility is a good idea.
I have Cloudflare running in production and it is affecting us right now. But at least I know what is going on and how I can mitigate (e.g. disable Cloudflare as a proxy if it keeps affecting our services at skeeled).
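(For anyone in the same spot, a sketch of what "disable Cloudflare as a proxy" can look like via the API rather than the dashboard — this assumes the standard v4 DNS-records endpoint and a token with DNS edit permission; the zone and record IDs are placeholders, and of course the API itself may be unreachable during a Cloudflare outage, as others in this thread found. It also only helps if your origin has its own valid certificate and you're OK exposing its IP, per the caveats at the top of the thread.)

```typescript
// Sketch: flip a Cloudflare DNS record from proxied (orange cloud) to DNS-only
// (grey cloud) via the v4 API, so traffic goes straight to the origin.
// ZONE_ID, RECORD_ID and the token are placeholders for your own values.
const ZONE_ID = "your-zone-id";
const RECORD_ID = "your-dns-record-id";
const TOKEN = process.env.CF_API_TOKEN;

async function disableProxy(): Promise<void> {
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/dns_records/${RECORD_ID}`,
    {
      method: "PATCH",
      headers: {
        Authorization: `Bearer ${TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ proxied: false }),
    }
  );
  if (!res.ok) {
    throw new Error(`Cloudflare API returned ${res.status}`);
  }
}

disableProxy().catch(console.error);
```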
Interestingly, also noticing that websites that use Cloudflare Challenge (aka "I'm not a Robot") are also throwing exceptions with a message as "Please unblock challenges.cloudflare.com to proceed" - even though it's just responding with an HTTP/500.
The state of error handling in general is woeful, they do anything to avoid admitting they're at fault so the negative screenshots don't end up on social media.
Blame the user or just leave them at an infinite spinning circle of death.
I check the network tab and find the backend is actually returning a reasonable error but the frontend just hides it.
Most recent one was a form saying my email was already in use, when the actual backend error returned was that the password was too long.
I think the site (front-end) thinks you have blocked the domain through DNS or an extension; and thus suggests you unblock it. It is unthinkable that Cloudflare captchas could go down /s.
I’d rather mitigate a DDoS attack on my own servers than deal with Cloudflare. Having to prove you’re human is the second-worst thing on my list, right after accepting cookies. Those two things alone have made browsing the web a worse experience than it was in the late 90s or early 2000s.
There's something worse than having to prove (over and over and over again) that you are human: having your IP just completely blocked by Cloudflare's zealous bot-filtering (and I use a plain mass-market ISP in a developed country, not some shady network).
Alright kids, breathe...a DDoS attack isn't the end of the world, it's just the internet throwing a tantrum. If you really don't want to use a fancy protection provider, you can still act like a grown-up: get your datacenter to filter trash at the edge, announce a more specific prefix with BGP so you can shift traffic, drop junk with strict ACLs, and turn on basic rate limiting so bots get bored. You can also tune your kernel so it doesn't faint at SYN storms, and if the firehose gets too big, pop out a more specific BGP prefix from a backup path or secondary router so you can pull production away from the burning IP.
Worrying about a DDoS on your tiny setup is like a brand-new dev stressing over how they'll handle a billion requests per second...cute, but not exactly a real-world problem for 99.99% of you. It's one of those internet boogeyman myths people love to panic about.
As much as this situation sucks, how do you plan to "mitigate a DDoS attack on my own servers"? The reason I use Cloudflare is to use it as a proxy, especially for DDoS attacks if they do occur. Right now, our services are down and we are getting tons of customer support tickets (like everyone else), but it is a lot easier to explain that the whole world is down vs. it's just us.
> During our attempts to remediate, we have disabled WARP [their VPN service] access in London. Users in London trying to access the Internet via WARP will see a failure to connect.
Posted 4 minutes ago. Nov 18, 2025 - 13:04 UTC
> We have made changes that have allowed Cloudflare Access [their 'zero-trust network access solution'] and WARP to recover. Error levels for Access and WARP users have returned to pre-incident rates.
> We have re-enabled WARP access in London.
> We are continuing to work towards restoring other services.
> Posted 12 minutes ago. Nov 18, 2025 - 13:13 UTC
Now I'm really suspicious that they were attacked...
Someone running cloudflared accidentally advertising a critical route into their Warp namespace and somehow disrupting routes for internal Cloudflare services doesn't seem too far fetched.
I’ve written before on HN about when my employer hired several ex-FAANG people to manage all things cloud in our company.
Whenever there was an outage they would put up a fight against anyone wanting to update the status page to show the outage. They had so many excuses and reasons not to.
Eventually we figured out that they were planning to use the uptime figures for requesting raises and promos as they did at their FAANG employer, so anything that reduced that uptime number was to be avoided at all costs.
It's because if you automate it, something could/would happen to the little script that defines "uptime," and if that goes down, suddenly you're in violation of your SLA and all of your customers start demanding refunds/credits/etc. when everything is running fine.
Or let's say your load balancer croaks, triggering a "down" status, but it's 3am, so a single server is handling traffic just fine? In short, defining "down" in an automated way is just exposing internal tooling unnecessarily and generates more false positives than negatives.
Lastly, if you are allowed 45 minutes of downtime per year and it takes you an hour to manually update the status page, you just bought yourself an extra hour to figure out how to fix the problem before you have to start issuing refunds/credits.
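(A sketch of the trade-off being described: an automated probe that only declares "down" after several consecutive failures, which trims the false positives at the cost of slower detection. The URL, interval and thresholds are placeholders, not anyone's real monitoring config.)

```typescript
// Automated "is it down?" probe that only flags an outage after several
// consecutive failed checks, to limit the false positives described above.
// The URL, interval and thresholds are placeholders.
const TARGET = "https://example.com/healthz";
const FAILURES_BEFORE_DOWN = 3;
const INTERVAL_MS = 60_000;

let consecutiveFailures = 0;

async function probe(): Promise<void> {
  try {
    const res = await fetch(TARGET, { signal: AbortSignal.timeout(5_000) });
    consecutiveFailures = res.ok ? 0 : consecutiveFailures + 1;
  } catch {
    consecutiveFailures += 1; // network error, timeout, DNS failure, ...
  }

  if (consecutiveFailures >= FAILURES_BEFORE_DOWN) {
    console.log("status: DOWN (update the status page / page someone here)");
  } else {
    console.log("status: UP");
  }
}

setInterval(probe, INTERVAL_MS);
```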
>A spokesperson for Cloudflare said: “We saw a spike in unusual traffic to one of Cloudflare’s services beginning at 11.20am. That caused some traffic passing through Cloudflare’s network to experience errors. While most traffic for most services continued to flow as normal, there were elevated errors across multiple Cloudflare services.
>“We do not yet know the cause of the spike in unusual traffic. We are all hands on deck to make sure all traffic is served without errors. After that, we will turn our attention to investigating the cause of the unusual spike in traffic.”
"Unusual spike of traffic" can just be errant misconfiguration that causes traffic spikes just from TCP retries or the like. Jumping to "cyber attack" is eating up Hollywood drama.
In most cases, it's just cloud services eating shit from a bug.
I know this is bad, and some people's livelihood and lives rely on critical infrastructure, but when these things happen, I sometimes think GOOD!, let's all just take a breather for a minute yeh? Go outside.
One of the things that I didn't like about Cloudflare's MITM-as-a-service is their requirement that, if you want SSL/CDN, you must use their DNS.
Overconcentration of infra within one single point of disruption, with no easy outs when the stack tips over.
Sadly I don't see any changes or rethink towards being more decentralised, even after this outage.
Yeah, they keep reinforcing bad vendor lock-in practices. I'd guess the number of free users surpasses the paying ones, and situations like these leave them all unable to recover.
Interesting (unnerving?) to see a number of domain registrars that offer their own DNS services utilize at least some kind of Cloudflare service for their own web fronts. I did a check on 6 registrar sites I currently interact with: half were down (Namecheap/Spaceship, Name, Dynadot) and half were up (Porkbun, Gandi, GoDaddy).
I just considered moving from Namecheap to Porkbun as Namecheap is down, but Porkbun use Cloudflare for their CAPTCHA meaning I'm unable to signup and I assume log in as well, so also no good.
Only true if your audience doesn't require Edge distribution, also if your Origin can handle the increased load and security issues, also if you don't use any advanced features (routing, edge compute...).
If your site is only hosted on one server and it catches fire, you can swiftly reinstall on a new server and change the IP your domain is pointing to, too... Still a single point of failure.
Just checked INWX from here in Germany. I was able to log in and get to my DNS records. Just in case you're looking for an alternative after all this.
Even if he blocked it by accident, that is not a reason to shout.
Shouting will not prevent errors; you are only creating a hostile work environment where not acting is better than risking a mistake and triggering an aggressive response on your part.
That's why I run my server on 7100 chips made for me by Sam Zeloof in his garage on a software stack hand coded by me, on copper I ran personally to everyone's house.
You are joking but working on making decentralization more viable would indeed be more healthy than throwing hands up and accepting Cloudflare as the only option.
There was an article on HN a few days back about how companies like this are influencing the overall freedom of the web (I've lost the source) and pushing their own way of doing things. I see similar examples of influence from Vercel, e.g. with enterprise. Even a few days back, we saw AWS.
> Investigating - Cloudflare is aware of, and investigating an issue which potentially impacts multiple customers. Further detail will be provided as more information becomes available.
Things are back up (a second time) for me.
Cloudflare have now updated their status page to reflect the problems. It doesn't sound like they are confident the problem is fully fixed yet.
What would the Internet's architecture have to look like for DDoSing to be a thing of the past, and therefore for Cloudflare to not be needed?
I know there are solutions like IPFS out there for doing distributed/decentralised static content distribution, but that seems like only part of the problem. There are obviously more types of operation that occur via the network -- e.g. transactions with single remote pieces of equipment etc, which by their nature cannot be decentralised.
Anyone know of research out there into changing the way that packet routing/switching works so that 'DDoS' just isn't a thing? Of course I appreciate there are a lot of things to get right in that!
It's impossible to stop DDoS attacks because of the first "D".
If a botnet gets access through 500k IP addresses belonging to home users around the world, there's no way you could have prepared yourself ahead of time.
The only real solution is to drastically increase regulation around security updates for consumer hardware.
Maybe that's the case, but it seems like this conclusion is based on the current architecture of the internet. Maybe there are ways of changing it that mean these issues are not a thing!
What would that look like? A network with built-in rate & connection limiting?
The closest thing I can think of is the Gemini protocol. It uses TOFU for authentication, which requires a human to initially validate each server they interact with.
Works for static content and databases, but I don't think it works for applications where there is by necessity only one destination that can't be replicated (e.g. a door lock).
I got several emails from some uptime monitors I set up, due to failing checks on my website, and funnily enough I cannot log into any of them.
BetterStack, InStatus and HetrixTools seemingly all use Cloudflare on their dashboards, which means I can't login but I keep getting "your website/API is down" emails.
Update: I also can't login to UptimeRobot and Pulsetic. Now, I am getting seriously concerned about the sheer degree of centralization we have for CDNs/login turnstiles on Cloudflare.
In the beginning I thought my IP had fallen on the wrong side of Cloudflare and that I was being blocked from ~80% of the internet. I was starting to panic.
Sadly, I can report that this has brought down 2 of the major Mastodon nodes in the United Kingdom.
Happily, the small ones that I also use are still going without anyone apparently even noticing. At least, the subject has yet to reach their local timelines at the time that I write this.
2 of the other major U.K. nodes are still up, too.
Trying to figure out if this observation was intended to frame it so that it's less|same|more scary. The effect is more, but it sounds like the intention was less.
Update - The team is continuing to focus on restoring service post-fix. We are mitigating several issues that remain post-deployment.
Nov 18, 2025 - 15:40 UTC
It's hard not to use Cloudflare at least for me: good products, "free" for small projects, and if Cloudflare is down no one will blame you since the internet is down.
> if Cloudflare is down no one will blame you since the internet is down.
But this is not really the case. When Azure/AWS were down, same as now with Cloudflare: a significant amount of the web was down, but most of it was not. It just makes it more obvious which provider you use.
Think about this rationally. If Cloudflare doesn't fix it within reasonable time, you can just point to different name servers and have your problem fixed in minutes.
So why be on Cloudflare to start with? Well, if you have a more reliable way then there's no reason. If you have a less reliable way, then you're on average better off with Cloudflare.
Well, I can't change my NS since it's on Cloudflare too, but beyond that my point was not about this outage in particular, more about the default approach of some websites that don't need all this tech (yes, I really was out of groceries).
There’s certainly a business case for “which nines” after the talk of n nines. You ideally want to be available when your competitor, for instance, is not.
It's the web-scrapers. I run a tiny little mom-and-pop website, and the bots were consistently using up all of my servers' resources. Cloudflare more or less instantly resolved it.
I’ve been DDoS’d countless times running a small scale, uncontroversial SaaS. Without them I would’ve had countless downtime periods with really no other way to mitigate.
There's plenty of DDoS if you're dealing with people petty enough.
The VPS I use will nuke your instance if you run a game server. Not due to resource usage, but because it attracts DDoS like nothing else. Ban a teen for being an asshole and expect your service to be down for a week. And there isn't really Cloudflare for independent game servers. There's Steam Networking but it requires the developer to support it and of course Steam.
I was arrested by Interpol in 2018 because of warrants issued by the NCA, DOJ, FBI, J-CAT, and several other agencies, all due to my involvement in running a DDoS-for-hire website. Honestly, anyone can bypass Cloudflare, and anyone who wants to take your website down will take it down. It's just that, luckily for all of us, most of the DDoS-for-hire websites are down nowadays, but there are still many botnets out there that will get past basically any protection, and you can get access to them for as little as $5.
There are plenty of alternatives to protect against DDoSing, people like convenience though. “Nobody gets fired for choosing Microsoft/Cloudflare”. We have a culture problem
It's not super common, but common enough that I don't want to deal with it.
The other part is just how convenient it is with CF. Easy to configure, plenty of power and cheap compared to the other big ones. If they made their dashboard and permission-system better (no easy way to tell what a token can do last I checked), I'd be even more of a fan.
If Germany's Telekom was forced to peer on DE-CIX, I'd always use CF. Since they aren't and CF doesn't pay for peering, it's a hard choice for Germany but an easy one everywhere else.
Honestly, it kinda is. AI bots scrape everything now, social media means you can go viral suddenly, or you make a post that angers someone and they launch an attack just because. I default to Cloudflare because, like an umbrella, I might just be carrying it around most of the time, but in the case of a sudden downpour it's better than getting wet.
Setting up a replica and then pointing your API requests at it when the Cloudflare request fails is trivial. This way, if you have a SPA, then as long as your site/app is open the users won't notice.
The issue is DNS since DNS propagation takes time. Does anyone have any ideas here?
> Setting up a replica and then pointing your API requests at it when the Cloudflare request fails is trivial.
Only if you're doing very basic proxy stuff. If you stack multiple features and maybe even start using workers, there may be no 1:1 alternatives to switch to. And definitely not trivially.
The HN crowd in particular absolutely has a say in this, given the amount of engineering leads, managers, and even just regular programmers/admins/etc that frequent here - all of whom contribute to making these decisions.
You have the power to not host your own infrastructure on aws and behind cloudflare, or in the case of an employer you have the power to fight against the voices arguing for the unsustainable status quo.
We? I am not using it. I never used it and I will not use it. People should learn how to work with a firewall, set up a simple ModSecurity WAF, and stop using this bullshit. Almost everything goes through Cloudflare, and Cloudflare also does TLS fronting for websites, so basically Cloudflare is a MITM spying proxy, but no one seems to care. :/
Cloudflare seems to have degraded performance. Half the requests for my site throw Cloudflare 5xx errors, the other half work fine.
However, https://www.cloudflarestatus.com/ does not really mention anything relevant. What's the point of having a status page if it lies?
Update: Ah, I just checked the status again and now I get a big red warning (though the problem had existed for about 15 minutes before 11:48 UTC):
> Investigating - Cloudflare is aware of, and investigating an issue which potentially impacts multiple customers. Further detail will be provided as more information becomes available. Nov 18, 2025 - 11:48 UTC
> What's the point of having a status page if it lies?
Status pages are basically marketing crap right now. The same thing happened with Azure where it took at least 45 minutes to show any change. They can't be trusted.
Please read my comment again, including the update:
For 15 minutes Cloudflare wasn't working and the status page didn't mention anything. Yes, right now the status page mentions the serious network problem, but for some time our pages were not working and we didn't know what was happening.
So for ~15 minutes the status page lied. The whole point of a status page is to not lie, i.e. to be updated automatically when there are problems, and not by a person who needs to get clearance on what and how to write.
I didn't see anyone mention this directly, but having spent a good chunk of my career in 24/7 tech support, something these recent outages made me wonder is this: I can't even fathom the number of people who have been:
- restarting their routers and computers instead of taking their morning shower, getting their morning coffee, taking their medication on time because they’re freaking out, etc.
- calling ISPs in a furious mood not knowing it’s a service in the stack and not the provider’s fault (maybe)
- being late for work in general
- getting into arguments with friends and family and coworkers about politics and economics
- being interrupted making their jerk chicken
Cloud in general was a mistake. We took a system explicitly designed for decentralization and resilience and centralized it and created a few neat points of failure to take the whole damn thing down.
Cloudflare provides some nice services that have nothing to do with cloud or not. You can self-host private tunnels, application firewalls, traffic filtering, etc, or you can focus on building your application and managing your servers.
I am a self-hosting enthusiast. So I use Hetzner, Kamal and other tools for self-managing our servers, but we still have Cloudflare in front of them because we didn't want to handle the parts I mentioned (yet; we might sometime).
Calling it a mistake is a very narrow view. Just because it goes down every now and then, it isn't a mistake. Going for cloud or not has its trade-offs, and I agree that paying 200 dollars a month for a 1GB Heroku Redis instance is complete madness when you can get a 4GB VPS on Hetzner for 3.80 a month. Then again, some people are willing to make that trade-off for not having to manage the servers.
Cloud servers have taught me so much about working with servers because they are so easy and cheap to spin up, experiment with and then get rid of again. If I had had to buy racks and host them each time I wanted to try something, I would've never done it.
Sure, it's a great fair-weather technology, makes some things cheap and easy.
But in the face of adversity, it's a huge liability. Imagine Chinese Hackers taking down AWS, Cloudflare, Azure and GCP simultaneously in some future conflict. Imagine what that would do to the West.
I don't believe in Fukuyama's End of History. History is still happening, and the choices we make will determine how it plays out.
Thanks, I was too lazy to write this, and noticed this comment multiple times now. It's good to be sceptical at times, but in this case it simply misses the mark.
Threat actors (DDoS) and AI scraping already threw a wrench in decentralization. It's become quite difficult to host anything even marginally popular without robust infrastructure that can eat a lot of traffic
This is crazy. The internet has so much direct and transitive dependency on Cloudflare today. Pretty much the #1 dev slacking excuse today is no longer "my code is compiling" but "Cloudflare is down".
I can now imagine a scenario where everyone has become so dependent on an AI tool that it going down could turn into an unanticipated black-start event for the entire internet.
I sense a great disturbance in the force... As if millions of cringefluencers suddenly cried out in terror cause they had to come up with an original thought.
It's insane to me that big internet uptime monitoring tools like Pingdom and Downdetector both seem to rely on Cloudflare, as both of those are currently unavailable as well.
The main bike-rental service in Paris, Velib, has its app not working, but the bikes can still be taken with NFC. However, my station, which is always full at this time, is now empty, with only 2 bad bikes. It may be related. Yet push notifications are working.
I'm taking the metro now and wondering how long we have until the entire transit network goes down because of a similar incident.
Later today or tomorrow there's going to be a post on HN pointing to Cloudflare's RCA and multitudes here are going to praise CF for their transparency. Let's not forget that CF sucks and took half the internet down for four hours. Transparency or no, this should not be happening.
A lot of things shouldn't be happening. The fact is that no one forced half the internet to make CF their point of failure. The internet should ask itself whether that was the right call.
Speaking of 5 9s, how would you achieve 5 9s for a basic CRUD app that doesn't need to scale but still needs to be globally accessible? No auth, microservices, email or 3rd-party services. Just a classic backend connected to a db (any db tech, hosted wherever) that serves up some html.
You probably cannot achieve this with a single node, so you'll at least need to replicate it a few times to get past the normal 2-3 9s you get from a single node. But then you've got load balancers and DNS, which can also become single points of failure, as seen with Cloudflare.
Depending on the database type and choice, it varies. If you've got a single node of Postgres, you can likely never achieve more than 2-3 9s (AWS guarantees 3 9s for multi-AZ RDS). But if you run multi-master CockroachDB etc., you can maybe achieve 5 9s just on the database layer, or by using Spanner. Either way, you'll basically need 5 9s at every layer, which means quite a bit of redundancy in everything going to and from your app and data, the database and DNS being the most difficult.
Reliable DNS provider with 5 9s of uptime guarantees -> multi-master load balancer each with 3 9s, -> each load balancer serving 3 or more apps each with 3 9s of availability, going to a database(s) with 5 9s.
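Rough back-of-the-envelope math, assuming independent failures (which is generous):

    A_series   = A1 * A2 * ... * An      (every layer must be up)
    A_parallel = 1 - (1 - A)^n           (any one replica is enough)

    Two 99.9% nodes behind ideal failover:  1 - (0.001)^2 = 0.999999
    Chained behind a 99.99% LB and 99.999% DNS:
    0.999999 * 0.9999 * 0.99999 ≈ 0.99989  (back to roughly 3-4 nines end to end)

which is why the DNS and load-balancing layers end up being the hard part.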
This page from Google shows their uptime guarantees for Bigtable: 3 9s for a single region with one cluster, 4 9s for multi-cluster, and 5 9s for multi-region.
Part of the up-time solution is keeping as much of your app and infrastructure within your control, rather than being at the behest of mega-providers as we've witnessed in the past month: Cloudflare, and AWS.
Probably:
- a couple of tower servers, running Linux or FreeBSD, backed up by a UPS and an auto-run generator with 24 hours' worth of diesel (depending on where you are and the local area's propensity for natural disasters, maybe 72 hours),
- Caddy for a reverse proxy, Apache for the web server, PostgreSQL for the database;
- behind a router with sensible security settings, that also can load-balance between the two servers (for availability rather than scaling);
- on static WAN IPs,
- with dual redundant (different ISPs/network provider) WAN connections,
- a regular and strictly followed patch and hardware maintenance cycle,
- located in an area resistant to wildfire, civil unrest, and riverine or coastal flooding.
I'd say that'd get you close to five 9s (no more than ~5 minutes downtime per year), though I'd pretty much guarantee five 9s (maybe even six 9s - no more than 32 seconds downtime per year) if the two machines were physically separated from each other by a few hundred kilometres, each with their own supporting infrastructure above, sans the load balancing (see below), through two separate network routes.
Load balancing would become human-driven in this 'physically separate' example (cheaper, less complex): if your-site-1.com fails, simply re-point your browser to your-site-2.com which routes to the other redundant server on a different network.
The hard part now will be picking network providers that don't share the same pipes/cables or upstream dependencies, e.g. both relying on Cloudflare or AWS...
Keep the WAN IPs written down in case DNS fails.
PostgreSQL can do master-master replication, but it's a pain to set up I understand.
What if you could create a super virtual server of sorts? Imagine a new cloud provider like Vercel, but called something else. When you create a server on their service, they create three servers: one on AWS, one on GCP and one on Azure. Behind the scenes they are three separate servers, but to the end user they appear as a single server. The end user gets to control how many cloud providers are involved. When AWS goes down, no worries, it switches over to the GCP one.
I've been considering Cloudflare for caching, DDoS protection and WAF, but I don't like furthering the centralization of the Web. And my host (Vultr) has had fantastic uptime over the 10 years I've been on them.
How are others doing this? How is Hacker News hosted/protected?
I got an email saying that my OpenAI auto-renewal failed and my credits have run out. I go to OpenAI to reauthorize the card, and I can't log in because OpenAI uses Cloudflare for "verifying you are a human", which goes into an infinite loop. Great.
Phew, my latest 3h30 workshop about Obsidian was saved.
I recorded it this morning, not knowing about the Cloudflare issue (probably started while I was busy). I'm using Circle.so and they're down (my community site is now inaccessible). Luckily, they probably use AWS S3 or similar to host their files, so that part is still up and running.
Meanwhile all my sites are down. I'll just wait this one out, it's not the end of the world for me.
My GitHub Actions are also down for one of my projects because some third-party deps go through Cloudflare (Vulkan SDK). Just yesterday I was thinking to myself: "I don't like this dependency on that URL..." Now I like it even less.
> A fix has been implemented and we believe the incident is now resolved. We are continuing to monitor for errors to ensure all services are back to normal. Posted 3 minutes ago. Nov 18, 2025 - 14:42 UTC
Seems like they think they've fixed it fully this time!
Close! They just updated their status and it's back to working on a fix.
Update - Some customers may be still experiencing issues logging into or using the Cloudflare dashboard. We are working on a fix to resolve this, and continuing to monitor for any further issues.
Nov 18, 2025 - 14:57 UTC
I'm thinking about all those quips from a few decades back, along the lines of: "The Internet is resilient, it's distributed and it routes around damage" etc.
In many ways it's still true, but it doesn't feel like a given anymore.
Recently, several of my VPN server nodes (VPSes from different providers) randomly couldn't connect to Cloudflare CDN IPs, while the host Linux network didn't have the issue; vpp shares the same address with Linux and uses tc stateless NAT to do the trick.
I finally worked around it by changing the TCP options sent by the vpp TCP stack.
But the whole thing made me worry that something had been deployed that caused the issue.
I don't think that's related to this outage; it just reminded me of the above. There seem to be frequent articles about new Cloudflare networking methods and deployments lately, which might correlate with a higher probability of issues.
For anyone reading this who desperately needs their website up, you can try this: If you manage to get to your Cloudflare DNS settings and disable the "Proxy status (Proxied)" feature (the orange cloud), it should start working again.
Be aware that this change has a few immediate implications:
- SSL/TLS: You will likely lose your Cloudflare-provided SSL certificate. Your site will only work if your origin server has its own valid certificate.
- Security & Performance: You will lose the performance benefits (caching, minification, global edge network) and security protections (DDoS mitigation, WAF) that Cloudflare provides.
- This will also reveal your backend internal IP addresses. Anyone can find permanent logs of public IP addresses used by even obscure domain names, so potential adversaries don't necessarily have to be paying attention at the exact right time to find it.
Unfortunately, this will also expose your IP address, which may leave you vulnerable even when the WAF and DDoS protections come back up (unless you take the time to only listen for Cloudflare IP address ranges, which could still take a beefy server if you're having to filter large amounts of traffic).
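If you do later go the "only listen for Cloudflare ranges" route (once the orange cloud is back on), the general idea is something like this, assuming ufw and the ranges Cloudflare publishes at https://www.cloudflare.com/ips-v4 and /ips-v6:

    # allow origin HTTPS only from Cloudflare's published ranges, drop everything else (sketch)
    for cidr in $(curl -fsSL https://www.cloudflare.com/ips-v4) \
                $(curl -fsSL https://www.cloudflare.com/ips-v6); do
        sudo ufw allow from "$cidr" to any port 443 proto tcp
    done
    sudo ufw deny 443/tcp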
I think you should give me a credit for all the income I lost due to this outage. Who authorized a change to the core infrastructure during the period of the year when your customers make the most income? Seriously, this is a management failure at the highest levels of decision-making. We don't make any changes to our server infrastructure/stack during the busiest time of the year, and neither should you. If there were an alternative to Cloudflare, I'd leave your service and move my systems elsewhere.
Looking forward to seeing their RCA. I'm guessing it's going to be glossy in terms of actual customer impact. "We didn't go offline, we just had 100% errors. For 60 minutes."
My theory is that people's skills are getting worse. Attention spans are diminishing, memory is shrinking. People age and retire, new less skilled generations are replacing them. There are studies about declining IQ in the last decades. Probably mobile phones and social media are to blame.
We see the signs with Amazon and Cloudflare going down, Windows Update breaking stuff. But the worst is yet to come, and I am thinking about air traffic control, nuclear power plants, surgeons...
> There are studies about declining IQ in the last decades. Probably mobile phones and social media are to blame.
It is much more nuanced than that.
The long-term rise (Flynn Effect) of IQs in the 20th century is widely believed to be driven by environmental factors more than genetics.
Plateau / decline is context-dependent: The reversal or slowdown isn’t universal, like you suggest. It seems more pronounced in certain countries or cohorts.
Cognitive abilities are diversifying: As people specialize more (education, careers, lifestyles), the structure of intelligence (how different cognitive skills relate) might be changing.
DigitalOcean + Gandi means nothing I run is down. Amazing. We depend far too heavily on centralised services, deeming the value of reputation and convenience to exceed the potential downsides, and then the world pays for it. I think we have to feel a lot more of this pain before regulation kicks in to change things, because the reality is people don't change. The only thing you can personally do is run your own stuff wherever you can.
The sites I host on Cloudflare are all down. Also, even ChatGPT was down for a while, showing the error: "Please unblock challenges.cloudflare.com to proceed."
I happened to be working with Claude when this occurred. Having no idea what exactly the cause was, I jumped over to GPT and observed the same. I did a dig on challenges.cloudflare.com, and by the time I'd figured out roughly what was happening, it seemed to have... resolved itself.
I must say I'm astonished, as naive as it may be, to see the number of separate platforms affected by this. And it has been a bit of a learning experience too.
I didn't think about the Cloudflare API, but we'll make sure to do it next time. Hopefully, it won't happen again. I want Cloudflare to delegate DNS control to an external provider so it's easy to disable/enable the CF proxy in case something like this happens.
Yesterday I decided to finally write my makefiles to "mirror" (make available offline) the docs of the libraries I'm using. doc2dash for sphinx-enabled projects, and then using dash / zeal.
Then I was like... "when did I last time fly for 10+ hours and wanted to do programming, etc, so that I need offline docs?" So I gave up.
Today I can't browse the libs' docs quickly, so I'm resuming the work on my local mirroring :-)
This reminds me that I really like self-hosting. While it is true that many things do not work, all my services do work. It has some tradeoffs, of course.
There is an election in Denmark today; I wonder if this will affect that. The government's website is not accessible at the moment because it uses Cloudflare.
What do we actually lose going from cloud back to ground?
The mass centralization is a massive attack vector for organized attempts to disrupt business in the west.
But we're not doing anything about it because we've made a mountain out of a molehill. Was it that hard to manage everything locally?
I get that there are plenty of security implications going that route, but it would be much harder to bring down large portions of online business with a single attack.
> What do we actually lose going from cloud back to ground?
A lot of money related to stuff you currently don't have to worry about.
I remember how shit worked before AWS. People don't remember how costly and time consuming this stuff used to be. We had close to 50 people in our local ops team back in the day when I was working with Nokia 13 years ago. They had to deal with data center outages, expensive storage solutions failing, network links between data centers, offices, firewalls, self hosted Jira running out of memory, and a lot of other crap that I don't spend a lot of time about worrying with a cloud based setup. Just a short list of stuff that repeatedly was an issue. Nice when it worked. But nowhere near five nines of uptime.
That ops team probably cost a few million per year in salaries alone. I knew some people in that team. Good, solid people, but it always seemed like a thankless and stressful job to me: basically constant firefighting while people barked at you to just get stuff working. Later a lot of that stuff moved into AWS and things became a lot easier, and the need for that team largely went away. The first few teams doing that caused a bit of controversy internally until management realized that those teams were saving money. Then that quickly turned around. And it wasn't like AWS was cheap. I worked in one of those teams. That entire ops team was replaced by 2-3 clued-in devops people who were able to move a lot faster. Subsequent layoff rounds in Nokia hit internal IT and ops teams hard, early on in the years leading up to the demise of the phone business.
Yeah, people have such short memories for this stuff. When we ran our own servers a couple of jobs ago, we had a rota of people who'd be on call for events like failing disks. I don't want to ever do that again.
In general, I'm much happier with the current status of "it all works" or "it's ALL broken and it's someone else's job to fix it as fast as possible"!
Not saying it's perfect, but neither was on-prem/colocation.
Strange thing is, this spans multiple CDN regions: everything using bot management & WAF is down. I just got a colleague to check our site, and both the London & Singapore Cloudflare servers are out... And I can't even log in to the Cloudflare dash to re-route critical traffic.
Likely this is accidental, but one day there will be something malicious that will have big impacts, given how centralised the internet now is.
>Cloudflare is aware of, and investigating an issue which potentially impacts multiple customers. Further detail will be provided as more information becomes available.
I had two completely unrelated tabs open (https://twitter.com and https://onsensensei.com), both showing the same error. Opened another website, same error. Kinda funny to see how much of the entire web runs on Cloudflare nowadays.
Why do people use the reverse proxy functionality of Cloudflare? I've worked at small to medium sized businesses that never had any of this while running public facing websites and they were/are just fine.
Same goes for my personal projects: I've never been worried about being targeted by a botnet so much that I introduce a single point of failure like this.
Any project that starts gaining any bit of traction gets hammered with bots (the ones that try every single /wp URL even though you don't use WordPress), frequent DDoS attacks, and so on.
I consider my server's real IP (or load balancer IP) as a secret for that reason, and Cloudflare helps exactly with that.
Everything goes through Cloudflare, where we have rate limiters, Web firewall, challenges for China / Russian inbound requests (we are very local and have zero customers outside our country), and so on.
People think that running Node.js servers is a good idea, and those fall over if there's ever so much as a stiff breeze, so they put Cloudflare in front and call it a day.
It gives really good caching functionality so you can have large amounts of traffic and your site can easily handle it. Plus they don't charge for egress traffic.
What exactly are you serving that bot traffic affects your quality of service?
I've seen an RPi serve a few dozen QPS of dynamic content without issue... The only service I've had actually get taken down by benign bots is a Gitea-style git forge (which was 'fixed' by deploying Anubis in front of it).
Our national transit agency is apparently a customer.
The departure tables are borked, showing incorrect data, the route map stopped updating, the website and route planner are down, and the API returns garbage. Despite everything, the management will be pleased to know the ads kept on running offline.
Why you would put a WAF between devices you control and your own infra, God knows.
Is it me, or do the outages of single points of failure for large swaths of the internet tend to cluster within weeks/days of one another?
Anyone know why? It could be total bias, because one news story propels the next, so when they happen in clusters you just hear about them more than when they don't.
The non-profit I volunteer at is unreachable. It gives a Cloudflare error page, which is sort of helpful: it tells me the site is OK but Cloudflare has a 500.
It's been great, but I always wonder when a company starts doing more than its initial calling. There have been a ton of large attacks, tons of bot scrapers, so it's the Wild West.
Yes, they're spreading themselves very thin with lots of new releases/products, but they will lose a lot of customers if their reliability comes into question.
So they broke the internet. Nice!
Never seen so many sites not working.
Never seen so many desktop apps suddenly stop working.
I don't want to be the person responsible for this.
And this again has taught me it's better not to rely on external services, even though they seem too big to fail.
Down, but the linked status page shows mostly operational, except for "Support Portal Availability Issues" and planned maintenance. Since it was linked, I'm curious if others see differently.
edit: It now says "Cloudflare Global Network experiencing issues" but it took a while.
It would appear that if you use a VPN in Europe you can still access Cloudflare sites. I have just tried: for me the Netherlands, Germany, and France work, but the UK and USA don't.
EDIT: It would appear it is still unreliable in these countries, it just stopped working in France for me.
Cloudflare Dashboard/clicky-clicky UI is down. I really appreciate that their API is still working. A small change in our Terraform configuration and now I can go to lunch in peace, knowing our clients at skeeled can keep working if they want to.
No logging in to Cloudflare Dash, no passing Turnstile (their CAPTCHA Replacement Solution) on third-party websites not proxied by Cloudflare, the rest that are proxied throwing 500 Internal server error saying it's Cloudflare's fault…
Linode has been rock solid for me. I wanted to back this comment with uptime numbers, unfortunately the service I use for that, Uptime Robot, is down because of Cloudflare...
Investigating - Cloudflare is aware of, and investigating an issue which potentially impacts multiple customers. Further detail will be provided as more information becomes available.
Nov 18, 2025 - 11:48 UTC
Yeah, those "multiple customers" are like 70% of the internet.
I would love to see a competition for the most banal thing that went wrong as a result of this. For example, I’m pretty sure the reason my IKEA locker wouldn’t latch shut was because the OS had hung while talking to a Cloudflare backend.
Cloudflare runs a high demand service, and the centralisation does deserve scrutiny. I think a good middle ground I’ll adopt is self hosting critical services and then when they have an outage redirect traffic to a Cloudflare outage banner.
Meanwhile my Wordpress blog on DigitalOcean is up. And so is DigitalOcean.
My ISP is routing public internet traffic to my IPs these days. What keeps me from running my blog from home? Fear of exposing a TCP port, that's what. What do we do about that?
Depending on the contract it might not be allowed to run public network services from your home network.
I had a friend doing that and once his site got popular the ISP called (or sent a letter? don't remember anymore) with "take this 10x more expensive corporate contract or we will block all this traffic".
In general, the reason ISPs don't want you to do that (in addition to the way more expensive corporate rates) is the risk of someone DDoSing that site, which could cause issues for large parts of their domestic customer base (and, depending on the country, make them liable to compensate those customers for not providing a service they paid for).
> Our Engineering team is actively investigating an issue impacting multiple DigitalOcean services caused by an upstream provider incident. This disruption affects a subset of Gen AI tools, the App Platform, Load Balancer, Spaces and provisioning or management actions for new clusters. Existing clusters are not affected. Users may experience degraded performance or intermittent failures within these services.
> We acknowledge the inconvenience this may cause and are working diligently to restore normal operations. Signs of recovery are starting to appear, with most requests beginning to succeed. We will continue to monitor the situation closely and provide timely updates as more information becomes available. Thank you for your patience as we work towards full service restoration.
Yeah, DigitalOcean and Dreamhost are both up. I actually self-host on 2Gig fibre service, and all my stuff is up, except I park everything behind Cloudflare since there is no way I could handle a DDoS attack.
One way to mitigate DDoS is to enforce source-IP checks on the way OUT of a datacenter (egress).
Sure, there are botnets, infected devices, etc. that would still get through, but where does the sheer power of a big DDoS attack come from, including from those who sell it as a service? They have to have some infrastructure in some datacenter, right?
Make a law that forces every edge router of a datacenter to check the source IP (something like the sketch below) and you would eliminate a very big portion of DDoS as we know it.
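In practice that's BCP 38-style source-address validation. A toy sketch of what an edge router or hypervisor host could enforce on egress, where wan0 and 203.0.113.0/24 are placeholders for the uplink interface and the datacenter's own prefix:

    nft add table inet sav
    nft add chain inet sav forward '{ type filter hook forward priority 0; policy accept; }'
    # drop anything leaving the uplink whose source address was never allocated to us
    nft add rule inet sav forward oifname "wan0" ip saddr != 203.0.113.0/24 counter drop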
Until then, the only real and effective method of mitigating a DDoS attack is even more bandwidth: you basically become a black hole for the attack, which is essentially what Cloudflare is.
Alright, what you are proposing is kind of hard to do.
Source routing is not easy, and source validation is even harder.
And what prevents me, as an abusive hoster or "bad guy", from just announcing my own IP space directly on a transit or an IXP?
You might say the IXP should do source checking as well, but what if the IP space is distributed/anycasted across multiple ASNs on the IXP?
Also, if you add multiple egress points distributed across different routing domains, it gets complicated fast.
Does my transit upstream need to do source validation of my IP space? What about their upstream? Also, how would they know which IP space belongs to which ASNs, considering the allocation of ASN numbers and IP space is distributed across different organisations across the globe (some of which are more malicious/non-functional than others [0])? Source routing becomes extremely complex because there is no single, universal mapping between IP space and the ASNs it belongs to.
The biggest attacks literally come from botnets. There’s not a lot coming from infrastructure services precisely because these services are incentivized to shut that shit down. At most it would be used as the control plane which is how people attempt to shut down the botnets.
We finally switched to CF a few weeks ago (for bot protection, abusive traffic started getting insane this year), finally we can join in on one of the global outage parties (no cloud usage otherwise, so still more uptime than most).
Because javascript programmers are cheaper/easier/whatever to hire? So everything becomes web-centric. (I'm hoping for this comment to be sarcastic but I wouldn't be surprised if it turns out not to be)
I didn't have my site on Cloudflare because it would be faster for Chinese users (its main demographic), so I THOUGHT I was fine for a second, until I remembered the data storage API is behind Cloudflare.
Hey, this is fun, all my websites are still up! I wonder how that happened? I don't even have to worry about my docker registry being down because I set up my own after the last global outage.
Is anybody keeping statistics on the frequency of these big global internet outages? It seems to be happening extremely frequently as of late, but it would be nice to have some data on that.
This Internet thing is steadily becoming the most fragile attack surface out there. No need for nuclear weapons anymore; just hit Cloudflare and AWS and we are back to the stone age.
We're on the enterprise plan, so far we're seeing Dashboard degradation and Turnstile (their captcha service) down. But all proxying/CDN and other services seem to work well.
Why are we seeing AWS, then Azure, then Cloudflare all going down just out of the blue? I know they go down occasionally, but it's typically not major outages like this...
Down... "Please unblock challenges.cloudflare.com to proceed." On every Cloudflare hosted website that I try. This timing SUCKS.......... please resolve fast! <3
Ah! Well, all of my websites are down! I’m going to take screenshots and have it as part of my Time Capsule Album, “Once upon a Time, my websites used to go down.”
If someone wanted to learn about how the modern infrastructure stack works, and why things like this occur, where would be some good resources to start?
I sometimes question my business decision to have a multi-cloud, multi-region web presence where it is totally acceptable to be down with the big boys.
Prior hosting provider was a little-known company with decent enough track record, but because they employed humans, stuff would break. When it did break, C-suite would panic about how much revenue is lost, etc.
The number of outages was "reasonable" to anyone who understood the technical side, but non-technical people would complain for weeks after an outage about how we're always down ("well, BigServiceX doesn't ever break, why do we?"), and again about lost revenue.
Now on Azure/Cloudflare, we go down when everyone else does, but C-Suite goes "oh it's not just us, and it's out of our control? Okay let us know when it fixes itself."
A great lesson in optics and perception, for our junior team members.
Haha they updated their status page: "Identified - A global upstream provider is currently experiencing an outage which is impacting platform-level and project-level services"
I assume the locations are operating fine, since you can see the error pages. The culprit here is probably the network, which, at the time of writing, shows up as offline.
Windows 11 has some annoying UI decisions, but is otherwise 100% reliable for me and absolutely my OS of choice. Edge is essentially Chrome, but generally ties in better with the MS accounts ecosystem which I already use.
Just yesterday Cloudflare announced it was acquiring Replicate (an AI platform): "the Workers Platform mission: Our goal all along has been to enable developers to build full-stack applications without having to burden themselves with infrastructure", according to Cloudflare's blog. Are we cooked?
Makes you realise that if Cloudflare or one of these large organisations decides to (/ gets ordered by a deranged US president to) block your internet access, that's a whole lot of internet you're suddenly cut off from. Yes, I know there are circumventions, but it's still a worrying thought.
In theory even a single company service could be distributed, so only a fraction of websites would be affected, thus it's not a necessity to be a single point of failure. So I still don't like this argument "you see what happens when over half of the internet relies on Cloudflare". And yes, I'm writing this as a Cloudflare user whose blog is now down because of this. Cloudflare is still convenient and accessible for many people, no wonder why it's so popular.
But, yeah, it's still a horrible outage, much worse than the Amazon one.
The "omg centralized infra" cries after every such event kind of misses the point. Hosting with smaller companies (shared, vps, dedi, colo whatever) will likely result in far worse downtimes, individually.
Ofc the bigger perception issue here is many services going out at the same time, but why would (most) providers care if their annual downtime does or doesn't coincide with others? Their overall reliability is no better or worse had only their service gone down.
All of this can change, ofc, if it becomes a regular thing; the absolute hours of downtime do matter.
For fun, I asked google what's an alternative to Cloudflare. It says, "A complete list of Cloudflare alternatives depends on which specific service (CDN, security, Zero Trust, edge computing, etc.) you are replacing, as no single competitor offers the exact same all-in-one suite"
Used a down-detector site to check if Cloudflare is down, but the site runs on Cloudflare, so I couldn't check whether Cloudflare was down for anyone else, because Cloudflare was down.
If a cloud vendor with 1 million users experiences a long term outage: the vendor has a serious problem. If a cloud vendor with 1 billion users experiences a long term outage: the internet has a serious problem. Yada-yada-yada xkcd/2347 but it's the big block in the middle which crumbled
Oh no, we can’t take a (former) executive to task about what they’ve wrought with their influence!!! That would be wrong.
If anything, he should be the first to be blamed for the greater and greater effect this tech monster has on internet stability, since, you know, his people built it.
When will Cloudflare actually split into several totally independent companies to remedy that they bring down the Internet every time they have a major issue?
I am using Cloudflare as the back-end for my site (Workers) but have disabled all their other offerings. I was affected for a short while, but I seem to be less affected than other people.
The biggest lesson for me from this incident: NEVER make your DNS provider and CDN provider the same vendor. Now I can't log in to the dashboard, even to switch the DNS. Sigh.
While my colleagues are wondering why Cloudflare isn't working and are afraid it might be something on our end, I'll first check here to make sure it's not a Cloudflare / AWS problem in the first place.
It's the old IBM thing. If your website goes down along with everyone else's because of Cloudflare, you shrug and say "nothing we could do, we were following the industry standard". If your website goes down because of on-prem then it's very much your problem and maybe you get to look forward to an exciting debrief with your manager's manager.
That's lazy engineering and I don't think we as technical, rational people should make that our way of working. I know the saying, but I disagree with it. My fuckups, my problem, but at least I can avoid fuckups actively if I am in charge.
Funnily and ironically enough, I was trying to check out a few things on Ansible Galaxy and... I ended up here trying to submit the link for the CF ongoing incident
I would only consider doing stuff on-prem because of services like Cloudflare. You can have some of the global features like edge-caching while also getting the (cost) benefits of on-prem.
Well, between AWS US EAST 1 killing half the internet, and this incident, not even a month passed. Meanwhile, my physical servers don't care and happily serve many people at a cheaper cost than any cloud offer.
Update
We've deployed a change which has restored dashboard services. We are still working to remediate broad application services impact
Posted 2 minutes ago. Nov 18, 2025 - 14:34 UTC
But...
I'm stuck at the captcha that does not work:
dash.cloudflare.com
Verifying you are human. This may take a few seconds.
dash.cloudflare.com needs to review the security of your connection before proceeding.
Half of the internet is down. That's what you get for giving up control of a service that's supposed to be decentralized to one company. Good; maybe if it costs companies a few billion they will stop putting all their eggs in one basket.
This seems to corroborate the recent controversial claims that American workers do not possess the aptitudes needed to succeed in the 21st century. If only we could have gotten more children to learn to code. Sigh.
I am paying for this shit service and this is my longest downtime I had in years. Can anyone recommend any other bottleneck to be annoyed with in future?
They are decentralized with servers all on the East coast that they self host. They do have points of failure that can take down the whole network, however.
I'm wary of the broader internet having SPOFs like AWS and Cloudflare. You can't change routing or DNS horizons to get around it. Things are just broken in ways that are not only opaque but destructive, due to so much relying on fragile sync state.
Will my Spelling Bee QBABM count today, or will it fail and tomorrow I find out that last MA(4) didn't register, ruining my streak? Society cannot function like this! /s
AWS, Azure, now Cloudflare, all within a month, are hit with configuration errors that are definitely neither signs of more surveillance gear being added by government agencies nor attacks by hostile powers. It's a shame that these fine services that everyone apparently needs and that worked so well for so long without a problem suddenly all have problems at the same time.
AWS was not a configuration error, it was a race condition on their load balancer's automated DNS record attribution that caused empty DNS records. As that issue was being fixed, it cascaded into further, more complex issues overloading EC2 instance provisioning.
Gemini is up, I asked it to explain what's going on in cave man speak:
YOU: Ask cave-chief for fire.
CAVE-CHIEF (Cloudflare): Big strong rock wall around many other cave fires (other websites). Good, fast wall!
MANY CAVE-PEOPLE: Shout at rock wall to get fire.
ROCK WALL: Suddenly… CRACK! Wall forgets which cave has which fire! Too many shouts!
RESULT:
Your Shout: Rock wall does not hear you, or sends you to wrong cave.
Other Caves (like X, big games): Fire is there, but wall is broken. Cannot get to fire.
ME (Gemini): My cave has my own wall! Not rock wall chief! So my fire is still burning! Good!
BIG PROBLEM: Big strong wall broke. Nobody gets fire fast. Wall chief must fix strong rock fast!
If anyone needs commands for turning off the CF proxy for their domains and happens to have a Cloudflare API token, here's roughly how.
First you can grab the zone ID with something like this ($CF_API_TOKEN and example.com stand in for your own token and zone):
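    # look up the zone ID for your domain
    curl -s -H "Authorization: Bearer $CF_API_TOKEN" \
      "https://api.cloudflare.com/client/v4/zones?name=example.com"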
And a list of DNS records using $ZONE_ID from the previous response:
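    # list the zone's DNS records; note each record's "id" and its "proxied" flag
    curl -s -H "Authorization: Bearer $CF_API_TOKEN" \
      "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records"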
Each DNS record will have an ID associated. Finally, patch the relevant records, setting "proxied" to false to turn the orange cloud off:
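    # grey-cloud one record ($RECORD_ID comes from the listing above)
    curl -s -X PATCH \
      -H "Authorization: Bearer $CF_API_TOKEN" \
      -H "Content-Type: application/json" \
      --data '{"proxied": false}' \
      "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID"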
Copying from a sibling comment - some warnings:
- SSL/TLS: You will likely lose your Cloudflare-provided SSL certificate. Your site will only work if your origin server has its own valid certificate.
- Security & Performance: You will lose the performance benefits (caching, minification, global edge network) and security protections (DDoS mitigation, WAF) that Cloudflare provides.
- This will also reveal your backend internal IP addresses. Anyone can find permanent logs of public IP addresses used by even obscure domain names, so potential adversaries don't necessarily have to be paying attention at the exact right time to find it.
Also, for anyone who only has an old global API key lying around instead of the more recent tokens, you can set the two legacy auth headers, roughly like this (your account email plus the global key):
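    -H "X-Auth-Email: you@example.com" \
    -H "X-Auth-Key: $CF_GLOBAL_API_KEY"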
instead of the Bearer token header.
Edit: and in case you're like me and thought it would be clever to block all non-Cloudflare traffic hitting your origin... remember to disable that.
This is exactly what we've decided we should do next time. Unfortunately we didn't generate an API token so we are sitting twiddling our thumbs.
Edit: seems like we are back online!
Took me ~30 minutes but eventually I was able to log in, get past the 2FA screen and change a DNS record.
I surely missed a valid API token today.
I'm able to generate keys right now through WARP. Login takes forever, but it is working.
Awesome! I did it via the Terraform provider, but for anyone else without access to the dashboard this is great. Thank you!
If anyone needs the internet to work again (or to get into your cf dashboard to generate API keys), if you have Cloudflare WARP installed, turning it on appears to fix otherwise broken sites. Maybe using 1.1.1.1 does too, but flipping the radio box was faster. Some parts of sites are still down, even after tunneling into to CF.
Super helpful, thanks!
Looks like I can get everywhere I couldn't before, except my Cloudflare dash.
Good advice!
And no need for -X GET to make a GET request with curl; it is the default HTTP method if you don't send any content.
If you do send content with, say, -d, curl will do a POST request, so no need for -X then either.
For PATCH though, it is the right curl option.
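For example (the URLs are purely illustrative):

    curl https://example.com/api/things                                      # GET, the default method
    curl -d '{"name":"x"}' https://example.com/api/things                    # -d makes curl send a POST
    curl -X PATCH -d '{"proxied":false}' https://example.com/api/things/123  # PATCH has to be explicit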
thanks for this! just expanded on a bit and published a write up here so it's easier to find in the future: https://www.coryzue.com/writing/cloudflare-dns/
I would advise against this action. Just ride the crash.
If people knew how to play the 5 hour long game they wouldn't have been using Cloudflare in the first place.
A colleague of mine just came bursting through my office door in a panic, thinking he brought our site down since this happened just as he made some changes to our Cloudflare config. He was pretty relieved to see this post.
Tell him it's worse than he thinks. He obviously brought the entire Cloudflare system down.
You joke and I think its funny, but as a junior engineer I would be quite proud if some small change I made was able to take down the mighty Cloudflare.
Well, you can never be sure that he didn't:
https://www.fastly.com/blog/summary-of-june-8-outage
It's also what was the cause of the Azure Front Doors global outage two weeks ago - https://aka.ms/air/YKYN-BWZ
"A specific sequence of customer configuration changes, performed across two different control plane build versions, resulted in incompatible customer configuration metadata being generated. These customer configuration changes themselves were valid and non-malicious – however they produced metadata that, when deployed to edge site servers, exposed a latent bug in the data plane. This incompatibility triggered a crash during asynchronous processing within the data plane service. This defect escaped detection due to a gap in our pre-production validation, since not all features are validated across different control plane build versions."
Oh don't you worry. We are very much talking about the global outage as if he was the root cause. Like good colleagues :)
> May 12, we began a software deployment that introduced a bug that could be triggered by a specific customer configuration under specific circumstances.
I'd love to know more about what those specific circumstances were!
I'm pretty sure I crashed Gmail using something weird in its filters. It was a few years ago. Every time I did something specific (I don't remember what), it would freeze and then display a 502 error for a while.
Damn, imagine being the customer responsible for that, oof
Is there a word for that feeling of relief when someone else fucked up after initially thinking it was you?
What’s funny is as I get older this feeling of relief turns more like a feeling of dread. The nice thing about problems that you cause is that you have considerable autonomy to fix them. Cloudflare goes down you’re sitting and waiting for a 3 party to fix something.
The problem is, I still get the wrong end of the stick when AWS or CF go down! Management doesn't care, understandably. They just want the money to keep coming in. It's hard to convince them that this is a pretty big problem. The only thing that will calm them down a bit is to tell them Twitter is also down. If that doesn't get them, I say ChatGPT is also down. Now NOBODY will get any work done! lol.
When I'm debugging something, I'm not usually looking for the solution to the problem; I'm looking for sufficient evidence that I didn't cause the problem. Once I have that, the velocity at which I work slows down
phewphoria
Maybe this isn’t great, but I get a hint of that feeling when I’m on an airplane and hear a baby crying. For a number of years, if I heard a baby crying, it was probably my baby and I had to deal with it. But now my kids are past that phase, so when I hear the crying, after that initial jolt of panic I realize that it isn’t my problem, and that does give me the warm fuzzies. Even though I do feel bad for the baby and their parents.
The German word “schadenfreude” means taking pleasure in someone else’s misfortune; enjoyment rather than relief.
Schadenfriend?
You gain relief, but you don't exactly derive pleasure as it's someone you know that's getting the ass end of the deal
It's close enough to Schadenfreude but not really.
Schadenfreude
vindication?
schadenfuckup
The company where this colleague works? Cloudflare.
I woke up getting bombarded by messages from multiple clients about sites not working. I shat my pants because I'd changed the config just yesterday. When I saw the status message "Cloudflare down" I was so relieved.
Good that he worked it out so quick. I recently spent a day debugging email problems on Railway PaaS, because they silently closed an SMTP port without telling anyone.
How do we know your colleagues changes didn't take down Cloudflare though?
Good point. We should probably assume they did, until proven otherwise.
Do you guys work at Cloudflare? Do you mind reverting that change just in case?
Chances are still good that somewhere within Cloudflare someone really did do a global configuration push that brought down the internet.
When aliens study humans from this period, their book of fairy tales will include several where a terrible evil was triggered by a config push.
Plot twist: They work at Cloudflare
Even Pornhub is down because it uses Cloudflare.
Is Cloudflare being down the work of conservative hackers and the rest of the internet is just collateral damage?
Wait for the post mortem ... It is a technical possibility, race condition propagates one customer config to all nodes... :-)
Did your colleague perhaps change the Cloudflare config again right now? Seems to be down again.
You should tell him his config change took down half the internet.
You missed a great opportunity to dead-pan him with something like "No, Bob, not just our site, you brought down the entire Internet, look at this post!"
> In short, a latent bug in a service underpinning our bot mitigation capability started to crash after a routine configuration change we made. That cascaded into a broad degradation to our network and other services. This was not an attack.
From the CTO, Source: https://x.com/dok2001/status/1990791419653484646
It still astounds me that the big dogs still do not phase config rollouts. Code is data, configs are data; they are one and the same. It was the same issue with the giant CrowdStrike outage last year: they were rawdogging configs globally, a bad config made it out there, and everything went kaboom.
You NEED to phase config rollouts like you phase code rollouts.
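To be concrete, "phasing" just means something like the sketch below, where push_config and check_error_rate are made-up stand-ins for whatever deploy tooling and metrics you actually have:

    #!/bin/sh
    # staged config rollout: widen the blast radius only while things look healthy
    for stage in canary 1pct 10pct 50pct 100pct; do
        push_config --stage "$stage" new-config.json        # hypothetical deploy command
        sleep 300                                           # bake time before widening further
        if [ "$(check_error_rate --stage "$stage")" -gt 0 ]; then   # hypothetical metrics query
            push_config --stage "$stage" last-known-good.json       # roll back and stop
            echo "rollout halted at stage $stage" >&2
            exit 1
        fi
    done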
The big dogs absolutely do phase config rollouts as a general rule.
There are still two weaknesses:
1) Some configs are inherently global and cannot be phased. There's only one place to set them. E.g. if you run a webapp, this would be configs for the load balancer as opposed to configs for each webserver
2) Some configs have a cascading effect -- even though a config is applied to 1% of servers, it affects the other servers they interact with, and a bad thing spreads across the entire network
I think it's uncharitable to jump to the conclusion that just because there was a config-based outage they don't do phased config rollouts. And even more uncharitable to compare them to crowdstrike.
In a company I am no longer with, I argued much the same when we rolled out "global CI/CD" on IaC. You made one change, committed and pushed, and wham, it's on 40+ server clusters globally. I hated it. The principal was enamored with it, "cattle not pets" and all that, but the result was that things slowed down considerably because anyone working with it became so terrified of making big changes.
Then you get customer visible delays.
Because adversaries adapt quickly, they have a system that deploys their counter-adversary bits quickly without phasing - no matter whether they call them code or configs. See also: Crowdstrike.
You can't protect against _latent bugs_ with phased rollouts.
Wish this could rocket to the top of the comment thread, digging through hundreds of comments speculating about a cyberattack to find this felt silly
Configuration changes are dangerous for CF it seems, and knocked down $NET almost 4% today. I wonder what the industry wide impact is for each of these outages?
Pre-market was red for all tech stocks today, before the outage even happened.
>Configuration changes are dangerous for CF it seems, and knocked down $NET almost 4% today. I wonder what the industry wide impact is for each of these outages?
This is becoming the "new normal." It seems like every few months, there's another "outage" that takes down vast swathes of internet properties, since they're all dependent on a few platforms and those platforms are, clearly, poorly run.
This isn't rocket surgery here. Strong change management, QA processes and active business continuity planning/infrastructure would likely have caught this (or not), as is clear from other large platforms that we don't even think about because outages are so rare.
Like airline reservations systems[0], credit card authorization systems from VISA/MasterCard, American Express, etc.
Those systems (and others) have outages in the "once a decade" or even much, much, longer ranges. Are the folks over at SABRE and American Express that much smarter and better than Cloudflare/AWS/Google Cloud/etc.? No. Not even close. What they are is careful as they know their business is dependent on making sure their customers can use their services anytime/anywhere, without issue.
The level of "Stockholm Syndrome" [1] expressed by many posting to this thread amazes me: relief that it wasn't "an attack", and essentially blaming themselves for not having the right tools (API keys, etc.) to recover from the gross incompetence of, this time at least, Cloudflare.
I don't doubt that I'll get lots of push back from folks claiming, "it's hard to do things at scale," and/or "there are way too many moving parts," and the like.
Other organizations like the ones I mention above don't screw their customers every 4-6 months with (clearly) insufficiently tested configuration and infrastructure changes.
Yet many here seem to think that's fine, even though such outages are often crushing to their businesses. But if the customers of these huge providers don't demand better, they'll only get worse. And that's not (at least in my experience) a very deep or profound idea.
[0] https://en.wikipedia.org/wiki/Airline_reservations_system
[1] https://en.wikipedia.org/wiki/Stockholm_syndrome
Pretty much everything is down (checking from the Netherlands). The Cloudflare dashboard itself is experiencing an outage as well.
Not-so-funny thing is that the Betterstack dashboard is down but our status page hosted by Betterstack is up, and we can't access the dashboard to create an incident and let our customers know what's going on.
Edit: wording.
Yep that's also my experience. Except HN because it does not use *** Cloudflare because it knows it is not necessary. I just wrote a blog titled "Do Not Put Your Site Behind Cloudflare if You Don't Need To" [1].
[1]: https://huijzer.xyz/posts/123/
Sadly, AI bots and crawlers have made CF the only affordable way to actually keep my sites up without incurring excessive image serving costs.
Those TikTok AI crawlers were destroying some of my sites.
Millions of images served to ByteSpider bots, over and over again. They wouldn't stop. It was relentless abuse. :-(
Now I've just blocked them all with CF.
17 replies →
Yes, I never understand this obsession with centralized services like Cloudflare. To be fair though, if our tiny blogs only get a hundred or so visitors a month anyway, does it matter if they have an outage for a day?
8 replies →
~~two~~ three comments on that:
1. DDOS protection is not the only thing anymore, I use cloudflare because of vast amounts of AI bots from thousands of ASNs around the world crawling my CI servers (bloated Java VMs on very undersized hosts) and bringing them down (granted, I threw cloudflare onto my static sites as well which was not really necessary, I just liked their analytics UX)
2. the XKCD comic is mis-interpreted there, that little block is small because it's a "small open source project run by one person", cloudflare is the opposite of that
3. edit: also cloudflare is awesome if you are migrating hosts. I did a migration this past month: you point cloudflare to the new servers and it's instant DNS propagation (since you didn't propagate anything :) )
2 replies →
Last time I tried this I got DDoS'd so I don't see a reason to step away from CF. That said, this is the price I pay
Does HN not experience DDoS? I would imagine that, being as popular as it is, it would experience DDoS.
2 replies →
It’s that time of the year again where we all realize that relying on AWS and Cloudflare to this degree is pretty dangerous but then again it’s difficult to switch at this point.
If there is a slight positive note to all this, then it is that these outages are so large that customers usually seem to be quite understanding.
Unless you’re say at airport trying to file a luggage claim … or at the pharmacy trying to get your prescription. I think as a community we have a responsibility to do better than this.
7 replies →
> If there is a slight positive note to all this, then it is that these outages are so large that customers usually seem to be quite understanding.
Which only shows that chasing five 9s is worthless for almost all web products. The idea is that by relying on AWS or Cloudflare you can push your uptime numbers up to that standard, but these companies themselves are having such frequent outages that customers don't expect that kind of reliability from web products.
> It’s that time of the year again
It's monthly by now
If I choose AWS/cloudflare and we're down with half of the internet, then I don't even need to explain it to my boss' bosses, because there will be an article in the mainstream media.
If I choose something else, we're down, and our competitors aren't, then my overlords will start asking a lot of questions.
6 replies →
Happy to hear anyone's suggestions about where else to go or what else to do when it comes to protecting against large-scale volumetric DDoS attacks. Pretty much every CDN provider nowadays has stacked up enough capacity to tank these kinds of attacks; good luck trying to combat them yourself these days.
26 replies →
Oh no, we had 30 minutes of downtime this year :(
12 replies →
Cloudflare dashboard is down-ish, not totally down. If you're persistent you can turn off the turnstile and proxy.
It took a few minutes but I got https://hcker.news off of it.
I can't sign in since Turnstile is down so I can't complete the captcha to log in.
I also can't log in via Google SSO since Cloudflare's SSO service is down.
1 reply →
Not saying not to do this to get through, but just as an observation, it’s also the sort of thing that can make these issues a nightmare to remediate, since the outage can actually draw more traffic just as things are warming up, from customers desperate to get through.
But then, that’s what Cloudflare signed up to be.
I'm already logged in on the cloudflare dashboard and trying to disable the CF proxy, but getting "404 | Either this page does not exist, or you do not have permission to access it" when trying to access the DNS configuration page.
1 reply →
I think there is a big business opportunity here. Make a site that lets companies put their status updates on a local VPS for $100.
Atlassian has this business model sewn up
https://www.atlassian.com/software/statuspage
2 replies →
Maybe that's precisely what Cloudflare did and now their status page is down because it's receiving an unusual amount of traffic that the VPS can't handle.
1 reply →
Even the Cloudflare status page, hosted by Atlassian Statuspage, is suffering. Probably due to the traffic crush.
Status pigeons.
on-demand status balancing!
Same here. We’re using OhDear. The status page is available but I can’t post an incident because their service is also behind Cloudflare.
Co-founder here, we'll be working on better ways to handle this over the coming days.
Update: our app is available again without Cloudflare, you'll be able to post updates to status pages smoothly again.
Could always just use a status page that updates itself. For my side project Total Real Returns [1], if you scroll down and look at the page footer, I have a live status/uptime widget [2] (just an <img> tag, no JS) which links to an externally-hosted status page [3]. Obviously not critical for a side project, but kind of neat, and was fun to build. :)
[1] https://totalrealreturns.com/
[2] https://status.heyoncall.com/svg/uptime/zCFGfCmjJN6XBX0pACYY...
[3] https://status.heyoncall.com/o/zCFGfCmjJN6XBX0pACYY
This is unrelated to the cloudflare incident but thanks a lot for making that page. I keep checking it from time to time and it's basically the main data source for my long term investing.
1 reply →
All my stuff is working. Things on GCP. Things on Fly.io. Tooling I use.
"Only" 10% of the internet is behind Cloudflare so far ;)
Happy for you :)
I am curious about these two things:
1- Does GCP also have any outages recently similar to AWS, Azure or CF? If a similar size (14 TB?) DDoS were to hit GCP, would it stand or would it fail?
2- If this DDoS was targeting Fly.io, would it stand? :)
3 replies →
Seems like workers are less affected and maybe betterstack has decided to bypass cloudflare "stuff" for the status pages? (maybe to cut down costs). My site is still up though some GitHub runners did show it failed at certain points.
I have a workers + kv app that seems fine right now.
1 reply →
BetterStack did report issues with some of their services, but they were not very informative.
When it's back up, do yourself a favour and rent a $5/mo VPS in another country from a provider like OVH or Hetzner and stick your status page on that.
"Yes, but what if they go down" - it doesn't matter; having your status page hosted by someone who can be down for the same reason as your main product/service is a recipe for disaster.
Definitely. Tangentially, I encountered 504 Gateway Timeout errors on cloudflarestatus.com about an hour ago. The error page also disclosed the fact that it's powered by CloudFront (Amazon's CDN).
Or use a service like https://updown.io/ (I host my status page there).
https://cachethq.io/ is great for this
2 replies →
This is a big one.
Thankfully the usual social media are still up ... oh wait https://www.bbc.co.uk/news/articles/c629pny4gl7o
I don't get why you need such a service for a status page with 99.whatever% uptime. I mean, your status page only has to be up if everything else is down, so maybe 1% uptime is fine.
/s
There's something maliciously satisfying about seeing your own self-hosted stuff working while things behind Cloudflare or AWS are broken. Sure, they have like four more nines than me, but right now I'm sitting pretty.
My (s)crappy personal site was up during the AWS outage, the Azure outage and now the Cloudflare outage. And I've only had it for 2 months! Maybe I can add a tracker somewhere, might be fun.
Can recommend Uptime Kuma for this purpose: https://github.com/louislam/uptime-kuma
4 replies →
My self-hosted sites are down because of the Cloudflare proxy. Ugh
Only my 'www' is affected, i.e. my blog. Other self-hosted services like Jellyfin and Vaultwarden work fine
How do you deal with DNS? I'm hosting something on a Raspberry Pi at home, and I had recently moved the DNS to Cloudflare. It's quite funny seeing my small personal website being down, although quite satisfying seeing both the browser and host with a green tick while Cloudflare is down.
> How do you deal with DNS?
DNS is actually one of the easiest services to self-host, and it's fairly tolerant of downtime due to caching. If you want redundancy/geographical distribution, Hurricane Electric has a free secondary/slave DNS service [0] where they'll automatically mirror your primary/master DNS server.
[0]: https://dns.he.net/
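If you go this route, it's worth verifying that the secondary is actually tracking your primary. A rough sketch using dnspython (the zone, primary address, and HE nameserver below are placeholders; check Hurricane Electric's docs for their current server list):

    import dns.resolver  # pip install dnspython

    ZONE = "example.org"
    PRIMARY = "203.0.113.10"     # your self-hosted authoritative server
    SECONDARY = "216.218.130.2"  # assumed to be one of HE's public nameservers

    def soa_serial(nameserver: str, zone: str) -> int:
        """Ask a specific nameserver for the zone's SOA serial."""
        resolver = dns.resolver.Resolver(configure=False)
        resolver.nameservers = [nameserver]
        return resolver.resolve(zone, "SOA")[0].serial

    primary_serial = soa_serial(PRIMARY, ZONE)
    secondary_serial = soa_serial(SECONDARY, ZONE)
    print(f"primary={primary_serial} secondary={secondary_serial} "
          f"in_sync={primary_serial == secondary_serial}")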
I don't have experience with a dynDNS setup like you describe, hosting from (probably) home. But my domains are on a VPS (and a few other places here and there) and DNS is done via my domain reseller's DNS settings pages.
Never had an issue hosting my stuff, but as said - I don't yet have experience hosting something from home with a more dynamic DNS setup.
Pangolin is awesome. It's like self-hosted Cloudflare Tunnels
https://github.com/fosrl/pangolin
Host the DNS on the Pi too?
This is a real problem for some "old-school enterprise" companies that use Oracle, SAP, etc. along with the new AWS/CF-based services. They are all waiting around for the new apps to come back up while their Oracle suite/SAP is still functioning. There is a lesson here for some of these new companies selling to old-school companies.
I was just able to save a proxied site. Then the dashboard went down again. I didn't even know it was still on. It's really not doing anything for performance because the traffic is quite low.
Just a couple of days ago, I moved my self-hosted stuff off Cloudflare :)
Is it me or has there been a very noticeable uptick in large scale infra-level outages lately? AWS, Cloudflare, etc have all been way under whatever SLA they publish.
Coincidentally, large tech companies have been conducting mass layoffs and claim they're going to rely on AI much more to replace junior developers.
And they are offshoring roles to lower quality devs.
Interestingly, chatgpt was unavailable due to the same cloudflare outage.
1 reply →
By similar thinking, you could blame large tech companies if they hired too many juniors.
1 reply →
That does seem to be a coincidence, as the recent outages making headlines (including this one according to early reports) have been associated with huge traffic spikes. It seems DDoS are reaching a new level.
2 replies →
[dead]
For me the only silver lining to all these cloud outages is that now we know their published SLA times mean absolutely nothing. The number of 9s used to at least give an indication of intent around reliability; now they are twisted to whatever metric the company wants to represent and don't actually represent guaranteed uptime anywhere.
So true. AWS for example gives only platform credits in the event of an outage. Basically no recourse or insurance.
3 replies →
Some of the other commenters here have posited a "vibe code theory". As the amount of vibe code in production increases, so does the number of bugs and, therefore, the number of outages.
None of the recent major outages were traced down to "vibe coding" or anything of the sort. They appear to be the kind of misconfigurations and networking fuckups that existed since Internet became more complex than 3 routers.
7 replies →
Speaking of "vibe-coding", I wonder how much their own outage is affecting their ability to vibe-code their way out of it.. :-)
The openai login page says:
> Some of the other commenters here have posited a "vibe code theory". As the amount of vibe code in production increases, so does the number of bugs and, therefore, the number of outages.
Likely this coupled with the mass brain damage caused by never-ending COVID re-infections.
Since vaccines don't prevent transmission, and each re-infection increases the chances of long COVID complications, the only real protection right now is wearing a proper respirator everywhere you go, and basically nobody is doing that anymore.
10 replies →
The theory I’ve heard is holiday deploy freezes coupled with Q4 goals creates pressure to get things in quickly and early. It’s all been in the last month or so which does line up.
What's different about this Q4 vs the last 20 years of Q4s?
The obvious answer is to cancel holidays.
My theory is a state-sponsored actor targeting some of these services, but maybe that's just too 'tinfoil hat' of me, who knows.
There are usually very comprehensive post mortems for these events, and none have suggested that at all
This only amplifies the often-repeated propaganda about the "very powerful" enemies of democracy, who in fact are very fragile dictatorships. There's enough incompetence at tech companies to f up their own stuff.
My theory is DNS.
Somewhere, at a floating desk behind a wall of lava lamps, in a nyancatified ghostty terminal with 32 different shader plugins installed:
You're absolutely right! I shouldn't have force pushed that change to master. Let me try and roll it back. * Confrobulating* Oh no! Cloudflare appears to be down and I cannot revert the change. Why don't you go make a cup of coffee until that comes back. This code is production ready, it's probably just a blip.
If it's any guidance, US cyber risk insurance (which covers, among other things, disruptions due to supplier outages) has continuously dropped in price since Q1 2023, by a handful of percent per year.
If you excuse the sloppy plot manually transcribed from market index data: https://i.xkqr.org/cyberinsurancecost.png
Don't forget Azure Front Door / half of Azure.
Yeah, but that's just standard for Azure.
I suspect the number of outages is the same, but the number of sites putting all of their eggs into these two baskets has grown considerably.
Unless you're making that determination statistically, it's probably pareidolia. See here: https://behavioralscientist.org/yates-expect-unexpected-why-...
It's you. Everything goes down once in a while.
GCP was down recently as well
Well AWS runs on Cloudflare...so thanks Cloudflare team!
Don’t forget that Azure was down two weeks ago as well.
Any chance our friend Vladimir is behind this?
it definitely feels like it.
Ironically, DownDetector seems to be down because it protects its site with Cloudflare Turnstile... which is also down!
I noticed this too!
The report there for AWS also skyrocketed, but I guess it's probably false positives?
Even many non-tech people have begun to associate internet-wide outages with "AWS must be down", so I imagine many of them searching "is aws down". For Down Detector, a hit is a down report, so it will report AWS impact even when the culprit, as in this case, is Cloudflare
1 reply →
How did we get to a place where either Cloudflare or AWS having an outage means a large part of the web going down? This centralization is very worrying.
Because no one cares enough, including users.
Oddly this centralization allows a complete deferral of blame without you even doing anything: if you’re down, that’s bad. But if you’re down, Spotify is down, social media is down… then “the internet is broken” and you don’t look so bad.
It also reduces your incentive to change, if “the internet is down” people will put down their device and do something else. Even if your web site is up they’ll assume it isn’t.
I’m not saying this is a good thing but I’m simply being realistic about why we ended up where we are.
As a user I do care, because I waste so much time on Cloudflare's "prove you are human" blocking page (why do I have to prove it over and over again?), and frequently run into websites that block me entirely based on some bad IP blacklist used along with Cloudflare.
62 replies →
This is essentially the entire IT excuse for going to anything cloud. I see IT engineers all the time justifying that the downtime stops being their problem and they stop being to blame for it. There's zero personal responsibility in trying to preserve service, because it isn't "their problem" anymore. Anyone who thinks the cloud makes service more reliable is absolutely kidding themselves, because everyone who made the decision to go that way already knows it isn't true, it just won't be their problem to fix it.
If anyone in the industry actually cared about reliability and took personal stake in their system being up, everyone would be back on-prem.
14 replies →
Users have no options because... everything has been centralized. So it doesn't matter if users care or not.
Users are never a consideration today anyway.
11 replies →
There is an upside too. Us humans, we also need our down time occasionally.
20 replies →
Who cares if a couple of websites are down a day or even two?
As long as HN is up and running, everything is going to be O.K.!
8 replies →
> But if you’re down, Spotify is down, social media is down… then “the internet is broken” and you don’t look so bad.
In my direct experience, this isn't true if you're running something even vaguely mission-critical for your customers. Your customer's workers just know that they can't do their job for the day, and your customer's management just knows that the solution they shepherded through their organization is failing.
3 replies →
> if “the internet is down” people will put down their device and do something else
In this case, the internet should be down more often.
1 reply →
100% this. While in my professional capacity I'm all in for reliability and redundancy, as an individual, I quite like these situations when it's obvious that I won't be getting any work done and it's out of my control, so I can go run some errands to or read a book, or just finish early.
> if “the internet is down” people will put down their device and do something else.
oh no
Which "user" are you referring to? Cloudflare users or end product users?
End product users have no power, they can complain to support and maybe get a free month of service, but the 0.1% of customers that do that aren't going to turn the tide and have anything change.
Engineering teams using these services also get "covered" by them - they can finger point and say "everyone else was down too."
Many people care, but none of them can (sufficiently) change the underlying incentive structure to effect the necessary changes.
> if you’re down, Spotify is down, social media is down… then “the internet is broken” and you don’t look so bad.
Which changes nothing about you actually being down; you're only down more. CF proxies always sucked - not your domain, not your domain...
But Spotify was not down. One social media was down.
This:
> if you’re down, that’s bad. But if you’re down, Spotify is down, social media is down… then “the internet is broken” and you don’t look so bad.
is just marketing. If you are down with some other websites it is still bad.
3 replies →
> Because no one cares enough, including users.
When have users been asked about anything?
On the other hand, it is cool to be up when the internet is down
Eh? It's because they are offering a service too good to refuse.
The internet these days is fucking dangerous and murderous as hell. We need Cloudflare just to keep services up due to the deluge of AI data scrapers and other garbage.
> Because no one cares enough, including users.
this is like a bad motivational speaker talk.. heavy exhortations with a dramatic lack of actual reasoning.
Systems are difficult, people. It is "incentives" of parties and lockin by tech design and vendors, not lack of individual effort.
Also it's free (the basic domain protection offered by CF anyway)
More like "don't have choice". It's not like service provider gonna go to competition, because before you switch, it will be back.
Frankly it's a blessing, always being able to blame the cloud that management forced the company to migrate to so it would be "cheaper" (which half of the time turns out to be false anyway)
> It also reduces your incentive to change, if “the internet is down” people will put down their device and do something else. Even if your web site is up they’ll assume it isn’t.
I agree. When people talk about the enshittification of the internet, Cloudflare plays a significant role.
[dead]
Many reasons but DDoS protection has massive network effects. The more customers you have (and therefore bandwidth provision) the easier it is to hold up against a DDoS, as DDoS are targeting just one (usually) customer.
So there are massive economies of scale. A small CDN with (say) 10,000 customers and 10 Mbit/s provisioned per customer can handle a 100 Gbit/s DDoS (way too simplistic, but hopefully you get the idea) - way too small.
If you have the same traffic provisioned on average per customer and have 1 million customers, you can handle a DDoS 100x the size.
Only way to compete with this is to massively overprovision bandwidth per customer (which is expensive, as those customers won't pay more just for you to have more redundancy because you are smaller).
In a way (like many things in infrastructure) CDNs are natural monopolies. The bigger you get -> the more bandwidth and PoP you can have -> more attractive to more customers (this repeats over and over).
It was probably very astute of Cloudflare to realise that offering such a generous free plan was a key step in this.
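The scaling argument is easy to make concrete. A back-of-the-envelope version of the numbers above (grossly simplified, as the parent comment says):

    PER_CUSTOMER_MBIT = 10  # provisioned headroom per customer, Mbit/s

    for customers in (10_000, 1_000_000):
        capacity_gbit = customers * PER_CUSTOMER_MBIT / 1_000
        print(f"{customers:>9,} customers -> ~{capacity_gbit:,.0f} Gbit/s absorbable")
    # 10,000 customers    -> ~100 Gbit/s
    # 1,000,000 customers -> ~10,000 Gbit/s (10 Tbit/s), i.e. 100x the headroom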
Your argument is technically flawed.
In a CDN, customers consume bandwidth; they do not contribute it. If Cloudflare adds 1 million free customers, they do not magically acquire 1 million extra pipes to the internet backbone. They acquire 1 million new liabilities that require more infrastructure investment.
All you are doing is echoing their pitch book. Of course they want to skim their share of the pie.
7 replies →
In my opinion, DDoS is possible only because there is no network protocol for a host to control traffic filtering at upstream providers (deny traffic from certain subnets or countries). If there were, everybody would prefer to write their own systems rather than rely on a harmful monopoly.
18 replies →
And how many companies want to also be able to build out their own CDN?
Not every company can be an expert at everything.
But perhaps many of us could buy a different CDN than the major players if we want to reduce the likelihood of mass outages like this though.
Yeah, I went to HN after the third web page didn't work. I am not just worried about the single point of failure, I am much more worried about this centralization eventually shaping the future standards of the web and making it de facto impossible to self-host anything.
Well that and the fact that when 99% goes through a central party, then that central party will be very interesting for authoritarian governments to apply sweeping censorship rules to.
It is already nearly impossible, or very expensive, in my country to get a public IP address (even IPv6) that you could host on. The world is heavily moving towards central dependence on these big cloud providers.
1 reply →
> eventually shaping the future standards of the web and making it de facto impossible to self-host anything
Eventually?
Another one that worries me is Let's Encrypt.
It is not as bad as Cloudflare or AWS because certificates will not expire the instant there is an outage, but consider that:
- It serves about 2/3 of all websites
- TLS is becoming more and more critical over time. If certificates fail, the web may as well be down
- Certificate lifetimes are becoming shorter and shorter: currently 90 days, with Let's Encrypt now considering 6-day certificates and an industry-wide maximum of 47 days being planned
- An outage is one thing, but should a compromise happen, that would be even more catastrophic
Let's Encrypt is a good guy now, but remember that Google used to be a good guy in the 2000s too!
(Disclaimer: I am tech lead of Let's Encrypt software engineering)
I'm also concerned about LE being a single point of failure for the internet! I really wish there were other free and open CAs out there. Our goal is to encrypt the web, not to perpetuate ourselves.
That said, I'm not sure the line of reasoning here really holds up? There's a big difference between this three-hour outage and the multi-day outage that would be necessary to prevent certificate renewal, even with 6-day certs. And there's an even bigger difference between this sort of network disruption and the kind of compromise that would be necessary to take LE out permanently.
So while yes, I share your fear about the internet-wide impact of total Let's Encrypt collapse, I don't think that these situations are particularly analogous.
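A rough way to see the difference in outage budgets, assuming clients renew once about two thirds of a certificate's lifetime has elapsed (a common client default, not a Let's Encrypt policy):

    RENEW_AT_FRACTION = 2 / 3  # assumed renewal point as a fraction of lifetime

    for lifetime_days in (90, 47, 6):
        headroom_days = lifetime_days * (1 - RENEW_AT_FRACTION)
        print(f"{lifetime_days:>2}-day certs: ~{headroom_days:.1f} days of CA outage "
              f"before the first certificates start expiring")
    # 90-day certs: ~30 days; 47-day certs: ~15.7 days; 6-day certs: ~2 days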
Agree, I’ve thought about this one too. The history of SSL/TLS certs is pretty hacky anyway in my opinion. The main problem they are solving really should have been solved at the network layer with ubiquitous IPsec and key distribution via DNS since most users just blindly trust whatever root CAs ship with their browser or OS, and the ecosystem has been full of implementation and operational issues.
Let’s Encrypt is great at making the existing system less painful, and there are a few alternatives like ZeroSSL, but all of this automation is basically a pile of workarounds on top of a fundamentally inappropriate design.
4 replies →
Google was always a for-profit operation. Let's Encrypt/ISRG could still go rotten but there are less incentives for them to do so as a non-profit.
Mostly since the AWS craze started a decade ago, developers have gone away from Dedicated servers (which are actually cheaper, go figure), which is causing all this mess.
It's genuinely insane that many companies design a great number of fallbacks... at the software level, but put almost no thought into the hardware/infrastructure level; common sense dictates that you should never host everything with a single provider.
I tried as hard as I could to stay self hosted (and my backend is, still), but getting constant DDoS attacks and not having the time to deal with fighting them 2-3x a month was what ultimately forced me to Cloudflare. It's still worse than before even with their layers of protection, and now I get to watch my site be down a while, with no ability to switch DNS to point back to my own proxy layer, since CF is down :/
8 replies →
With the state of constant attack from AI scrapers and DDOS bots, you pretty much need to have a CDN from someone now, if you have a serious business service. The poor guys with single prem boxes with static HTML can /maybe/ weather some of this storm alone but not everything.
2 replies →
I self hosted on one of the company’s servers back in the late 90s. Hard drive crashes (and a hack once, through an Apache bug) had our services (http, pop, smtp, nfs, smb, etc ) down for at least 2-3 days (full reinstall, reconfiguration, etc).
Then, with regular VPSs I also had systems down for 1-2 days. Just last week the company that hosts NextCloud for us was down the whole weekend (from Friday evening) and we couldn’t get their attention until Monday.
So far these huge outages that last 2-5 hours are still lower impact for me, and require me to take less action.
1 reply →
I like the idea of having my own rack in a data center somewhere (or sharing the rack, whatever) but even a tiny cost is still more than free. And even then, that data center will also have outages, with none of the benefits of a Cloudflare Pages, GitHub Pages, etc.
> developers have gone away from Dedicated servers (which are actually cheaper, go figure)
It depends on how you calculate your cost. If you only include the physical infrastructure, having a dedicated server is cheaper. But by having a dedicated server you lose a lot of flexibility. Need more resources? Just scale up your EC2 instance; with a dedicated server there is a lot more work involved.
Do you want a 'production-ready' database? With AWS you can just click a few buttons and have an RDS instance ready to use. To roll out your own PG installation you need someone with a lot of knowledge (how to configure replication? backups? updates? ...).
So if you include salaries in the calculation, the result changes a lot. And even if you already have some experts on your payroll, by putting them to work deploying a PG instance you won't be able to use them to build other things that may generate more value to your business than the premium you pay to AWS.
Cloud hosters are that hardware fallback. They started by offering better redundancy and scaling than your homemade breadbox. But it seems they lost something along the way, and now we have this.
Maintenance cost is the main issue for on-prem infra; nowadays add things like DDoS protection and/or scraping protection, which can require a dedicated team or force your company to rely on some library or open source project that is not guaranteed to be maintained forever (unless you give them support, which I believe in)... Yeah, I can understand why companies shift off of on-prem nowadays
... dedis are cheaper if you are rightsized. If you are wrongsized they just plain crash, and you may or may not be able to afford the upgrade.
I was at SoftLayer before I was at AWS, and what catalyzed the move was the time I needed to add another hard drive to a system and somehow they screwed it up. I couldn't put in a trouble ticket to get it fixed because my database record in their trouble ticket system was corrupted. The next day I moved my stuff to AWS, and the day after that they had a top sales guy talk to me to try to get me to stay, but it was too late.
They're using Cloudflare for multicloud, but still have Cloudflare as a single point of failure. Should make a Cloudflare for Cloudflare to solve this.
Like the infamous "smiling through the pain" meme:
"I added a load-balancer to improve system reliability" (happy)
"Load balancer crashed" (smiling-through-the-pain)
1 reply →
You jest, but this actually does exist. Multiple CDNs sell multi-CDN load balancing (divide traffic between 2+ CDNs per variously-complicated specifications, with failover) as a value add feature, and IIRC there is at least one company for which this is the marquee feature. It's also relatively doable in-house as these things go.
Failover to Akamai.
1 reply →
If there’s clearly a single point of failure shouldn’t it be called a single cloud pretending to be “multicloud”?
This might sound crazy as a software engineer, but I actually like the occasional "snow day" where everything goes down. It's healthy for us to all disconnect from the internet for a bit. The centralization unintentionally helps facilitate that. At least, that's my glass half full perspective.
I can understand that sentiment. Just don't lose sight of the impact it can have on every day people. My wife and I own a small theatre and we sell tickets through Eventbrite. It's not my full time job but it is hers. Eventbrite sent out an email this morning letting us know that they are impacted by the outage. Our event page appears to be working but I do wonder if it's impacting ticket sales for this weekend's shows.
So while us in tech might like a "snow day", there are millions of small businesses and people trying to go about their day to day lives who get cut off because of someone else's fuck-ups when this happens.
1 reply →
> This might sound crazy as a software engineer, but I actually like the occasional "snow day" where everything goes down
As a software engineer, I get it. As a CTO, I spent this morning triaging with my devops AI (actual Indian) to find some workaround (we found one) while our CEO was doing damage control with customers (non-technical field) who were angry that we were down and they were losing business by the minute.
sometimes I miss not having a direct stake in the success of the business.
I'm guessing you're employed and your salary is guaranteed regardless. Would you have the same outlook if you were the self-employed founder of an online business and every minute of outage was costing you money?
9 replies →
If the internet was just social media, SaaS productivity suites, and AI slop, sure...
But there are systems that depend on Cloudflare, directly or not, and when they go down it can have a serious impact on somebody's livelihood.
Now that network effects and data lock-in have taken root, downtime is not as big of a concern as it was in the 2000s
What does this even mean? Because people have locked in their data, they’re ok with downtime? I can’t imagine a world where this is true.
3 replies →
except, y'know, where people's lives and livelihoods depend on access to information or being able to do things at an exact time. AWS and Cloudflare are disqualifying themselves from hospitals and military and whatnot.
1 reply →
How did we get to a place where Cloudflare being down means we see an outage page, but on that page it tells us explicitly that the host we're trying to connect to is up, and it's just a Cloudflare problem.
If it can tell us that the host is up, surely it can just bypass itself to route traffic.
"... surely it can just ..."
Congratulations, you've successfully completed Management Training 101.
Totally cooked if you have Cloudflare fronting us-east-1, with no redundancies.
It could be worse. You could have a backup on Azure.
The mother of all bad infra decisions.
They have multi cloud infra, between us-east-1 and Azure
I recommend this Ben Thompson piece on why resiliency has declined: https://stratechery.com/2025/resiliency-and-scale/
People use CloudFlare because it's a "free" way for most sites to not get exploited (WAF) or DDoSed (CDN/proxy) regularly. A DDoS can cost quite a bit more than a day of downtime, even just a thundering herd of legitimate users can explode an egress bill.
It sucks there's not more competition in this space but CloudFlare isn't widely used for no reason.
AWS also solves real problems people have. Maintaining infrastructure is expensive as is hardware service and maintenance. Redundancy is even harder and more expensive. You can run a fairly inexpensive and performant system on AWS for years for the cost of a single co-located server.
Slowly, and fully conscious of where we were heading.
It's not only centralization in the sense that your website will be down if they are down; it is also a centralized MITM proxy. If you transfer sensitive data like chats over Cloudflare-"protected" endpoints, you also allow CF to transparently read and analyze it in plain text. It must be very easy for state agencies to spy on the internet nowadays; they would just ask CF to redirect traffic to them.
Because it's better to have a really convenient and cheap service that works 99% of the time than a resilient one that is more expensive or more cumbersome to use.
It's like github vs whatever else you can do with git that is truly decentralized. The centralization has such massive benefits that I'm very happy to pay the price of "when it's down I can't work".
When there is an accident on the interstate we should blame the centralization of traffic and advocate for no more highways.
Very worrying indeed.
Most developers don't care to know how the underlying infrastructure works (or why) and so they take whatever the public consensus is re: infra as a statement of fact (for the better part of the last 15 years or so that was "just use the cloud"). A shocking amount of technical decisions are socially, not technically enforced.
This topic is raised every time there is an outage with Cloudflare, and the truth of the matter is that they offer an incredible service and there is not big enough competition to deal with it. By definition their services are so good BECAUSE their adoption rate is so high.
It's very frustrating of course, and it's the nature of the beast.
False dichotomy. Both can be true.
1 reply →
Compliance. If you wanna sell your SAAS to big corpo, their compliance teams will feel you know what you're doing if they read AWS or Cloudflare on your architecture, even if you do not quite know what you're doing.
Because DDoS is a fact of life (and even if you aren't targeted by DDoS, the bot traffic probing you to see if you can be made part of the botnet is enough to take down a cheap $5 VPS). So we have to ask - why? Personally, I don't accept the hand-wavy explanation that botnets are "just a bunch of hacked IoT devices". No, your smart lightbulb isn't taking down Reddit. I slightly believe the secondary explanation that it's a bunch of hacked home routers. We know that home routers are full of things like suspicious oopsie definitely-not-government backdoors.
IMO, centralization is inevitable because the fundamental forces drive things in that direction. Clouds are useful for a variety of reasons (technical, time to market, economic), so developers want to use them. But clouds are expensive to build and operate, so there are only a few organizations with the budget and competency to do it well. So, as the market matures you end up with 3 to 5 major cloud operators per region, with another handful of smaller specialists. And that’s just the way it works. Fighting against that is to completely swim upstream with every market force in opposition.
There is this tendency to phrase questions (or statements) as "when did 'we' ".
These decisions are made individually, not centrally. There is no process in place (and most likely there never will be) that will be able to control and dictate things if people decide one way of doing things is the best way to do it, even assuming they understand everything or know of the pitfalls.
Even if you can control individually what you do for the site you operate (or are involved in) you won't have any control on parts of your site (or business) that you rely on where others use AWS or Cloudflare.
I would be less worried if Cloudflare and AWS weren't involved in many more things than simply running DNS.
AWS - someone touches DynamoDB and it kills the DNS.
Cloudflare - someone touches functionality completely unrelated to DNS hosting and proxying and, naturally, it kills the DNS.
There is this critical infrastructure that just becomes one small part of a wider product offering, worked on by many hands, and this critical infrastructure gets taken down by what is essentially a side-effect.
It's a strong argument to move to providers that just do one thing and do it well.
Re: Cloudflare it is because developers actively pushed "just use Cloudflare" again and again and again.
It has been dead to me since the SSL cache vulnerability thing and the arrogance with which senior people expected others to solve their problems.
But consider how many people still do stupid things like use the default CDN offered by some third party library, or use google fonts directly; people are lazy and don't care.
Because they are great services, are generally pretty easy to get started with, and usually work as expected, which has led to broad adoption.
We take the idea of the internet always being on for granted. Most people don’t understand the stack and assume that when sites go down it’s isolated, and although I agree with you, it’s just as much complacency and lack of oversight and enforcement delays in bureaucracy as it is centralization. But I guess that’s kind of the umbrella to those things… lol
Well the centralisation without rapid recovery and practices that provide substantial resiliency… that would be worrying.
But I dare say the folks at these organisations take these matters incredibly seriously and the centralisation problem is largely one of risk efficiency.
I think there is no excuse, however, to not have multi region on state, and pilot light architectures just in case.
Except businesses love it.
A lot (and I mean a lot) of people in IT like centralization specifically because it’s hard to blame people for doing something that everyone else is doing.
And HN users love it too. I've had people on this site say how great it is that their system routes 30% of traffic on the internet.
I'd be horrified. That's not the internet or computing industries I grew up with, or started working in.
But as long as the SPY keeps hitting > 10% returns each year, everyone's happy.
"No one gets fired for buying IBM!"
1 reply →
This was always the case. There was always a "us-east" in some capacity, under Equinix, etc. Except it used to be the only "zone," which is why the internet is still so brittle despite having multiple zones. People need to build out support for different zones. Old habits die hard, I guess.
> How did we get to a place where either Cloudflare or AWS having an outage means a large part of the web going down?
As always, in the name of "security". When are we going to learn that anything done, either by the government or by a corporation, in the name of security is always bad for the average person?
It's weird to think about, so bear with me. I don't mean this sardonically or misanthropically. But it's "just the internet." It's just the internet. It doesn't REALLY matter in a large enough macro view. It's JUST the internet.
What is worrying is that distributed systems don’t seem to be that distributed in practice.
Designed to survive a first strike from the USSR. Taken down by Cloudflare.
1 reply →
I don't think there is anything wrong with a centralised service being down; you just make a conscious decision about whether you want that and can afford it.
People not being ready for cloudflare/[insert hyperscaler] to be possibly down is the only fault.
It's because single points of traffic concentration are the most surveillable architecture, so FVEY et al economically reward with one hand those companies who would build the architecture they want to surveil with the other hand.
Currently at the public library and I can't use the customer inventory terminals to search for books. They're just a web browser interface to the public facing website, and it's hosted behind CF. Bananas.
Don't forget the CrowdStrike outage: one company had a bug that brought down almost everything. Who would have thought there are so many single points of failure across the entire Internet.
For most services it's safer to host from behind Cloudflare, and Cloudflare is considered more highly available than a single IaaS or PaaS, at least in my headcanon.
The same reason we have centralization across the economy. Economies of scale are how you make a big business successful, and once you are on top it's hard to dislodge you.
Agreed. More worrying is that the standard practice of separating domain and nameserver administration appears to have been lost to one-stop-shop marketing.
And all of these outages happening not long after most of them dismissed a large amount of experienced staff while moving jobs offshore to save in labor costs.
Short-term economic forces, probably. Centralization is often cheaper in the near term. The cost of designing in single-point failure modes gets paid later.
The technical term for it is a man in the middle. It's better to call it what it is; that way you aren't fooled into thinking it's not, because it is.
Because bots are a real thing.
And it’s hard to protect against DDoS without something like Cloudflare.
Look at the posts here.
Even the meager HN “hug of death” will take things down
A lot of products use AWS because "we could build redundancy and multi-region if we need it" and then never build it.
I think some of the issues in the last outage actually affected multiple regions. IIRC internally some critical infrastructure for AWS depends on us-east-1 or at least it failed in a way that didn't allow failover.
How many more of these until governments step in and take over "critical infrastructure"?
Two ways. Gradually, then suddenly.
Consider joining the Internet Society. An entire group of people who care!
A key risk of monopolies is that they lead to monoculture SPoFs.
All decentralized systems tend to centralization over time.
because Cloudflare protection blah blah, until Cloudflare is down itself and then you are back to "who watches the watchmen"
That's easy, the watchmen watchmen watch the watchmen.
because efficiency trumps redundancy in the short term, which is all that matters in a super competitive environment.
Is avoiding single point of failure in anyone’s playbook? ¯\_(ツ)_/¯
We only care about it when it's time to complain about the work of individual people.
Companies can always do as they please and people will rationalize anything.
5 mins. of thought to figure out why these services exist?
Dialogue about mitigations/solutions? Alternative services? High availability strategies?
Nah! It's free to complain.
Me personally, I'd say those companies do a phenomenal job by being a de facto backbone of the modern web. Also Cloudflare, in particular, gives me a lot of things for free.
Hacking software or hardware is so old school.
The target these days is the user.
The make-believe worm.
…sneaks in Azure
It's not really. People are just very bad at putting the things around them into perspective.
Your power is provided by a power utility company. They usually serve an entire state, if not more than one (there are smaller ones too). That's "centralization" in that it's one company, and if they "go down", so do a lot of businesses. But actually it's not "centralized", in that 1) there are actually many different companies across the country/world, and 2) each company "decentralizes" most of its infrastructure to prevent massive outages.
And yes, power utilities have outages. But usually they are limited in scope and short-lived. They're so limited that most people don't notice when they happen, unless it's a giant weather system. Then if it's a (rare) large enough impact, people will say "we need to reform the power grid!". But later when they've calmed down, they realize that would be difficult to do without making things worse, and this event isn't common.
Large internet service providers like AWS, Cloudflare, etc, are basically internet utilities. Yes they are large, like power utilities. Yes they have outages, like power utilities. But the fact that a lot of the country uses them, isn't any worse than a lot of the country using a particular power company. And unlike the power companies, we're not really that dependent on internet service providers. You can't really change your power company; you can change an internet service provider.
Power didn't used to be as reliable as it is. Everything we have is incredibly new and modern. And as time has passed, we have learned how to deal with failures. Safety and reliability has increased throughout critical industries as we have learned to adapt to failures. But that doesn't mean there won't be failures, or that we can avoid them all.
We also have the freedom to architect our technology to work around outages. All the outages you have heard about recently could be worked around, if the people who built on them had tried:
- CDN goes down? Most people don't absolutely need a CDN. Point your DNS at your origins until the CDN comes back. (And obviously, your DNS provider shouldn't be the same as your CDN...)
- The control plane goes down on dynamic cloud APIs? Enable a "limp mode" that persists existing infrastructure to serve your core needs. You should be able to service most (if not all) of your business needs without constantly calling a control plane.
- An AZ or region goes down? Use your disaster recovery plan: deploy infrastructure-as-code into another region or AZ. Destroy it when the az/region comes back.
...and all of that just to avoid a few hours of downtime per year? It's likely cheaper to just take the downtime. But that doesn't stop people from piling on when things go wrong, questioning whether the existence of a utility is a good idea.
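As a concrete illustration of the CDN workaround above, here is a hedged sketch of the failover logic: the health checks are ordinary HTTP requests, while set_dns_a_record is a hypothetical stand-in for whatever API your (separate) DNS provider exposes, and all hostnames and addresses are made up.

    import requests

    ORIGIN_IP = "203.0.113.7"                    # example origin address
    CDN_URL = "https://www.example.com/healthz"  # resolves through the CDN
    ORIGIN_URL = f"https://{ORIGIN_IP}/healthz"  # hits the origin directly

    def healthy(url: str) -> bool:
        try:
            # cert verification disabled only because this sketch reaches the origin by bare IP
            return requests.get(url, timeout=5, verify=False).status_code == 200
        except requests.RequestException:
            return False

    def set_dns_a_record(name: str, ip: str) -> None:
        """Hypothetical: call your DNS provider's API to repoint the A record."""
        raise NotImplementedError

    if not healthy(CDN_URL) and healthy(ORIGIN_URL):
        # The CDN path is failing but the origin itself is fine: fail over,
        # and flip the record back once the CDN recovers.
        set_dns_a_record("www.example.com", ORIGIN_IP)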
CAPITALISM
Are people really this confused?
[dead]
I do appreciate the visual "mea culpa":
Your browser: Working
Host: Working
Cloudflare: Error
Might be the first time I have ever seen that. Though in my case the "Host" is Cloudflare's own Pages service.
Yeah, I was shocked. Disbelief that the host was up, which is what usually happens when Cloudflare's error page shows up
They still blame the customers when you click on "Cloudflare":
> If the problem isn’t resolved in the next few minutes, it’s most likely an issue with the web server you were trying to reach.
In terms of probability, looking at the history, it is correct. It's mostly me messing up the web server.
I noticed that refreshing honesty too, not that the users did (our wifi is down fix it pls urgent)
That is really good to be honest!
I have Cloudflare running in production and it is affecting us right now. But at least I know what is going on and how I can mitigate (e.g. disable Cloudflare as a proxy if it keeps affecting our services at skeeled).
I searched my logs for errors for about an hour before figuring out the problem was not on my server :D
That page has special if/endif HTML comments to handle if your browser is IE 6, IE 7, IE 8...
And at the bottom:
What can I do?
Please try again in a few minutes.
Interestingly, also noticing that websites that use Cloudflare Challenge (aka "I'm not a Robot") are also throwing exceptions with a message as "Please unblock challenges.cloudflare.com to proceed" - even though it's just responding with an HTTP/500.
The state of error handling in general is woeful, they do anything to avoid admitting they're at fault so the negative screenshots don't end up on social media.
Blame the user or just leave them at an infinite spinning circle of death.
I check the network tab and find the backend is actually returning a reasonable error but the frontend just hides it.
Most recent one was a form saying my email was already in use, when the actual backend error returned was that the password was too long.
This takes down AI/search on chat.bing.com (GPT5, unauthenticated).
Funny, since I would have to prove to an AI that I am human in the first place.
And others (ex. pinkbike) displaying "you have been blocked".
Always nice to see Pinkbike mentioned in the tech world :)
I think the site (front-end) thinks you have blocked the domain through DNS or an extension; and thus suggests you unblock it. It is unthinkable that Cloudflare captchas could go down /s.
Not only discriminating robots but actual people /s.
I’d rather mitigate a DDoS attack on my own servers than deal with Cloudflare. Having to prove you’re human is the second-worst thing on my list, right after accepting cookies. Those two things alone have made browsing the web a worse experience than it was in the late 90s or early 2000s.
There's something worse than having to prove (over and over and over again) that you are human: having your IP just completely blocked by Cloudflare's zealous bot-filtering (and I use a plain mass-market ISP in a developed country, not some shady network)
Some of the mass-market ISPs are very shady - AT&T's Room 641A for example :)
How do you plan on mitigating a DDoS on your own servers?
Alright kids, breathe...a DDoS attack isn't the end of the world, it's just the internet throwing a tantrum. If you really don't want to use a fancy protection provider, you can still act like a grown-up: get your datacenter to filter trash at the edge, announce a more specific prefix with BGP so you can shift traffic, drop junk with strict ACLs, and turn on basic rate limiting so bots get bored. You can also tune your kernel so it doesn't faint at SYN storms, and if the firehose gets too big, pop out a more specific BGP prefix from a backup path or secondary router so you can pull production away from the burning IP.
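For the kernel-tuning step in particular, a minimal sketch of the knobs involved (run as root; the values are illustrative starting points, not tuned recommendations):

    from pathlib import Path

    SYSCTLS = {
        "net/ipv4/tcp_syncookies": "1",          # fall back to SYN cookies under flood
        "net/ipv4/tcp_max_syn_backlog": "8192",  # allow more half-open connections
        "net/ipv4/tcp_synack_retries": "2",      # give up on unanswered SYN-ACKs sooner
        "net/core/somaxconn": "4096",            # larger accept queue for listeners
    }

    for key, value in SYSCTLS.items():
        Path("/proc/sys", key).write_text(value)
        print(f"{key.replace('/', '.')} = {value}")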
4 replies →
Worrying about a DDoS on your tiny setup is like a brand-new dev stressing over how they'll handle a billion requests per second...cute, but not exactly a real-world problem for 99.99% of you. It's one of those internet boogeyman myths people love to panic about.
1 reply →
You turn off the screen. They can't hurt you if you don't see them
1 reply →
You wait for it to stop.
1 reply →
he'll politely ask them to stop
As much as this situation sucks, how do you plan to "mitigate a DDoS attack on my own servers"? The reason I use Cloudflare is to use it as a proxy, especially for DDoS attacks if they do occur. Right now, our services are down and we are getting tons of customer support tickets (like everyone else), but it is a lot easier to explain that the whole world is down vs it's just us.
>it is a lot easier to explain that the whole world is down vs it's just us.
Makes sense. The ability to pass the buck like this is 95% of the reason Cloudflare exists in the first place. Not being snarky, either.
> During our attempts to remediate, we have disabled WARP [their VPN service] access in London. Users in London trying to access the Internet via WARP will see a failure to connect. Posted 4 minutes ago. Nov 18, 2025 - 13:04 UTC
Is Cloudflare being attacked...?
This line also gave me that vibe
> We have made changes that have allowed Cloudflare Access [their 'zero-trust network access solution'] and WARP to recover. Error levels for Access and WARP users have returned to pre-incident rates. > We have re-enabled WARP access in London.
> We are continuing to work towards restoring other services. > Posted 12 minutes ago. Nov 18, 2025 - 13:13 UTC
Now I'm really suspicious that they were attacked...
I will bet it's routing misconfig.
1 reply →
[flagged]
Someone running cloudflared accidentally advertising a critical route into their Warp namespace and somehow disrupting routes for internal Cloudflare services doesn't seem too far fetched.
We vibe coded a tool to mass disconnect Cloudflare Warp for incident responders: https://github.com/aberoham/unwarp
To go along with the shenanigans around dealing with MITM traffic inspection https://github.com/aberoham/fuwarp
I used to say, "Don't worry, we host it on Cloudflare. If it's down, then 30% internet is down. It's highly unlikely."
Well...
Yes but you also get to say "We're down? Yes of course we're down, 30% of the internet is down. Nothing we can do"
Like the old saying: Nobody Ever Got Fired for Buying IBM.
You weren't wrong, but There Will Be Days Like This.
Three this year, so far.
Classic. I see issues. Vendor’s status page is all green. Go to HN to find the confirmation. Applies to AWS, GH, everyone.
Edit: beautiful, this decentralised design of the internet.
I get the feeling that all "serious" businesses have manual processes for publicly facing status pages, for political reasons.
I don't like it.
I’ve written before on HN about when my employer hired several ex-FAANG people to manage all things cloud in our company.
Whenever there was an outage they would put up a fight against anyone wanting to update the status page to show the outage. They had so many excuses and reasons not to.
Eventually we figured out that they were planning to use the uptime figures for requesting raises and promos as they did at their FAANG employer, so anything that reduced that uptime number was to be avoided at all costs.
3 replies →
It's because if you automate it, something could/would happen to the little script that defines "uptime," and if that goes down, suddenly you're in violation of your SLA and all of your customers start demanding refunds/credits/etc. when everything is running fine.
Or let's say your load balancer croaks, triggering a "down" status, but it's 3am, so a single server is handling traffic just fine? In short, defining "down" in an automated way is just exposing internal tooling unnecessarily and generates more false positives than negatives.
Lastly, if you are allowed 45 minutes of downtime per year and it takes you an hour to manually update the status page, you just bought yourself an extra hour to figure out how to fix the problem before you have to start issuing refunds/credits.
2 replies →
At some level, the status updates have to be manual. Any automation you try to build on top is inevitably going to break in a crisis situation.
5 replies →
SLA breaches have consequences, no big conspiracy there
3 replies →
FWIW, cloudflare's status page is showing red currently.
I usually get notifications from the sales/CS team way before the status page/incident list has any blip. This time was not an exception
It's as if they wanted an internet kill switch. /S
Quote from The Guardian's story:
>A spokesperson for Cloudflare said: “We saw a spike in unusual traffic to one of Cloudflare’s services beginning at 11.20am. That caused some traffic passing through Cloudflare’s network to experience errors. While most traffic for most services continued to flow as normal, there were elevated errors across multiple Cloudflare services.
>“We do not yet know the cause of the spike in unusual traffic. We are all hands on deck to make sure all traffic is served without errors. After that, we will turn our attention to investigating the cause of the unusual spike in traffic.”
https://www.theguardian.com/technology/2025/nov/18/cloudflar...
Sounds like it may have been a cyber attack...
"Unusual spike of traffic" can just be errant misconfiguration that causes traffic spikes just from TCP retries or the like. Jumping to "cyber attack" is eating up Hollywood drama.
In most cases, it's just cloud services eating shit from a bug.
1 reply →
That's not what I'm hearing from insiders
I went to check how many services are being impacted on down detector, but it was down.
I know this is bad, and some people's livelihood and lives rely on critical infrastructure, but when these things happen, I sometimes think GOOD!, let's all just take a breather for a minute yeh? Go outside.
Tried checking Cloudflare’s status on Downdetector, but Downdetector was also behind Cloudflare. Internet checkmate.
It’s not just websites :-/
Things like Apple Private Relay (which way too many people seem to have enabled) are tunnelled via Cloudflare, maybe using WARP?
One of the things that I didn't like about Cloudflare's MITM-as-a-service is their requirement that, if you want SSL/CDN, you must use their DNS. Overconcentration of infra within one single point of disruption, with no easy outs when the stack tips over. Sadly I don't see any changes or rethink towards being more decentralised even after this outage.
to be clear, that's just a limitation on their free service. If you pay, you can keep your own DNS
Their paid "professional" plan also has this limitation, only "enterprise" and up does not.
2 replies →
Yeah, they keep reinforcing bad vendor lock-in practices. I'd guess the number of free users surpasses the paying ones, and situations like these leave them all unable to recover.
Interesting (unnerving?) to see a number of domain registrars that offer their own DNS services utilize at least some kind of Cloudflare service for their own web fronts. I did a check on 6 registrar sites I currently interact with: half were down (Namecheap/Spaceship, Name, Dynadot) and half were up (Porkbun, Gandi, GoDaddy).
I just considered moving from Namecheap to Porkbun as Namecheap is down, but Porkbun uses Cloudflare for their CAPTCHA, meaning I'm unable to sign up (and I assume log in as well), so that's no good either.
Porkbun also uses Cloudflare for their NS servers.
And no lesson about single point of failure and centralization was learned that day.
[dead]
Where is the single point of failure? You can point to different name servers and swiftly remove Cloudflare from your setup.
Only true if your audience doesn't require Edge distribution, also if your Origin can handle the increased load and security issues, also if you don't use any advanced features (routing, edge compute...).
4 replies →
If your site is only hosted on one server and it catches fire, you can swiftly reinstall on a new server and change the IP your domain is pointing to, too... Still a single point of failure.
1 reply →
Bold of you to assume the service you use to manage your DNS was not also relying on Cloudflare just like you
Not if you’re using Workers/Pages!
But you didn't, so Cloudflare ended up being a single point of failure for half the internet.
1 reply →
This incident has been resolved. Posted 4 minutes ago. Nov 18, 2025 - 19:28 UTC
Better link for chroniclers, since the incident is now buried pretty far down on the status page: https://www.cloudflarestatus.com/incidents/8gmgl950y3h7
Thanks! We've switched to that from https://www.cloudflarestatus.com/?t=1 at the top.
Your origin servers are protected now as no one can access them. Thanks for choosing CloudFlare's MITM "protection".
Do you remember when the Internet was redundant and resilient?
It seems 20% of the Internet is down every two weeks now.
70% of the internet is down
Can't even change my nameservers away from Cloudflare as Namecheap use Cloudflare!!
The nameservers themselves seem to be working fine if anyone is wondering.
I run my applications on OVH behind BunnyCDN and all is well.
Just checked INWX from here in Germany. I was able to log in and get to my DNS records. Just if you should be looking for an alternative after all this.
Yes I will be looking, thanks for the rec!
Oh seriously! Thats one I didn’t realize.
Namesilo as well.
I was shouting at our network guy/colleague: how come challenges.cloudflare.com got blocked?! Damn, I must apologise to him.
Probably better not to shout in the first place.
It was friendly fire, nothing serious. haha
9 replies →
Even if he blocked it by accident, that is not a reason to shout.
Shouting will not prevent errors, and you are only creating a hostile work environment where not acting is better than the risk of making a mistake and triggering an aggressive response on your part.
It wasn't an aggressive exchange, but I will definitely consider your comment.
There is nothing else to do since CF is down... so.
There is nothing wrong with shouting during a perceived outage. Shouting is just raising your voice to give a notion of urgency. Yelling is different.
How often have you heard "shout at me", or something like that?
OP, continue to shout when it's needed, just don't yell at people you work with ;)
2 replies →
Don't worry, beer's gonna fix everything
The danger of Internet centralization in Cloudflare
That's why I run my server on 7100 chips made for me by Sam Zeloof in his garage on a software stack hand coded by me, on copper I ran personally to everyone's house.
You are joking but working on making decentralization more viable would indeed be more healthy than throwing hands up and accepting Cloudflare as the only option.
There was an article on HN a few days back about how companies like this influence the overall freedom of the web (I've lost the source) and push their own way of doing things. I see similar influence from others like Vercel, particularly on the enterprise side. And just a few days back, we saw the same with AWS.
Wanted to check if it was DNS again but https://isitdns.com/ is also down…
We definitely need a version without CF.
Then the page will just be down for load reasons.
This was genuinely funny, thanks for that.
> Cloudflare Global Network experiencing issues
> Investigating - Cloudflare is aware of, and investigating an issue which potentially impacts multiple customers. Further detail will be provided as more information becomes available.
Things are back up (a second time) for me.
Cloudflare have now updated their status page to reflect the problems. It doesn't sound like they are confident the problem is fully fixed yet.
Edit: and down again a third time!
it's back again!
What would the Internet's architecture have to look like for DDOS'ing to be a thing of the past, and therefore Cloudflare to not be needed?
I know there are solutions like IPFS out there for doing distributed/decentralised static content distribution, but that seems like only part of the problem. There are obviously more types of operation that occur via the network -- e.g. transactions with single remote pieces of equipment etc, which by their nature cannot be decentralised.
Anyone know of research out there into changing the way that packet-routing/switching works so that 'DDOS' just isn't a thing? Of course I appreciate there are a lot of things to get right in that!
It's impossible to stop DDoS attacks because of the first "D".
If a botnet gets access through 500k IP addresses belonging to home users around the world, there's no way you could have prepared yourself ahead of time.
The only real solution is to drastically increase regulation around security updates for consumer hardware.
Maybe that's the case, but it seems like this conclusion is based on the current architecture of the internet. Maybe there are ways of changing it that mean these issues are not a thing!
2 replies →
Do the IP addresses botnet members get logged? Could those IP addresses be automatically blocked by DNS until they fix their machine?
6 replies →
What would that look like? A network with built-in rate & connection limiting?
The closest thing I can think of is the Gemini protocol browser. It uses TOFU for authentication, which requires a human to initially validate every interaction.
Built it into the protocol that you must provide bandwidth in order to have your requests served. A bit like forcing people to seed torrents.
Works for static content and databases, but I don't think it works for applications where there is by necessity only one destination that can't be replicated (e.g. a door lock).
Something like a mega-transnational parent ISP authority, giving tech giants LaLiga-style power.
https://www.cloudflarestatus.com/incidents/8gmgl950y3h7
It's knocked out Turnstile too, which means I can't even log in to my Cloudflare dash to bypass my site's proxying via Cloudflare.
Oh you aren't missing much, the dashboard doesn't load anyway.
Even if you could, the DNS entries aren't loading. And then the page 404's.
I got several emails from some uptime monitors I set up, due to failing checks on my website, and funnily enough I cannot log in to any of them.
BetterStack, InStatus and HetrixTools seemingly all use Cloudflare on their dashboards, which means I can't log in but I keep getting "your website/API is down" emails.
Update: I also can't log in to UptimeRobot and Pulsetic. Now I am getting seriously concerned about the sheer degree of centralization we have for CDNs/login turnstiles on Cloudflare.
More vibe code gets into production. AWS, Azure and Cloudflare all have major issues.
Coincidence? I think not.
Even Cloudflare Status is now down, oh boy :) https://postimg.cc/LJVKYmks
Even your postimg.cc link is down for me.. (at least their CSS is)
https://ibb.co/QF6X0pX9
amazing
catbox.moe is up
https://files.catbox.moe/9r3zgr.png
Postimg's CDN is down
ERROR [12:00:21 UTC]: CF_EDGE_ROUTING_FAILURE. Reason: Origin-Shield connectivity loss detected within multi-region fabric. BGP path withdrawal initiated for critical LCP clusters (LCP-LON, LCP-FRA). Status code 521/522 flood reported globally. Geo-location failover services degraded. DNS resolution timeout on 1.1.1.1/1.0.0.1. Traffic flow re-routing pending verification of internal control plane integrity.
Did you ask an LLM to try to guess the error message?
In the beginning I thought my IP fell on the wrong side of Cloudflare and thought I was being blocked from ~80% of the internet. I was starting to panic
What have you been looking at citizen?
How come HN is never down with all these outages?
HN is just one active and one standby server at M5 Computer Security running BSD.
There are things out there which are running from a bare metal host, without relying on someone else's computer (aka the cloud). HN is one of them.
They stopped using Cloudflare some time ago
https://news.ycombinator.com/item?id=18188832
Because HN doesn't use Cloudflare.
Also doesn't use AWS or Azure because it didn't go down with them either.
2 replies →
Does HN self-host too?
6 replies →
How? It's literally impossible to run a major website these days without Cloudflare.
/s
This is what you get for being lazy and choosing to make the internet more centralized.
2 replies →
HN is running on the server the rest of the cloud rents time from.
The outages are the Roomba.
FreeBSD on bare metal hooked up to a nice network.
Everyone laughs when AWS collapses, everyone is silent when Cloudflare collapses. Why? Because the place to laugh has collapsed.
One is every seven years... the other one is a ...monthly event?: https://hn.algolia.com/?https://hn.algolia.com/?dateRange=al...
Most of those aren’t outages, and both providers have big blips.
Globally meaningful outages of either are quite rare.
7 replies →
Because those who would be doing the mocking couldn't speak: X also crashed.
> Everyone laughs when AWS collapses, everyone is silent when Cloudflare collapses
Everyone laughs when Azure collapses too
Pieter Levels' wisdom about why not to host on AWS isn't looking so wise right now.
There are places other than AWS to host.
Down detector broke.... :-D
Yeah, how ironic. The site that is designed to tell you if something else is down, is currently down.
2 replies →
Cloudflare down because of a DDOS is extremely funny.
There's no evidence to suggest it was a result of a DDoS attack
3 replies →
Everyone is silent when Cloudflare collapses. Same goes for Azure, but that is because no one uses it.
When Azure goes down: Oh well
When Cloudflare goes down: Oh no
This would have been true in the past, but now most people are not on Twitter.
Sadly, I can report that this has brought down 2 of the major Mastodon nodes in the United Kingdom.
Happily, the small ones that I also use are still going without anyone apparently even noticing. At least, the subject has yet to reach their local timelines at the time that I write this.
2 of the other major U.K. nodes are still up, too.
"Most people" were never on Twitter to begin with. However its number of monthly active users have only grown since 2020.
1 reply →
Who is silent?
Except on HN
HAHA!
Our servers are still down, though
HAHAHAHA
2 replies →
[dead]
OMG! today of all days!
Black Tuesday
It's so crazy and scary that Cloudflare is the single point of failure for the internet.
But this isn't a decision made by CF. It's what the devs decided.
Trying to figure out if this observation was intended to frame it so that it's less|same|more scary. The effect is more, but it sounds like the intention was less.
The common pasture.
This NYTimes article makes it sound like the problem is fixed, but I'm not seeing any improvement yet.
https://www.nytimes.com/2025/11/18/business/cloudflare-down-...
Latest is:
Update - The team is continuing to focus on restoring service post-fix. We are mitigating several issues that remain post-deployment. Nov 18, 2025 - 15:40 UTC
Funny that I could not load Twitter to see if Cloudflare was down.
I rushed to Hacker News, but it was too early. Clicking on “new” did the job to find this post before making it to the Homepage:)
The web is still alive!
It was on Mastodon. That one hardly ever goes down.
Who wants to join me at the Winchester for a pint, and wait for this all to blow over?
Got some red on you...
Seems like ChatGPT and Claude are also affected. (CLI Codex still seems to work).
RIP to the engineers fixing this without any AI help.
They better not be using AI to fix this... especially if AI is what caused it! (looking at you, AWS)
For me right now, Claude.ai is down, but Claude Code (terminal, extension) seems to be up and happy. Suggests that API is probably up.
At some point we really need to think about whether this is the web we want: one or two major actors go down and everything goes with them.
Not downplaying the immense work of infra/engineering at this scale, but my neighborhood's local grocery market shouldn't be down.
Decentralisation is at some point directly opposed to operational efficiency, when the sun is shining.
A shark is an extremely energy efficient creature, but it is relatively stupid.
1 reply →
And centralization is ineffective long term
It's hard not to use Cloudflare at least for me: good products, "free" for small projects, and if Cloudflare is down no one will blame you since the internet is down.
> if Cloudflare is down no one will blame you since the internet is down.
That is true. It is also the problem. It means the biggest providers do not even need to bother to be reliable, because everyone will use them anyway.
3 replies →
"Accountability Sinks"
https://aworkinglibrary.com/writing/accountability-sinks
> if Cloudflare is down no one will blame you since the internet is down.
But this is not really the case. When Azure/AWS were down, same as now with Cloudflare, a significant amount of the web was down but most of it was not. It just makes it more obvious which provider you use.
Think about this rationally. If Cloudflare doesn't fix it within reasonable time, you can just point to different name servers and have your problem fixed in minutes.
So why be on Cloudflare to start with? Well, if you have a more reliable way then there's no reason. If you have a less reliable way, then you're on average better off with Cloudflare.
Well, I can't change my NS since it's on Cloudflare too. But besides that, my point was not about this outage in particular, but more about the default approach of some websites that don't need all this tech (and yes, I really was out of groceries).
4 replies →
There’s certainly a business case for “which nines” after the talk of n nines. You ideally want to be available when your competitor, for instance, is not.
Why does everyone need to be behind Cloudflare? I don't think DDoSing sites on a whim is so rampant that everyone needs the virtual umbrella.
It's the web scrapers. I run a tiny little mom-and-pop website, and the bots were consistently using up all of my servers' resources. Cloudflare more or less instantly resolved it.
3 replies →
I’ve been DDoS’d countless times running a small scale, uncontroversial SaaS. Without them I would’ve had countless downtime periods with really no other way to mitigate.
There's plenty of DDoS if you're dealing with people petty enough.
The VPS provider I use will nuke your instance if you run a game server. Not due to resource usage, but because it attracts DDoS like nothing else. Ban a teen for being an asshole and expect your service to be down for a week. And there isn't really a Cloudflare for independent game servers. There's Steam Networking, but it requires the developer to support it, and of course Steam.
Valve's GDC talk about DDoS mitigation for games: https://youtu.be/2CQ1sxPppV4
1 reply →
It actually is.
I run a small video game forum with posts going back to 2008. We got absolutely smashed by bots scraping for training data for LLMs.
So I put it behind Cloudflare and now it's down. Ho hum.
8 replies →
I was arrested by Interpol in 2018 because of warrants issued by the NCA, DOJ, FBI, J-CAT, and several other agencies, all due to my involvement in running a DDoS-for-hire website. Honestly, anyone can bypass Cloudflare, and anyone that wants to take your website down will take it down. Luckily for all of us, most of the DDoS-for-hire websites are down nowadays, but there are still many botnets out there that will get past basically any protection, and you can get access to them for basically $5.
9 replies →
Good chance the reason DDOSing isn't so big anymore is because everyone is on Cloudflare.
1 reply →
There are plenty of alternatives to protect against DDoSing, people like convenience though. “Nobody gets fired for choosing Microsoft/Cloudflare”. We have a culture problem
It's not super common, but common enough that I don't want to deal with it.
The other part is just how convenient it is with CF. Easy to configure, plenty of power and cheap compared to the other big ones. If they made their dashboard and permission-system better (no easy way to tell what a token can do last I checked), I'd be even more of a fan.
If Germany's Telekom was forced to peer on DE-CIX, I'd always use CF. Since they aren't and CF doesn't pay for peering, it's a hard choice for Germany but an easy one everywhere else.
DDOSing is absolutely so rampant that you need to be behind something.
11 replies →
Cloudflare DDOS protection is super essential (especially for smaller businesses)
6 replies →
Honestly, it kinda is. AI bots scrape everything now, social media means you can go viral suddenly, or you make a post that angers someone and they launch an attack just because. I default to Cloudflare because, like an umbrella, I might just be carrying it around most of the time, but in the case of a sudden downpour it's better than getting wet.
Setting up a replica and then pointing your API requests at it when the Cloudflare request fails is trivial. That way, if you have an SPA, users won't notice as long as your site/app is already open.
The issue is DNS, since DNS propagation takes time. Does anyone have any ideas here?
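For what it's worth, here's a rough sketch of the client/API side of that idea in Python, with hypothetical hostnames (api.example.com behind Cloudflare, api-fallback.example.com pointing straight at the replica); the real thing would live in whatever language your SPA or backend uses:

    import urllib.error
    import urllib.request

    # Hypothetical endpoints: the first goes through Cloudflare, the second is a
    # separate hostname that resolves straight to the replica.
    ENDPOINTS = [
        "https://api.example.com",
        "https://api-fallback.example.com",
    ]

    def get(path, timeout=3):
        """Try each endpoint in order; return the first successful response body."""
        last_error = None
        for base in ENDPOINTS:
            try:
                with urllib.request.urlopen(base + path, timeout=timeout) as resp:
                    return resp.read()
            except urllib.error.HTTPError as exc:
                if exc.code >= 500:   # e.g. a Cloudflare 5xx: fall through to the replica
                    last_error = exc
                    continue
                raise                 # a 4xx is a real client error; don't mask it
            except (urllib.error.URLError, OSError) as exc:
                last_error = exc      # timeout / connection failure: try the next host
        raise RuntimeError(f"all endpoints failed: {last_error}")

As for the DNS half: the usual (imperfect) answer is to keep TTLs low before you need them, since a record changed during an outage still has to wait out whatever TTL resolvers cached earlier.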
> Setting up a replica and then pointing your api requests at it when cloudflare request fails is trivial.
Only if you're doing very basic proxy stuff. If you stack multiple features and maybe even start using workers, there may be no 1:1 alternatives to switch to. And definitely not trivially.
1 reply →
Two domains for your api perhaps, a full blown SPA could try one and then the other.
1 reply →
Owning your IP space and using Anycast.
1 reply →
> At some point we really need to think if this is the web we want,
You think we have a say in this?
The HN crowd in particular absolutely has a say in this, given the number of engineering leads, managers, and even just regular programmers/admins/etc. that frequent here - all of whom contribute to making these decisions.
You have the power to not host your own infrastructure on aws and behind cloudflare, or in the case of an employer you have the power to fight against the voices arguing for the unsustainable status quo.
3 replies →
It's not the web we want, but it's the web corporations want. And everybody else doesn't give a damn.
Believe me it’s what people want. The alternative is far worse.
We? I am not using it. I never used it and I will not use it. People should learn how to work with a firewall, set up a simple ModSecurity WAF, and stop using this bullshit. Almost everything goes through Cloudflare, and Cloudflare also does TLS fronting for websites, so basically Cloudflare is a MITM spying proxy, but no one seems to care. :/
BLOCKCHAINS! I mean, some sort of P2P hosting and/or node discovery would be nice.
Cloudflare seems to have degraded performance. Half the requests for my site throw Cloudflare 5xx errors, the other half work fine.
However, https://www.cloudflarestatus.com/ does not really mention anything relevant. What's the point of having a status page if it lies?
Update: Ah, I just checked the status again and now I get a big red warning (however, the problem existed for about 15 minutes before 11:48 UTC):
> Investigating - Cloudflare is aware of, and investigating an issue which potentially impacts multiple customers. Further detail will be provided as more information becomes available. Nov 18, 2025 - 11:48 UTC
> What's the point of having a status page if it lies?
Status pages are basically marketing crap right now. The same thing happened with Azure where it took at least 45 minutes to show any change. They can't be trusted.
>However, https://www.cloudflarestatus.com/ does not really mention anything relevant. What's the point of having a status page if it lies?
What is the lie?
> Cloudflare Global Network experiencing issues
Cloudflare has a specific service named "Network" and it's having issues.
Please read my comment again including the update:
For 15 minutes Cloudflare wasn't working and the status page did not mention anything. Yes, right now the status page mentions the serious network problem, but for some time our pages were not working and we didn't know what was happening.
So for ~15 minutes the status page lied. The whole point of a status page is to not lie, i.e. to be updated automatically when there are problems, and not by a person who needs to get clearance on what and how to write.
I didn’t see anyone comment this directly, but something these recent outages made me wonder, having spent a good chunk of my career in 24/7 tech support, is that I can’t even fathom the amount of people who have been:
- restarting their routers and computers instead of taking their morning shower, getting their morning coffee, taking their medication on time because they’re freaking out, etc. - calling ISPs in a furious mood not knowing it’s a service in the stack and not the provider’s fault (maybe) - being late for work in general - getting into arguments with friends and family and coworkers about politics and economics - being interrupted making their jerk chicken
This sentence is slowly getting boring after all those recent outages: My web app hosted on Hetzner and BunnyCDN still works.
That shows the distributed nature of the internet is still there. It is a problem, though, if everything is funneled through one provider.
I've been migrating all my personal stuff to Cloudflare. They have good products and good pricing.
At the same time I'm worried about how the internet is becoming even more centralized, which goes against how it was originally designed.
Same here. A lot of my sites are now down.
[flagged]
No, just competing priorities.
Cloud in general was a mistake. We took a system explicitly designed for decentralization and resilience and centralized it and created a few neat points of failure to take the whole damn thing down.
Cloudflare provides some nice services that have nothing to do with cloud or not. You can self-host private tunnels, application firewalls, traffic filtering, etc, or you can focus on building your application and managing your servers.
I am a self-hosting enthusiast. So I use Hetzner, Kamal and other tools for self-managing our servers, but we still have Cloudflare in front of them because we didn't want to handle the parts I mentioned (yet; we might sometime).
Calling it a mistake is a very narrow look at it. Just because it goes down every now and then, it isn't a mistake. Going for cloud or not has its trade-offs, and I agree that paying 200 dollars a month for a 1GB Heroku Redis instance is complete madness when you can get a 4GB VPS on Hetzner for 3.80 a month. Then again, some people are willing to make that trade-off for not having to manage the servers.
Cloud servers have taught me so much about working with servers because they are so easy and cheap to spin up, experiment with and then get rid of again. If I had had to buy racks and host them each time I wanted to try something, I would've never done it.
Sure, it's a great fair-weather technology, makes some things cheap and easy.
But in the face of adversity, it's a huge liability. Imagine Chinese Hackers taking down AWS, Cloudflare, Azure and GCP simultaneously in some future conflict. Imagine what that would do to the West.
I don't believe in Fukuyamas End of History. History is still happening, and the choices we make will determine how it plays out.
Thanks, I was too lazy to write this, and noticed this comment multiple times now. It's good to be sceptical at times, but in this case it simply misses the mark.
Threat actors (DDoS) and AI scraping already threw a wrench in decentralization. It's become quite difficult to host anything even marginally popular without robust infrastructure that can eat a lot of traffic
It took me a while to understand it, but the beauty of it is that when it fails, lot of things fail.
Almost no one gets mad if your site and half the internet were down.
Sure, but that is also a giant weakness. Say in a future conflict with Russia or China, or hell, even North Korea.
They'd only have to take down a few services to completely cripple the West - the exact case ARPANET was designed to prevent.
1 reply →
This is crazy. The internet has so much direct and transitive dependency on Cloudflare today. Pretty much the #1 dev slacking excuse today is no longer "my code is compiling" but "Cloudflare is down".
ChatGPT is Down. What will LinkedIn posters ever do?
They will embark on a journey of synergy, and be agile.
Incoming Posts on LinkedIn:
what ChatGPT and Claude being down taught me about b2b SaaS
I can now imagine a scenario where everyone has become so dependent on the AI tool that it going down could turn into an unanticipated black start event for the entire internet.
https://tenor.com/view/obiwan-kenobi-disturbance-in-the-forc...
I sense a great disturbance in the force... As if millions of cringefluencers suddenly cried out in terror cause they had to come up with an original thought.
And Anthropic Claude
For me right now, Claude.ai is down, but Claude Code (terminal, extension) seems to be up and happy. Suggests that API is probably up.
It's insane to me that big internet uptime monitoring tools like Pingdom and Downdetector both seem to rely on Cloudflare, as both of those are currently unavailable as well.
down detector works in serbia
[dead]
We've traded DDoS for centralized DoS.
centralized incompetency causes distributed denial of service
It's been 15 minutes of it going up and down, still nothing on their status page...
They've just added it less than a minute ago. I expected a little more responsiveness from Cloudflare...
Everything else aside, 20 minutes to get their status page updated seems pretty damn fast.
Me too. What good is a status page that's not automated?
3 replies →
Well, we've seen it now, they'll have to update it eventually!
The irony is that if you follow the relevant link [1] in the error page, you get this:
> If the problem isn’t resolved in the next few minutes, it’s most likely an issue with the web server you were trying to reach.
[1] https://www.cloudflare.com/5xx-error-landing/?utm_source=err...
Related to Azure DDoS?
https://news.ycombinator.com/item?id=45955900
What's your chain of thought here? A company that has nothing to do with Azure is down because Azure got DDoSed two weeks ago?
Maybe that any actor sophisticated enough to take down Azure might also target Cloudflare?
1 reply →
Maybe related to their scheduled maintenance? https://www.cloudflarestatus.com/
I thought that as we're seeing issues with LON, but their Manchester POP is also down and that didn't have any maintenance this morning.
Ironically I can't even read the link in the article because cloudflare is down.
Linked Microsoft blog article mentions that DDoS was in October.
Velib, the main bike rental in Paris, has its app not working, but the bikes can be taken with NFC. However, my station, which is always full at this time, is now empty, with only 2 bad bikes. It may be related. Yet, push notifications are working.
I'm going to take the metro now, thinking about how long we have until the entire transit network goes down because of a similar incident.
Feels like it's been a rough year for huge infra outages man :(.
Is it DNS? I went to check the isitdns.com but got a cloudflare error
Later today or tomorrow there's going to be a post on HN pointing to Cloudflare's RCA and multitudes here are going to praise CF for their transparency. Let's not forget that CF sucks and took half the internet down for four hours. Transparency or no, this should not be happening.
A lot of things shouldn't be happening. Fact is that no one forced half the internet to make CF their point of failure. The internet should ask itself if that was the right call.
Speaking of 5 9s, how would you achieve 5 9s for a basic CRUD app that doesn't need to scale but still needs to be globally accessible? No auth, microservices, email or 3rd-party services. Just a classic backend connected to a db (any db tech, hosted wherever) that serves up some html.
It depends on the infrastructure you're running on. There was a post yesterday going fairly into depth how you do such calculations https://authress.io/knowledge-base/articles/2025/11/01/how-w...
You probably cannot achieve this with a single node, so you'll at least need to replicate it a few times to get past the normal 2-3 9s you get from a single node. But then you've got load balancers and DNS, which can also be a single point of failure, as seen with Cloudflare.
Depending on the database type and choice, it varies. If you've got a single node of Postgres, you can likely never achieve more than 2-3 9s (AWS guarantees 3 9s for a multi-AZ RDS). But if you run multi-master CockroachDB etc., you can maybe achieve 5 9s just on the database layer, or by using Spanner. You'll basically need 5 9s everywhere, which means quite a bit of redundancy in all the layers going to and from your app and data, the database and DNS being the most difficult.
Reliable DNS provider with 5 9s of uptime guarantees -> multi-master load balancers, each with 3 9s -> each load balancer serving 3 or more apps, each with 3 9s of availability, going to a database(s) with 5 9s.
This page from Google shows their uptime guarantees for Bigtable: 3 9s for a single region with one cluster, 4 9s for multi-cluster, and 5 9s for multi-region.
https://docs.cloud.google.com/architecture/infra-reliability...
In general it doesn't matter really what you're running, it is all about redundancy. Whether that is instances, cloud vendor, region, zone etc.
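To make the arithmetic concrete, a back-of-the-envelope sketch (the figures are just the illustrative numbers from above, not anyone's actual SLA): independent components in series multiply, and n redundant copies of an A-available component give 1 - (1 - A)^n.

    # Availability math: serial components multiply, redundant copies combine.
    def serial(*avail):
        total = 1.0
        for a in avail:
            total *= a
        return total

    def parallel(a, n):
        return 1.0 - (1.0 - a) ** n

    dns = 0.99999              # assumed five-9s DNS provider
    lb  = parallel(0.999, 2)   # two load balancers, three 9s each
    app = parallel(0.999, 3)   # three app instances, three 9s each
    db  = 0.99999              # assumed five-9s multi-region database

    total = serial(dns, lb, app, db)
    print(f"availability: {total:.6f}")                               # ~0.999979
    print(f"expected downtime: {(1 - total) * 525600:.0f} min/year")  # ~11 minutes

Even with generous redundancy in the middle, the two five-9s serial dependencies (DNS and the database) already account for most of the ~11 minutes, which is why the single-node options top out around three 9s.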
Part of the up-time solution is keeping as much of your app and infrastructure within your control, rather than being at the behest of mega-providers as we've witnessed in the past month: Cloudflare, and AWS.
Probably:
- a couple of tower servers, running Linux or FreeBSD, backed up by a UPS and an auto-run generator with 24 hours worth of diesel (depending on where you are, and the local areas propensity for natural disasters - maybe 72 hours),
- Caddy for a reverse proxy, Apache for the web server, PostgreSQL for the database;
- behind a router with sensible security settings, that also can load-balance between the two servers (for availability rather than scaling);
- on static WAN IPs,
- with dual redundant (different ISPs/network provider) WAN connections,
- a regular and strictly followed patch and hardware maintenance cycle,
- located in an area resistant to wildfire, civil unrest, and riverine or coastal flooding.
I'd say that'd get you close to five 9s (no more than ~5 minutes downtime per year), though I'd pretty much guarantee five 9s (maybe even six 9s - no more than 32 seconds downtime per year) if the two machines were physically separated from each other by a few hundred kilometres, each with their own supporting infrastructure above, sans the load balancing (see below), through two separate network routes.
Load balancing would become human-driven in this 'physically separate' example (cheaper, less complex): if your-site-1.com fails, simply re-point your browser to your-site-2.com which routes to the other redundant server on a different network.
The hard part now will be picking network providers that don't use the same pipes/cables, i.e. they both use Cloudflare, or AWS...
Keep the WAN IPs written down in case DNS fails.
PostgreSQL can do master-master replication, but it's a pain to set up I understand.
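For the human-driven failover part, a minimal off-site probe (reusing the hypothetical your-site-1.com / your-site-2.com names, and assuming a /health path exists) that you'd run from a cheap VPS or a Pi somewhere else entirely:

    import sys
    import urllib.request

    # Hypothetical URLs; run this from outside both sites so you're not blind
    # when one of them (or its network) goes away.
    SITES = ["https://your-site-1.com/health", "https://your-site-2.com/health"]

    def is_up(url, timeout=5):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status == 200
        except OSError:          # covers timeouts, DNS failures, HTTP errors
            return False

    results = {url: is_up(url) for url in SITES}
    for url, ok in results.items():
        print(("UP   " if ok else "DOWN ") + url)

    # Non-zero exit so cron/systemd can fire an alert; the actual switchover
    # (re-pointing users at the other hostname) stays manual, as described above.
    sys.exit(0 if all(results.values()) else 1)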
What if you could create a super virtual server of sorts? Imagine a new cloud provider like Vercel but called something else. When you create a server on their service, they create 3 servers behind the scenes: one on AWS, one on GCP and one on Azure. They are 3 separate servers, but to the end user they appear as a single server. The end user gets to control how many cloud providers are involved. When AWS goes down, no worries, it switches over to the GCP one.
Stock VPS somewhere like OVH or Hetzner, with a replica in a different provider?
Doesn't Hetzner carry the risk of getting kicked off on a whim? The only time I hear about them is when someone gets kicked out.
Seriously, bookmarking this site and checking it first next time instead of disabling all my ad blockers.
I've been considering Cloudflare for caching, DDoS protection and WAF, but I don't like furthering the centralization of the Web. And my host (Vultr) has had fantastic uptime over the 10 years I've been on them.
How are others doing this? How is Hacker News hosted/protected?
I got an email saying that my OpenAI auto-renewal failed and my credits have run out. I go to OpenAI to reauthorize the card, and I can't log in because OpenAI uses Cloudflare for "verifying you are a human", which goes into an infinite loop. Great.
oh no, the vibes won't code themselves /s
Or even worse, people with apps in production with credits running out
Phew, my latest 3h30 workshop about Obsidian was saved. I recorded it this morning, not knowing about the Cloudflare issue (probably started while I was busy). I'm using Circle.so and they're down (my community site is now inaccessible). Luckily, they probably use AWS S3 or similar to host their files, so that part is still up and running.
Meanwhile all my sites are down. I'll just wait this one out, it's not the end of the world for me.
My GitHub Actions are also down for one of my projects because some third-party deps go through Cloudflare (Vulkan SDK). Just yesterday I was thinking to myself: "I don't like this dependency on that URL..." Now I like it even less.
> A fix has been implemented and we believe the incident is now resolved. We are continuing to monitor for errors to ensure all services are back to normal. Posted 3 minutes ago. Nov 18, 2025 - 14:42 UTC
Seems like they think they've fixed it fully this time!
Close! They just updated their status and it's back to working on a fix.
Update - Some customers may be still experiencing issues logging into or using the Cloudflare dashboard. We are working on a fix to resolve this, and continuing to monitor for any further issues. Nov 18, 2025 - 14:57 UTC
I miss the old internet where 1 company having an outage didn't take down most of it.
I'm thinking about all those quips from a few decades back, along the lines of: "The Internet is resilient, it's distributed and it routes around damage" etc.
In many ways it's still true, but it doesn't feel like a given anymore.
Recently, my VPN server nodes (VPSes from different providers) randomly couldn't connect to Cloudflare CDN IPs, while the host Linux network didn't have the issue; VPP shares the same address with Linux and uses tc stateless NAT to do the trick.
I finally worked around it by changing the TCP options sent by the VPP TCP stack.
But the whole thing made me worry that something had been deployed on their side which caused the issue.
I don't think that's related to this outage; it just reminded me of the above. There seem to be frequent new articles about Cloudflare networking, and maybe new methods or new deployments correlate with a higher probability of issues.
For anyone reading this who desperately needs their website up, you can try this: If you manage to get to your Cloudflare DNS settings and disable the "Proxy status (Proxied)" feature (the orange cloud), it should start working again.
Be aware that this change has a few immediate implications:
- SSL/TLS: You will likely lose your Cloudflare-provided SSL certificate. Your site will only work if your origin server has its own valid certificate.
- Security & Performance: You will lose the performance benefits (caching, minification, global edge network) and security protections (DDoS mitigation, WAF) that Cloudflare provides.
This will also reveal your backend internal IP addresses. Anyone can find permanent logs of public IP addresses used by even obscure domain names, so potential adversaries don't necessarily have to be paying attention at the exact right time to find it.
Unfortunately, this will also expose your IP address, which may leave you vulnerable even when the WAF and DDoS protections come back up (unless you take the time to only listen for Cloudflare IP address ranges, which could still take a beefy server if you're having to filter large amounts of traffic).
Also, the API was working fine while the dash was down.
If you don't have API keys, make sure to grab them for the next one.
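For anyone in the same spot next time, this is roughly the call they mean: a sketch against Cloudflare's v4 API as I understand it (double-check the current docs; the token, zone ID and record ID are placeholders you'd want saved somewhere while things are healthy):

    import json
    import urllib.request

    API = "https://api.cloudflare.com/client/v4"
    TOKEN = "REPLACE_ME"      # API token scoped to edit DNS for the zone
    ZONE_ID = "REPLACE_ME"    # placeholders: look these up ahead of time
    RECORD_ID = "REPLACE_ME"

    def set_proxied(proxied: bool):
        """Flip the orange cloud on one DNS record via the API (sketch, untested here)."""
        req = urllib.request.Request(
            f"{API}/zones/{ZONE_ID}/dns_records/{RECORD_ID}",
            data=json.dumps({"proxied": proxied}).encode(),
            headers={
                "Authorization": f"Bearer {TOKEN}",
                "Content-Type": "application/json",
            },
            method="PATCH",
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    # set_proxied(False)   # go DNS-only for now; set back to True once the edge recovers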
> a routine configuration change we made
Every.Single.Time
[1] https://x.com/dok2001/status/1990791419653484646
Cloudflare's dashboard is currently down as well.
My domain is registered with cloudflare so I'm 100% helpless to get things back online.
I can't edit DNS records to bypass cloudflare and I can't change nameservers either.
We are all impacted...
I think you should give me a credit for all the income I lost due to this outage. Who authorized a change to the core infrastructure during the period of the year when your customers make the most income? Seriously, this is a management failure at the highest levels of decision-making. We don't make any changes to our server infrastructure/stack during the busiest time of the year, and neither should you. If there were an alternative to Cloudflare, I'd leave your service and move my systems elsewhere.
Looking forward to seeing their RCA. I'm guessing it's going to be glossy in terms of actual customer impact. "We didn't go offline, we just had 100% errors. For 60 minutes."
Didn't realize Twitter uses cloudflare. It seems to be down as well
I believe it used to be AWS and they switched.
everything is down except HN :D
The just-one-big-server-in-someone's-basement stack remains undefeated.
Except it isn't that big?
I don’t know; HN historically has had way worse uptime than Cloudflare.
1 reply →
It's a little surprising how little it affects me. I believe it's around 20% of websites that use Cloudflare in some form or another.
Pagerduty is up
Bunnycdn lives
Digg.com is working perfectly hahahaha
Those football playoffs are really getting out of hand…
Ref: https://news.ycombinator.com/item?id=43157000
Good one!
My theory is that people's skills are getting worse. Attention spans are diminishing, memory is shrinking. People age and retire, new less skilled generations are replacing them. There are studies about declining IQ in the last decades. Probably mobile phones and social media are to blame.
We see the signs with Amazon and Cloudflare going down, Windows Update breaking stuff. But the worst is yet to come, and I am thinking about air traffic control, nuclear power plants, surgeons...
> There are studies about declining IQ in the last decades. Probably mobile phones and social media are to blame.
It is much more nuanced than that.
The long-term rise (Flynn Effect) of IQs in the 20th century is widely believed to be driven by environmental factors more than genetics.
Plateau / decline is context-dependent: The reversal or slowdown isn’t universal, like you suggest. It seems more pronounced in certain countries or cohorts.
Cognitive abilities are diversifying: As people specialize more (education, careers, lifestyles), the structure of intelligence (how different cognitive skills relate) might be changing.
DigitalOcean + Gandi means nothing I run is down. Amazing. We depend far too greatly on centralised services, where we deem that the value of reputation and convenience exceeds the potential downsides, and then the world pays for it. I think we have to feel a lot more of this pain before regulation kicks in to change things, because the reality is people don't change. The only thing you can personally do is run your own stuff wherever you can.
Individually, for you, what's the difference?
You use a service provider, if that service provider is down, your site is down. Does it matter to you that others are also down in that instance?
Might even be better to go down at the same time as everyone else, because customers might be more lenient on you.
DigitalOcean is indeed having issues.
> Application error: a client-side exception has occurred while loading www.digitalocean.com (see the browser console for more information).
Yellow flags on status.digitalocean.com *
Still nothing is down...
The sites I host on Cloudflare are all down. Also, even ChatGPT was down for a while, showing the error: "Please unblock challenges.cloudflare.com to proceed."
This is still the case for me
10.30pm here in Australia...
and my alarms are going off and my support line is ringing...
I can't even log in to my CF dashboard to disable the CDN!
Edit: It's back. Hopefully it will stay up!
Edit 2: 1 Hour Later.
Narrator: It didn't stay up :/
I happened to be working with Claude when this occurred. Having no idea what exactly the cause was, I jumped over to GPT and observed the same. I did a dig on challenges.cloudflare.com, and by the time I'd figured out roughly what was happening, it seemed to have... resolved itself.
I must say I'm astonished, as naive as it may be, to see the number of separate platforms affected by this. And it has been a bit of a learning experience too.
Oh, look! Cloudflare is down. Let's check down detector to make sure it's not just me > Downdetector is using Cloudflare captcha. Yep, it's down.
I didn't think about the Cloudflare API, but we'll make sure to use it next time. Hopefully it won't happen again. I'd like Cloudflare to allow delegating DNS control to an external provider so it's easy to disable/enable the CF proxy when something like this happens.
Yesterday I decided to finally write my makefiles to "mirror" (make available offline) the docs of the libraries I'm using. doc2dash for sphinx-enabled projects, and then using dash / zeal.
Then I was like... "when was the last time I flew for 10+ hours and wanted to do programming, etc., such that I'd need offline docs?" So I gave up.
Today I can't browse the libs' docs quickly, so I'm resuming the work on my local mirroring :-)
This reminds me that I really like self-hosting. While it is true that many of things do not work, all my services do work. It has some tradeoffs of course.
There is an election in Denmark today, and I wonder if this will affect it. The government's website is not accessible at the moment because it uses Cloudflare.
My tinfoil hat has me wondering if it's just coincidence.
What do we actually lose going from cloud back to ground?
The mass centralization is a massive attack vector for organized attempts to disrupt business in the West.
But we're not doing anything about it because we've made a mountain out of a molehill. Was it that hard to manage everything locally?
I get that there are plenty of security implications going that route, but it would be much harder to bring down large portions of online business with a single attack.
> What do we actually lose going from cloud back to ground?
A lot of money related to stuff you currently don't have to worry about.
I remember how shit worked before AWS. People don't remember how costly and time-consuming this stuff used to be. We had close to 50 people in our local ops team back in the day when I was working with Nokia 13 years ago. They had to deal with data center outages, expensive storage solutions failing, network links between data centers, offices, firewalls, self-hosted Jira running out of memory, and a lot of other crap that I don't spend a lot of time worrying about with a cloud-based setup. That's just a short list of stuff that was repeatedly an issue. Nice when it worked, but nowhere near five nines of uptime.
That ops team alone cost probably a few million per year in salaries alone. I knew some people in that team. Good solid people but it always seemed like a thankless and stressful job to me. Basically constant firefighting while getting people barking at you to just get stuff working. Later a lot of that stuff moved into AWS and things became a lot easier and the need for that team largely went away. The first few teams doing that caused a bit of controversy internally until management realized that those teams were saving money. Then that quickly turned around. And it wasn't like AWS was cheap. I worked in one of those teams. That entire ops team was replaced by 2-3 clued in devops people that were able to move a lot faster. Subsequent layoff rounds in Nokia hit internal IT and ops teams hard early on in the years leading up to the demise of the phone business.
Yeah, people have such short memories for this stuff. When we ran our own servers a couple of jobs ago, we had a rota of people who'd be on call for events like failing disks. I don't want to ever do that again.
In general, I'm much happier with the current status of "it all works" or "it's ALL broken and it's someone else's job to fix it as fast as possible"!
Not saying it's perfect, but neither was on-prem/colocation.
Funny how I couldn't even check on Downdetector.com - because it takes me to a Cloudflare-run captcha, which is now stuck on loading.
The internet is officially down.
Strange thing is this spans multiple CDN regions: everything using bot management & WAF is down. I just got a colleague to check our site, and both the London & Singapore Cloudflare servers are out... And I can't even log in to the Cloudflare dash to re-route critical traffic. Likely this is accidental, but one day there will be something malicious that will have big impacts, given how centralised the internet now is.
From the Cloudflare status website: "Scheduled maintenance is currently in progress." Maybe something went wrong while doing maintenance?
They consistently have scheduled maintenance.
Some of my websites are down. Says it's the cloudflare network, when I click it it says generic things about my server likely being the issue.
Yes, most things are down: reCAPTCHA, DNS routing, proxy, etc.
all of my websites are down
I thought I would be clever by switching domain endpoints from proxied to DNS-only, but the Cloudflare admin page is also not working correctly ;)
edit: it's up!
edit: it's down!
>Cloudflare Global Network experiencing issues
>Cloudflare is aware of, and investigating an issue which potentially impacts multiple customers. Further detail will be provided as more information becomes available.
>Posted 4 minutes ago
https://www.cloudflarestatus.com/incidents/8gmgl950y3h7
Things are back up (a second time) for me. It doesn’t sound like they are confident the problem is fully fixed yet though
Edit: and down again a third time!
NPM also seems to be down due to this! https://status.npmjs.org
It's the first time it's been down that I've seen a sensible error message.
But I was supposed to be commuting, so I guess I'll do that.
I had two completely unrelated tabs open (https://twitter.com and https://onsensensei.com), both showing the same error. Opened another website, same error. Kinda funny to see how much of the entire web runs on Cloudflare nowadays.
Love how everyone plays with redundancy - multiple hosts, load balancers, etc. - and yet half of the web relies on a single point of failure: CF.
Indeed. And it feels really good knowing that our stuff isn't in that half.
Is there any way to remove every SPOF?
Currently I have multi-region loadbalanced servers. DNS and WAF (and the load balancer) on Cloudflare.
Moving DNS elsewhere is step 1 so I'm not locked out - but then I can't use Cloudflare full stop (without enterprise pricing).
Multi-provider DNS and WAF - okay I could see how that works.
But what about the global load balancer, surely that has to remain a single point of failure?
No? The point of cloudflare is that they remove the spof for you, but I guess we can say they didn't do it quite perfectly
Why do people use the reverse proxy functionality of Cloudflare? I've worked at small to medium sized businesses that never had any of this while running public facing websites and they were/are just fine.
Same goes for my personal projects: I've never been worried about being targeted by a botnet so much that I introduce a single point of failure like this.
Any project that starts gaining any bit of traction gets hammered with bots (the ones that try every single /wp URL even though you don't even use WordPress), frequent DDoS attacks, and so on.
I consider my server's real IP (or load balancer IP) as a secret for that reason, and Cloudflare helps exactly with that.
Everything goes through Cloudflare, where we have rate limiters, Web firewall, challenges for China / Russian inbound requests (we are very local and have zero customers outside our country), and so on.
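On the "real IP as a secret" point, the usual belt-and-braces step is to also refuse port 443 from anything that isn't a Cloudflare edge. A sketch that turns their published ranges into firewall rules (the ips-v4/ips-v6 URLs are Cloudflare's published lists; the nftables table/chain names here are made up, so adapt them to your setup):

    import urllib.request

    # Cloudflare publishes its edge ranges as plain-text CIDR lists.
    SOURCES = {
        "ip saddr": "https://www.cloudflare.com/ips-v4",
        "ip6 saddr": "https://www.cloudflare.com/ips-v6",
    }

    def fetch(url):
        with urllib.request.urlopen(url, timeout=10) as resp:
            return [line.strip() for line in resp.read().decode().splitlines() if line.strip()]

    # Print nft commands instead of applying them: review first, then run as root.
    # Assumes an existing `inet filter` table with an `input` chain (adjust as needed).
    for selector, url in SOURCES.items():
        for cidr in fetch(url):
            print(f"nft add rule inet filter input {selector} {cidr} tcp dport 443 accept")
    print("nft add rule inet filter input tcp dport 443 drop")

(And remember to refresh it occasionally; the ranges do change.)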
People think that running Node.js servers is a good idea, and those fall over if there's ever so much as a stiff breeze, so they put Cloudflare in front and call it a day.
It gives really good caching functionality so you can have large amounts of traffic and your site can easily handle it. Plus they don't charge for egress traffic.
I’m surprised your projects aren’t plagued by massive waves of scraping traffic like the rest of us. Count yourself lucky, not superior.
What exactly are you serving that bot traffic affects your quality of service?
I've seen an RPi serve a few dozen QPS of dynamic content without issue... The only service I've had actually get successfully taken down by benign bots is a Gitea-style git forge (which was 'fixed' by deploying Anubis in front of it).
It's chic. Young bois or adult people with a boi-like mentality.
What, they have Cloudflare and we don't? We also must have cloudflare. Don't ask why.
Now that you have it, you are at least level 15 and not a peasant.
Same applies to every braindead framework on the web. The gadget mind of the bois is the cause for all this.
I was joking that after AWS and Azure, Cloudflare would be the next one...
So which large service do we have left that could take a chunk of the internet out?
Gcloud
Can't wait to read their post-mortem report
Our national transit agency is apparently a customer.
The departure tables are borked, showing incorrect data, the route map stopped updating, the website and route planner are down, and the API returns garbage. Despite everything, the management will be pleased to know the ads kept on running offline.
Why you would put a WAF between devices you control and your own infra, God knows.
> Why you would put a WAF between devices you control and your own infra
Checkbox security says a WAF is required, and no CISO will put their neck on the line to approve the exemption.
I think everyone is in the same boat with thinking they took something offline :^)
Concerning though how much the web relies on one (great) service.
I had just deployed. Started reverting commits like crazy.
I got an invoice from them right before the outage. Hopefully when they restore everything, they'll have forgotten about it!
Funny how I trusted Cloudflare first and started looking at restarting my servers, only to realize it's not me this time :)
The error even kinda says that. Still assumed it's me
ERROR [11:57:30 UTC]: EC2 Launch Failure. Reason: [Security Breach Remediation] Control Plane Metadata Service (IMDS) temporarily offline. System state reports: Dependency integrity check failed (Exit Code 0x80070002). Cannot retrieve authorized kernel image or block device mapping. Termination signal initiated for compromised worker nodes.
The irony of being in the middle of reading how Basecamp got off the cloud and the external link being down with a CF error :D
Is it me, or do the outages of single points of failure for large swaths of the internet tend to cluster within weeks/days of one another?
Anyone know why? It could be pure bias because one news story propels the next, so when they happen in clusters you just hear about them more than when they don't.
The non-profit I volunteer at is unreachable. It gives a Cloudflare error page, which is sort of helpful: it tells me that the site is OK but Cloudflare has a 500.
It’s been great, but I always wonder when a company starts doing more than it’s initially calling. There have been a ton of large attacks, tons of bot scrappers so it’s the Wild West.
yes they're spreading themselves very thin with lots of new releases/products - but they will lose a lot of customers if their reliability comes into question
I was trying to look up banana-based jokes (https://upjoke.com/banana-jokes) and discovered that London Cloudflare seems to be down.
Then, I tried various down detecting sites and they didn't seem to work either - presumably due to Cloudflare.
it's a slippery slope alright
It's back up, sites are working. Still wondering how long it's going to last, or if there's another blackout coming.
Sites ain't working in India.
yes bro
So they broke the internet. Nice! Never seen so many sites not working. Never seen so many desktop apps suddenly stop working. I don't want to be the person responsible for this. And this again has taught me it's better not to rely on external services, even though they seem too big to fail.
"The issue has been identified and a fix is being implemented." According to CF a minute ago: https://www.cloudflarestatus.com/incidents/8gmgl950y3h7
Down, but the linked status page shows mostly operational, except for "Support Portal Availability Issues" and planned maintenance. Since it was linked, I'm curious if others see differently.
edit: It now says "Cloudflare Global Network experiencing issues" but it took a while.
Luckily for everyone including Guilhermo he can't dunk on the situation since x.com is down as well.
Using Cloudflare is a tradeoff between facing DDoS and other attacks, and the downtime of Cloudflare.
It would appear if you use a VPN in Europe you can still access Cloudflare sites, I have just tried, for me the Netherlands, Germany, and France work, but the UK and USA don't.
EDIT: It would appear it is still unreliable in these countries, it just stopped working in France for me.
Cloudflare Dashboard/clicky-clicky UI is down. I really appreciate that their API is still working. A small change in our Terraform configuration and now I can go to lunch in peace, knowing our clients at skeeled can keep working if they want to:
resource "cloudflare_dns_record"
- proxied = true
+ proxied = false
No logging in to the Cloudflare dash, no passing Turnstile (their CAPTCHA replacement) on third-party websites not proxied by Cloudflare, and the rest that are proxied throwing 500 Internal Server Error saying it's Cloudflare's fault…
Feels like half the internet is down.
Our doctor's office can't make appointments because their "system is down."
I am glad my personal site is not affected; what would I do without all that incoming traffic?
It's down here in Sydney as well. The status page hasn't been updated to reflect that
20% of websites worldwide are down.
source? would love to see
I host everything on Linode (have for over a decade) and am never caught up in these outages.
Linode has been rock solid for me. I wanted to back this comment with uptime numbers, unfortunately the service I use for that, Uptime Robot, is down because of Cloudflare...
Investigating - Cloudflare is aware of, and investigating an issue which potentially impacts multiple customers. Further detail will be provided as more information becomes available. Nov 18, 2025 - 11:48 UTC
Yeah, those multiple customers are like 70% of the internet.
I would love to see a competition for the most banal thing that went wrong as a result of this. For example, I’m pretty sure the reason my IKEA locker wouldn’t latch shut was because the OS had hung while talking to a Cloudflare backend.
Unfortunately it seems like it, our service has lost a portion of our Cloudflare connectivity. We use their tunnels functionality.
Additionally, it looks like Pingdom/Solarwinds authentication is affected too - not a great look for a service in that category.
Cloudflare runs a high-demand service, and the centralisation does deserve scrutiny. I think a good middle ground I'll adopt is self-hosting critical services and then, when they have an outage, redirecting traffic to a Cloudflare outage banner.
Meanwhile my Wordpress blog on DigitalOcean is up. And so is DigitalOcean.
My ISP is routing public internet traffic to my IPs these days. What keeps me from running my blog from home? Fear of exposing a TCP port, that's what. What do we do about that?
> What keeps me from running my blog from home?
Depending on the contract, you might not be allowed to run public network services from your home network.
I had a friend doing that, and once his site got popular the ISP called (or sent a letter? I don't remember anymore) with "take this 10x more expensive corporate contract or we will block all this traffic".
In general, the reason ISPs don't want you to do that (in addition to the far more expensive corporate rates) is the risk of someone DDoSing that site, which could cause issues for large parts of their domestic customer base (and, depending on the country, make them liable to compensate those customers for not providing a service they paid for).
> Our Engineering team is actively investigating an issue impacting multiple DigitalOcean services caused by an upstream provider incident. This disruption affects a subset of Gen AI tools, the App Platform, Load Balancer, Spaces and provisioning or management actions for new clusters. Existing clusters are not affected. Users may experience degraded performance or intermittent failures within these services.
> We acknowledge the inconvenience this may cause and are working diligently to restore normal operations. Signs of recovery are starting to appear, with most requests beginning to succeed. We will continue to monitor the situation closely and provide timely updates as more information becomes available. Thank you for your patience as we work towards full service restoration.
It's not down for you, but for others.
Yeah, DigitalOcean and Dreamhost are both up. I actually self-host on 2Gig fibre service, and all my stuff is up, except I park everything behind Cloudflare since there is no way I could handle a DDoS attack.
One way to mitigate DDoS is to enforce source IP checks on the way OUT of a datacenter (egress).
Sure, there are botnets, infected devices, etc. that would conform to this, but where does the sheer power of a big DDoS attack come from, including from those who sell it as a service? They have to have some infrastructure in some datacenter, right?
Make a law that forces every edge router of a datacenter to check the source IP and you would eliminate a very big portion of DDoS as we know it.
Until then, the only real and effective method of mitigating a DDoS attack is even more bandwidth: you are basically a black hole to the attack, which Cloudflare basically is.
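To make the idea concrete, the check itself is a one-liner; here is a toy Python sketch (the prefixes and names like ANNOUNCED_PREFIXES and allow_egress are made up for illustration; real source-address validation happens in the edge router's ACLs/uRPF, not in application code):

    # Toy sketch of egress source-address validation (BCP 38-style).
    # The prefixes below are documentation ranges, purely illustrative.
    import ipaddress

    # Prefixes this datacenter legitimately originates.
    ANNOUNCED_PREFIXES = [
        ipaddress.ip_network("203.0.113.0/24"),
        ipaddress.ip_network("198.51.100.0/24"),
    ]

    def allow_egress(src_ip: str) -> bool:
        """Forward only packets whose source address we actually announce."""
        src = ipaddress.ip_address(src_ip)
        return any(src in prefix for prefix in ANNOUNCED_PREFIXES)

    print(allow_egress("203.0.113.7"))  # True  -> forward
    print(allow_egress("192.0.2.99"))   # False -> drop (spoofed source)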
Alright, what you are proposing is kind of hard to do. Source routing is not easy, and source validation is even harder.
And what prevents me, as an abusive hoster or "bad guy", from just announcing my own IP space directly on a transit or IXP?
You might say the IXP should do source checking as well, but what if the IP space is distributed/anycasted across multiple ASNs on the IXP?
Also, if you add multiple egress points distributed across different routing domains, it gets complicated fast.
Does my transit upstream need to do source validation of my IP space? What about their upstream? Also, how would they know which IP space belongs to which ASNs, considering the allocation of ASN numbers and IP space is distributed across different organisations across the globe (some of which are more malicious/non-functional than others [0])? Source validation becomes extremely complex because there is no single, universal mapping between IP space and the ASNs it belongs to.
[0]https://afrinic.net/notice-for-termination-of-the-receiversh...
The biggest attacks literally come from botnets. There’s not a lot coming from infrastructure services precisely because these services are incentivized to shut that shit down. At most it would be used as the control plane which is how people attempt to shut down the botnets.
Time to consider alternatives: https://x.com/GithubProjects/status/1990804801811329329
We finally switched to CF a few weeks ago (for bot protection, abusive traffic started getting insane this year), finally we can join in on one of the global outage parties (no cloud usage otherwise, so still more uptime than most).
It is a relief that they hosted the status page on someone else's infrastructure.
This is worse than the Amazon outage. I couldn't even log in to Cloudflare.
Wow, with outage of a scale like this, it must be measurable as a loss in global GDP
I swear they're all sleeping and that one guy on call is like "sheeeeet"...
Ironically, I was trying to access https://downforeveryoneorjustme.com/ at the time, which also went down due to Cloudflare.
It would have been ironic had this https://www.thewebsiteisdown.com/salesguy.html been hosted behind cloudflare.
Israel must be testing its doomsday plans again
Wonder if the internet will soon be deleted.
Sure, blame the Jews. Idiot.
To be clear, they blamed Israel. Not that it helps make what they said any less idiotic.
Was back up for a moment ( within 5 minutes of being down), currently down again
In Yorkshire they would say "up and down like a bride's nightie"
...or a whore's drawers
https://www.cloudflarestatus.com/
Even the status page is giving "504 Gateway Timeout ERROR: The request could not be satisfied." now, in India.
ChatGPT and Claude are down as a result, too.
Y'know, along with most other SAAS services.
ChatGPT and Perplexity AI are down, I was just about to try and use ChatGPT.
All trains are stuck in the south of France due to « broken signalisation ». I wonder how related this is.
Edit: it was related
https://www.laprovence.com/article/region/83645099971988/pan...
Edit2: They edited the article stating it wasn't related.
If they do use Cloudflare... why in the everlasting name of Hell did they connect a railway control and signalling system to the Internet?!!!
Because javascript programmers are cheaper/easier/whatever to hire? So everything becomes web-centric. (I'm hoping for this comment to be sarcastic but I wouldn't be surprised if it turns out not to be)
Wow, so much is down. Nothing Cloudflare protected is loading for me in Indiana, and the Cloudflare dashboard is broken as well.
I hope it gets resolved in the next hour or two, or it could be a serious problem for me.
This centralisation is worrisome. Single points of failure have always been a bad idea, especially when that point of failure is out of your control.
PS:Someone really doesn't want Gemini 3 to get air time today
I didn't have my site on Cloudflare because it would be faster for Chinese users (its main demographic), so I THOUGHT I was fine for a second, until I remembered the data storage API is behind Cloudflare.
Hey, this is fun, all my websites are still up! I wonder how that happened? I don't even have to worry about my docker registry being down because I set up my own after the last global outage.
I had a lot of fun like you as well, until I got my first DDoS and bot attacks. There's a reason Cloudflare has 20% of internet traffic.
Does it cost you a lot?
One of my other worries is having to fight bots over a couple of hobby sites while I have other fires to put out (generally in life).
Internet is down, I guess I'll just look out of the window for a bit
And it's back. It was a very short window to look out of the window.
Not too much window to see anyway
Probably a good time to contact the CEO of Cloudflare.
Looking forward to the post-mortem.
Is anybody keeping statistics on the frequency of these big global internet outages? It seems to be happening extremely frequently as of late, but it would be nice to have some data on that.
I discovered the problem by trying to access https://downforeveryoneorjustme.com/ ironically :)
This Internet thing is steadily becoming the most fragile attack surface out there. No need for nuclear weapons anymore; just hit Cloudflare and AWS and we are back to the stone age.
We're on the enterprise plan, so far we're seeing Dashboard degradation and Turnstile (their captcha service) down. But all proxying/CDN and other services seem to work well.
Came back up for a few minutes and has gone down again. https://www.cloudflarestatus.com has nothing.
Love that for them. Congrats on building such a re-centralized web!
Cloudflare is now a systemic risk for a state-sponsored attacker to bring down the entire web.
Why are we seeing AWS, then Azure, then Cloudflare all going down just out of the blue? I know they go down occasionally, but it's typically not major outages like this...
Update: It seems to be back, the downtime lasted maybe 5-6 minutes
Looks like it's going up and down intermittently, maybe something is only half-rolled-out.
Spoke too soon. The "internet" is down again.
I'm still seeing some sites not working - ironically they're down detectors:
https://statusfield.com/status/cloudflare https://statusgator.com/services/cloudflare
It came back up for me, now it's down again. EDIT: I am in the UK, using the London Cloudflare server.
It was down, then up, then back down for me.
EDIT: And it's back up.
EDIT EDIT: And it's back down lol
Down again in London
Still down for me.
Down... "Please unblock challenges.cloudflare.com to proceed." On every Cloudflare hosted website that I try. This timing SUCKS.......... please resolve fast! <3
Ah! Well, all of my websites are down! I’m going to take screenshots and have it as part of my Time Capsule Album, “Once upon a Time, my websites used to go down.”
Once upon a time, the end of the world happened
Yes, it impacted our services https://www.cloudflarestatus.com/
Most down-detectors are down due to their dependency on CF.
Cloudflare Mumbai, Bengaluru, Chennai, Hyderabad edge-nodes also unable to serve content.
x.com down.
Few quick-commerce apps are acting up at times.
You can't even turn off caching from Cloudflare because...the Cloudflare dashboard is down.
So everyone who's wrapped their host with Cloudflare is stuck with it.
Since when does critical infrastructure fail weekly?! One week it's AWS, then Azure + AWS, now Cloudflare...
Time to go back to on-prem. AWS and co are too expensive anyway.
A lot of people are "on prem" but use CloudFlare to proxy traffic for DDoS attack mitigation, among other reasons.
If someone wanted to learn about how the modern infrastructure stack works, and why things like this occur, where would be some good resources to start?
I just wanted to ask the same thing. I have a really basic idea of how everything is connected but would love to jump in more in depth.
How can such big incidents occur where half of the internet is down because of one company and what can be done to prevent that?
Incident post mortems: https://github.com/danluu/post-mortems
I'm really surprised by the sheer scale of how many websites this outage is affecting. We really need to decentralize all of these monolith clouds.
When this kind of thing happens it makes me feel better about my own programming problems.
I wonder if it has anything to do with the replicate.com purchase? Probably not.
I sometimes question my business decision to have a multi-cloud, multi-region web presence where it is totally acceptable to be down with the big boys.
That was something we discussed at my workplace.
Prior hosting provider was a little-known company with decent enough track record, but because they employed humans, stuff would break. When it did break, C-suite would panic about how much revenue is lost, etc.
The number of outages was "reasonable" to anyone who understood the technical side, but non-technical would complain for weeks after an outage about how we're always down, "well BigServiceX doesn't break ever, why do we?", and again lost revenue.
Now on Azure/Cloudflare, we go down when everyone else does, but C-Suite goes "oh it's not just us, and it's out of our control? Okay let us know when it fixes itself."
A great lesson in optics and perception, for our junior team members.
Many services have just disabled the CF proxy and use only DNS. If your end server has SSL and can handle some traffic, it might work for a while.
Supabase is down bad too... need to work on my project!
Haha they updated their status page: "Identified - A global upstream provider is currently experiencing an outage which is impacting platform-level and project-level services"
A global upstream provider :)
Well, that was quick. I saw a status saying server maintenance, and then it changed to "we're looking into this". Must've made an oopsie, I suppose.
ELON! GO AND KICK THOSE CLOUDFLARE ASSES!
Or find a new job for yourself. Maybe digging to the Earth's core. Why? I don't know. Because then you can say: I did it, or something.
CONTROL PLANE FAULT: CRITICAL SECURITY OVERRIDE enforced across us-east-1 and eu-west-2. ERROR CODE: STS.SecurityAuditLockout (403 Forbidden).
Context?
Glad to see things are actually working here! Also, my website (halomate.ai) is using CF too, and surprisingly, it's working fine as well
You spoke too soon!
What is funny is that on their global status list for services, everything looks green except "network", which is "offline".
I started restarting my own servers thinking something went awry again, that's how much I usually trust them not to be down. Interesting.
Funny that their status page shows almost all locations “Operational” but they’re not. Are they updating the page manually and keeping it green?
I assume the locations are operating fine, since you can see the error pages. The culprit here is probably the Network, which at the time of writing, shows up as offline
Like AWS I can't help but think we're going to get more and more of these as the tech industry continues to DOGE its workforce.
Vibe coding too
The privacy kingpin in india, has been caught and most of the network are affected. We will be resuming the servers from Sweden shortly.
https://www.cloudflarestatus.com - has been updated.
Looks like the status page is suffering too because it can't load jQuery:
(index):64 Uncaught ReferenceError: $ is not defined at (index):64:3
ChatGPT isn't working.
No suicides created by ChatGPT Today. Billions of dollars in GPU will sit idle. Sudden drop of Linkedin content...
World is a better place
Whenever I try to collapse root threads on this page, it locks up the browser for 5+ seconds.
Windows 11, latest Edge browser, 64GB of RAM, 13th Gen i7.
>Windows 11, latest Edge browser
There's your issue
Windows 11 has some annoying UI decisions, but is otherwise 100% reliable for me and absolutely my OS of choice. Edge is essentially Chrome, but generally ties in better with the MS accounts ecosystem which I already use.
So, what, specifically, is the issue?
Does not happen to me under ubuntu, FWIW
Back up for me now
Down again
Yep - down again for me too!
Whole bunch of local South African sites are dead, with cloudflare http 500 errors. Can see Lisbon & Amsterdam crashing out.
Yep, got around 100 SMSs from our uptime monitoring service that our Cloudflare sites are down. Nothing much we can do but wait.
We were just talking about how Replicate might have better availability due to joining Cloudflare, and they too went down... Oops.
What a wild ride, the traffic to my site is more akin to a rollercoaster. Got better for a few mins and then fell back apart.
It's ironic that Downdetector is down as well...
downdetector is up in italy
This one is bigger than the AWS East outage...
Lots of valid concern about us all using CF, but is there an alternative to their WAF that isn't enterprise-expensive?
Yep, bunny.net is great, we also use it. And look at https://altcha.org as a Turnstile replacement
Depends on your needs, but for example there's Bunny Shield: https://bunny.net/shield/
Thanks for the pointer. They'd still wind up being a couple thousand dollars more annually than what we pay CF now.
Frustrating, because I know I'll get asked today if we have an alternative to using CF, and I don't have a good answer.
Keeps going up and down for me, I cant access DownDetector to check. The first website I noticed it on was Blender Artists.
Yes, all sites are down. Getting a 500 error from India.
Update: Looks like the issue has been resolved now. All sites are operational now.
Gemini and other agents are now failing when they search for something on the web. ChatGPT can't even be accessed.
I wish the "pause CF" button would work via API or via any other way, even if there is an outage like this.
Insane, my website https://geddle.com totally down
seems to work here
Down again; all my websites relying on Cloudflare DNS are down.
Well it was bound to happen eventually, the "Down Roulette" has decided it should be Cloudflare this week!
Cloudflare is the real backbone of the internet in 2025. It should be a globalized property like ICANN or something
How would that prevent outages? Honest question
Just yesterday Cloudflare announced it was acquiring Replicate (an AI platform). "The Workers Platform mission: Our goal all along has been to enable developers to build full-stack applications without having to burden themselves with infrastructure," according to Cloudflare's blog. Are we cooked?
Are they using Cloudflare perchance? (scnr)
I was reading up on home lab server racks, and every single site is down with a Cloudflare error. So much for DIY!
Makes you realise: if Cloudflare or one of these large organisations decides to (/ gets ordered by a deranged US president to) block your internet access, that's a whole lot of internet you're suddenly cut off from. Yes, I know there are circumventions, but it's still a worrying thought.
Email workers of all things seem to have slowed down dramatically, although they're not down completely.
I'm genuinely curious how much of the web depends on cloudflare and AWS. This centralisation sucks though
Took down both Twitter and Rateyourmusic. This is a targeted attack against me specifically and nobody else.
Garmin site not working for example, and they removed the export option from the mobile application though.
I would love to be a bee on the wall in the room where Cloudflare response engineers are working right now.
Well, was reading the docs for Express, and shouted wtf a couple of times, before seeing this post on HN.
Ironically, cloudflare.com is not down.
Yeah I wonder how that works
"Don't get high on your own supply"?
Time to check Hacker News instead of work. Even my usual procrastination websites are down due to this.
ChatGPT was down so I couldn't work; I went to Lichess and it turns out it's down too. Now what do I do?
your username makes me wonder what exactly your job is, and what you need ChatGPT for
Touch grass.
Some CDNs are down too, for example cdn.tailwindcss.com And apparently I can't log into Hackernews?
https://hacked.stream/
We really do have two surprise holidays every year: AWS Day and Cloudflare Day. Happy outages, everyone.
Looks like it. Even sites like isup.me seem to be down, lots of cloudflare error messages across the net
For some reason linear.app is working but according to their headers they should be behind Cloudflare.
According to the status page services are being restored
I still find a lot of websites/applications (including my own) affected.
Anyone seeing a link between AI-generated infra code and this year’s wave in popular service outages?
Is there any way to get past challenges.cloudflare.com with tokens or something?
So stupid that there is no fallback and it can take down 50% of the internet.
Adding: looks like even Cloudflare's Silk Privacy Pass with challenge tokens is broken.
Such a great idea to put half the web behind a single point of failure without failover.
It’s been 45 minutes and I’m already looking forward to the day Kevin Fang makes a video about this
Suddenly feeling better about our 99.9% uptime SLA.
When even Cloudflare goes down, nobody can blame the little guys.
Maybe this incident will make people rethink putting Cloudflare blindly in front of every website.
In theory even a single company service could be distributed, so only a fraction of websites would be affected, thus it's not a necessity to be a single point of failure. So I still don't like this argument "you see what happens when over half of the internet relies on Cloudflare". And yes, I'm writing this as a Cloudflare user whose blog is now down because of this. Cloudflare is still convenient and accessible for many people, no wonder why it's so popular.
But, yeah, it's still a horrible outage, much worse than the Amazon one.
The "omg centralized infra" cries after every such event kind of misses the point. Hosting with smaller companies (shared, vps, dedi, colo whatever) will likely result in far worse downtimes, individually.
Ofc the bigger perception issue here is many services going out at the same time, but why would (most) providers care if their annual downtime does or doesn't coincide with others? Their overall reliability is no better or worse had only their service gone down.
All of this can change ofc if this becomes a regular thing, the absolute hours of downtime does matter.
X is down, and many many other sites. This is not the web I grew up on. Do not centralize, people.
Funny enough, it happened on the same day that AWS CloudFront launched their flat-rate plans!
This is reason 1, 2 and 3 on my "Top 3 Reasons to not Put All Eggs in One Basket" list.
It's interesting to see hacker news response time reaching almost 2 seconds for this post.
Almost every site I'm trying to connect to is down. The internet is way too centralized.
Things seem to be coming back up... been almost 45 minutes, since my first alert came at 0836
I was using Cloudflare WARP; had to turn it off to access most of the websites i visit daily.
Not affected using tunnels, CDNs.
It's probably related to the recent DDoS attacks they helped mitigate.
They offer a great service for now, I hear.
Unfortunately, that means they can also break 75% of the internet.
How come cloudflare.com is still working, do they not trust their own orange proxy service?
Update: Cloudflare has announced they will be sacrificing their CEO at the altar in penance.
Such a shame though. I wonder how long it's going to take before they bring it back up
Cloudflare issue is due to latency in DDOS errors.
>cups.servic
>foomaticrip
Form a [cerulean] type-font in the page-source.
And here I was wondering why my website shut down & why I couldn't tweet about it
Singapore is down as well in Asia
OK, it seems to be working again.
Nope.
Indeed, it worked for 2 minutes, but not anymore.
Yes.
I learned from reliable sources about a denial-of-service attack; everything went down.
I can't even load the dashboard to change to "DNS only". Nothing to do?
I can't rebuild my NixOS image because of this lol. (chrome install not working)
Poland. Most of the popular sites are down. Including community forum on Cloudflare.
Crazy to think that it's apparently acceptable to centralize the web like that.
I've been waiting for hours, it looks like I can finally take a day off today.
Seems like Cloudflare activated the maximum LLM-scraper-bot protection for everyone.
API still seems to work if you already have a script to hand to unproxy everything.
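For reference, such a script tends to look roughly like the sketch below. This is only a rough sketch: ZONE_ID and the CF_API_TOKEN environment variable are placeholders, pagination is ignored, and the exact v4 endpoint and field names should be checked against Cloudflare's API docs before relying on it.

    # Sketch: flip every proxied DNS record in a zone to "DNS only" via the API.
    # ZONE_ID / CF_API_TOKEN are placeholders; verify endpoint and fields against
    # the current Cloudflare API docs. Pagination is ignored for brevity.
    import os
    import requests

    API = "https://api.cloudflare.com/client/v4"
    HEADERS = {"Authorization": f"Bearer {os.environ['CF_API_TOKEN']}"}
    ZONE_ID = "your_zone_id_here"

    records = requests.get(f"{API}/zones/{ZONE_ID}/dns_records", headers=HEADERS).json()["result"]
    for rec in records:
        if rec.get("proxied"):
            requests.patch(
                f"{API}/zones/{ZONE_ID}/dns_records/{rec['id']}",
                headers=HEADERS,
                json={"proxied": False},
            )
            print(f"unproxied {rec['name']}")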
Just when I was assigned a task yesterday but decided to do it today early morning.
Cloudflare captchas don't work, which has taken down both Claude and Perplexity for me.
Lovely.
Even Twitter is down. Most of my customers are shouting at the top of their lungs!
Has anyone else noticed a major drop in email spam with this cloudflare outage?
Cloudflare's own status page is not responding. I guess it's down too?
All Cloudflare websites are down!!!! When will it get fixed? I dont have time.!!
For fun, I asked google what's an alternative to Cloudflare. It says, "A complete list of Cloudflare alternatives depends on which specific service (CDN, security, Zero Trust, edge computing, etc.) you are replacing, as no single competitor offers the exact same all-in-one suite"
Imagine using an all-in-one suite.
I think companies are firing the wrong people, which is why we get these downtimes so often.
Ukraine. Sporadic outages as well. Error pages blame Cloudflare Warsaw servers.
So it begins. Now is the time to banish the evil presence from the internet. :D
These big cloud providers are turning into giant off-switches for the internet
I’m glad I have kindle in my bag today. Websites down but not much we can do.
Browser: Working · Cloudflare (Portland): Error · Host (apps.ideal-logic.com): Working
80% of web sites I visited in last 15 minutes are not available anymore, LOL
Probably the IDF trying a mass network attack to go and Occupy the Holy See
Even Downdetector is down; I can't get through the Cloudflare captcha.
The whole damn internet now depends on them. I guess I am bullish for $NET
So do we have a guarantee that posts are not made by AI for a few minutes?
Cloudflare fully down for me and my team; half of the internet just vanished.
Same. We use Cloudflare as an image CDN and its R2 service a lot; can't access anything over its CDN route.
Still ongoing. Some requests going through. Some get the cf error page.
At least https://xprice.ro is up; I don't know how or why, because we use Cloudflare and we're hosting in Germany on Hetzner.
down for india :/
Interesting, it's working in Romania. So it's somehow related to their geo-balancing infrastructure.
i use checklyhq.com for my website status page and those are down as well...
https://sexyvoice.checkly-dashboards.com
What are the odds this is a human configuration error related to DNS?
Saw cloudflare go down before my very eyes on colonist.io in Australia
Off topic, but the 500 page from prusa3d is quite good:
https://www.prusa3d.com/
https://imgur.com/a/OW5KL8r
I used a down-detector site to check if Cloudflare was down, but the site is running on Cloudflare, so I couldn't check whether Cloudflare was down for anyone else, because Cloudflare was down.
I was reading some novels today and then Bam! a cliff hanger now...
What is happening to Cloudflare, does anybody know? Everything is down!
It's outrageous that it hasn't been fixed after 2 hours.
Even twitter is gone. Where will I post memes mocking cloudflare?
World infrastructure is taking a hit. First us-east and now this.
If a cloud vendor with 1 million users experiences a long term outage: the vendor has a serious problem. If a cloud vendor with 1 billion users experiences a long term outage: the internet has a serious problem. Yada-yada-yada xkcd/2347 but it's the big block in the middle which crumbled
Seems to work again. 40min downtime for many services it seems.
Alternatives to CF Tunnel are available; I maintain pinggy.io.
Dallas CF is down, so basically every app and website is down.
Seems like the merger with Replicate didn't work out so well :p
I tried to go to Downdetector before coming to Hacker News...
Would be funny if it was a record breaking ddos on cloudflare
The reason why you laugh
This is a nightmare situation, we can't get in anywhere
AWS, then Azure, now Cloudflare. Welcome to the AI era. Meanwhile my hetzner vServer has been running for three years without issues.
Waking up in East coast USA to all sites being down, yay...
What is that bright yellow thing in the sky?
Our company is losing money with every second of downtime.
Unfortunately downtime is just the cost of doing business. Everyone else using Cloudflare is in the same boat.
The Eastern Herald news website is down. Easternherald.com
It also took down ChatGPT and Claude; trying to access from PK.
Even famous applications like ChatGPT and x.com are down.
why do I always get "Server Error" and not an explanation that Cloudflare is having problems? This makes me look bad in front of my customers.
https://news.ycombinator.com/user?id=jgrahamc
>I was Cloudflare's CTO.
A gentle reminder to not take any CF-related frustrations out on John today.
He's now on the Board, so he hasn't left.
Not that I think blaming individuals on forums who are already under stress is a good strategy anyway.
His personal website is down too.
Oh no, we can’t take a (former) executive to task about what they’ve wrought with their influence!!! That would be wrong.
If anything, he should be the first to be blamed for the greater and greater effect this tech monster has on internet stability, since, you know, his people built it.
Is it me or do these outages happen pretty often lately?
When's the last time Cloudflare had such an outage?
Same problem here in Italy. Website up and down again.
I’m assuming hard rock (bet) is run by cloudflare also
My static website hosted on cloudflare works :/
Is anyone else's Hacker News lagging when loading the comments? I've seen posts with 2k comments before, but for some reason this one took longer.
Also seeing this on my websites hosted on cloudflare
Third time's the charm? Seems more stable now.
It’s down practically worldwide: in the US, UK, NZ, AU.
Even IN.
Did something happen to the Cloudflare lava lamps ?
More proof that central planning doesn't work
it's funny I first noticed this visiting a random blog, then went on X and got the same error... is Cloudflare the Internet now?
Getting a 500 error from cloudflare in Manchester
What are people's bets on the root cause of this...?
Their Oregon controlplane somehow? Either a misconfiguration of some sort (BGP??) or a power outage like they had before.
Even Pornhub is down because it uses Cloudflare.
Couldn't work. Fuckin' Cloudflare. Feels like 25% of the Internet is down.
I'm going home. Time for a beer.
Greetings from Germany.
Just when the eastern cities are waking up too.
Turnstile is throwing 500 internal server error
And when I tell people they don't need CF for most of their sites, they laugh at me. Look who's laughing now: they're down, I'm up.
ahahahhaahah
I just comment here to be part of the history.
Ironic, Cloudflare taking DownDetector with it
I was so scared, I thought my VPN had failed.
Cloudflare completely breaking the internet...
The whole internet hinges on this one company
Certainly most of the independent internet.
If you include Amazon, it's actually two.
My uptime monitor OnlineOrNot is also down...
OnlineOrNot's fallen back to AWS for monitoring, so you should still be getting alerts.
The dashboard's API server runs on Cloudflare and is currently blocking all logins, will fix.
Twitter too is down, almost half the internet
my cloudflare pages website is down - 500 server error :(
cannot login to get to workers to check - auth errors
I thought this was the point of a cached CDN!
The status is still Red on their dashboard.
That must have been a really big backhoe...
Oddly, it took quite a while for this to show up.
When will Cloudflare actually split into several totally independent companies to remedy that they bring down the Internet every time they have a major issue?
Just yesterday Cloudflare announced it was acquiring Replicate (AI to "help" its Workers), I believe.
My Window System seems to be working fine.
All my websites are down due to Cloudflare
Cloudflare is a central point of failure.
some sites are already up again, including the cf dash and downdetector, both ironically down a few minutes ago
So ... any bets the cause isn't DNS?
We still doing BGP update typos?
Nope. Looks like they have a DNS-like configuration manager...
OpenAI & chatGPT also down from this
Searching for Cloudflare alternatives...
interesting that HN is not on Cloudflare but the YC website is behind Cloudflare so it's also down
Is this why x.com isn’t working right?
Yeah I just got a 500 error on medRxiv
I hecking love depending on big corpo
All bets on DNS being the root cause.
The Singapore Cloudflare server is down.
Pretty sure they used Vibe Coding...
Testing the fences... clever girl...
Not out of the woods yet it seems...
This must be affecting a lot of sites? I'm trying to access tailwindcss and I can't either!
I am using Cloudflare as the back end for my site (Workers) but have disabled all their other offerings. I was affected for a short while, but I seem to be less affected than other people.
Not my site though
https://www.rxjourney.net/
Lol. I mean I love Tailwind but it seems like the least trivial site/service to be down right now haha.
Warsaw, Los Angeles and Newark down
Rome, Palermo, Milan, Catania Down
It took supabase and X down for me
Thought I'd been hacked; was ready to throw in the towel on this career and my SaaS.
Down in Australia and New Zealand
Yea, had trouble accessing Upwork
scaleway.com is down as well. I'm really wondering how a CSP can use cloudflare...
genuinely makes me sad for the people there. this must be a living nightmare right now.
Why? If any company has enough technical people, resources & processes in place it must be them, no?
Black HN ribbon for the Internet
I assume you're joking, but as an FYI, Rebecca Heineman died:
https://www.pcgamer.com/gaming-industry/legendary-game-desig...
Back up for me now
Edit: and then back down again
The biggest lesson for me from this incident: NEVER make your DNS provider and CDN provider the same vendor. Now I can't log in to the dashboard, even to switch the DNS. Sigh.
what's a good alternative for their WAF, that isn't enterprise expensive?
Even ChatGPT.com is down! Wow!
Australia here. plenty down rn
x is also not working properly
Yes, absolutely yes. I have tried several regions and all of them receive a 500 error.
chatgpt.com is not working because they are relying on cloudflare for challenges
When IG went down, I came to X to post. Now that X is down, is HN the place to be?
Gpt and perplexity still down
Germany
Bucharest Cloudflare down too
Yes. I get 500 on my website.
ironically downdetector.com is down because they use cloudflare for challenges
Working just fine in the UAE.
I'm betting on DNS fail
Forced to play Runescape now
God, my favourite website pornhub.com is also down. Why on earth, Cloudflare? I just now came back from school.
pretty much just my twitter usage as of now. my sites dont use cloudflare
HN has become the place to check if any HyperScaler + Cloudflare is down.
While my colleagues are wondering why Cloudflare isn't working and are afraid it might be something on our side locally, I'll first check here to make sure it's not a Cloudflare / AWS problem in the first place.
When was HN down for the last time? :)
I actually came here to check because downforeveryoneorjustme.com and downdetector are offline as well.
I'm going to buy more.
You (Browser): Working · San Jose (Cloudflare): Error · mysite.com (Host): Working
Lol! Like a solar eclipse!
Downflare
Cloudflare'nt
Downflare
Aaa, the only night I'm free to watch Nana and Cloudflare sabotages me T_T
Yeah, dammit! NOT TODAY!!!
hahaha this is nuts, can we connect to each other without cloudflare?
At least with Cloudflare, we may have the postmortem report tomorrow.
Waking up to chaos. Nice.
Let me guess, DNS issues?
Oh the nuclear bomb proof network is unusable because someone sneezed over at Cloudflare.
Give me back ChatGPT pls
downdetector.com is down because it uses cloudflare challenge....
im just concerned now :l
Does requiring proof-of-work in order to connect accomplish 99% of what Cloudflare does?
Not even 1%
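For what it's worth, proof-of-work on its own is roughly the toy below (a hashcash-style sketch; the difficulty, encoding, and function names are arbitrary, not anything Cloudflare actually ships). It makes each request slightly expensive for the client, but it does nothing about caching, TLS termination, routing, or absorbing hundreds of gigabits of junk traffic, which is most of what people use Cloudflare for.

    # Toy hashcash-style proof of work: find a nonce so that
    # sha256(challenge + nonce) starts with DIFFICULTY zero hex digits.
    import hashlib

    DIFFICULTY = 4  # leading zero hex digits required (arbitrary choice)

    def solve(challenge: str) -> int:
        nonce = 0
        while True:
            digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
            if digest.startswith("0" * DIFFICULTY):
                return nonce
            nonce += 1

    def verify(challenge: str, nonce: int) -> bool:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        return digest.startswith("0" * DIFFICULTY)

    nonce = solve("server-issued-challenge-123")
    print(nonce, verify("server-issued-challenge-123", nonce))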
Can we at some point acknowledge that constant cloud disruptions are too costly, and can we then finally move all of our hosting back on-prem?
It's the old IBM thing. If your website goes down along with everyone else's because of Cloudflare, you shrug and say "nothing we could do, we were following the industry standard". If your website goes down because of on-prem then it's very much your problem and maybe you get to look forward to an exciting debrief with your manager's manager.
That's lazy engineering and I don't think we as technical, rational people should make that our way of working. I know the saying, but I disagree with it. My fuckups, my problem, but at least I can avoid fuckups actively if I am in charge.
Funnily and ironically enough, I was trying to check out a few things on Ansible Galaxy and... I ended up here trying to submit the link for the CF ongoing incident
I would only consider doing stuff on-prem because of services like Cloudflare. You can have some of the global features like edge-caching while also getting the (cost) benefits of on-prem.
can you define "constant"
Well, between AWS US EAST 1 killing half the internet, and this incident, not even a month passed. Meanwhile, my physical servers don't care and happily serve many people at a cheaper cost than any cloud offer.
This is ridiculous:
They just posted:
Update We've deployed a change which has restored dashboard services. We are still working to remediate broad application services impact Posted 2 minutes ago. Nov 18, 2025 - 14:34 UTC
but... I'm stuck at the captcha, which does not work: dash.cloudflare.com "Verifying you are human. This may take a few seconds."
dash.cloudflare.com needs to review the security of your connection before proceeding.
just scheduled maintenance in Tahiti guys, nothing to see here
could be related? Tahiti is a small island in the south pacific, seems rather suspicious
First time I've learned that a porn website I use is backed by Cloudflare.
using a cloudflare tunnel for local dev work, completely down
Can't websites have an auto backup/redirect in case Cloudflare or AWS go down?
Puup but, anyone else?
Under Attack? SO AM I!
I can't access my fav porn sites, cloudflare Singapore host is down. Oh man!
I honestly think people should practice more chaos engineering themselves and switch off services at random like Cloudflare and have failure plans.
LETS GO ITS BACK ONN
is this the result of AI software improvements?
Seems to be back up
Indonesia down also
even popular apps like x.com, chatgpt.com are down.
Man in the middle!
Is it down again??
It's back up.
The sheer number of websites this has taken down!
Another thread: https://news.ycombinator.com/item?id=45963949
Even in India there is a Cloudflare outage.
Probably they adopted vibe coding as the main way to write code
Is it DNS or BGP?
Why settle for just one?
Seems to be over.
500 from Thailand
This one is huge.
This sounds huge.
Still down for me
I was about to cream watching my fav video on X and it is down
Down in Missouri.
all the captcha's are also not working
glad to see hacker news is still working.
monopoly causes problems for all customers?
oh no, anyway
dl.acm.org down, arxiv.org has my back!
Cursed at my ISP for absolutely nothing.
Half of the internet is down. That's what you get for giving up control of a service that's supposed to be decentralized to one company. Good; maybe if it costs companies a few billion they won't put all their eggs in one basket.
I’m assuming Hard Rock (Bet) is run by Cloudflare too
Down in Taiwan
Even in India, Cloudflare has an outage.
This seems to corroborate the recent controversial claims that American workers do not possess the aptitudes needed to succeed in the 21st century. If only we could have gotten more children to learn to code. Sigh.
Very true; clearly all these issues that started cropping up after Fortune 500 companies started offshoring indicate they didn't offshore fast enough.
Accelerate the eschaton.
syntax error: unexpected semicolon
It took X and Supabase down as well.
PEBKAC error
it works, then stops, then works
germany as well. Claude down too
X, Chatgpt, all kinds of sites and services around the eu, it's a massive outage
Right now it's working in Italy: https://www.endoacustica.com/ I hope it will not go down again.
Europe down
Poland down
deny(clippy::unwrap_used)
the day the earth stood still
should we think about Akamai
Cloudflare is officially down.
It’s back.
Oh boy
SPOF
BLR Down
Seoul down
Jakarta
Why the hell is my claude saying "Please unblock challenges.cloudflare.com to proceed."
And then still failing anyway? Why do I need CloudFlare to access claude.io? Wtf?
AI companies don't want AI to use AI.
Its back!
YESS
Tokyo too
Yes (Asia)
hacked.stream was down too
I am paying for this shit service and this is the longest downtime I've had in years. Can anyone recommend any other bottleneck to be annoyed with in the future?
Feels like 25% of the Internet is down just because of fuckin' Cloudflare.
I'm leaving the office because I can't work at the moment...
Time for a beer, greetings from Germany!
akxeder.eth.ac is working
still down in Australia
guys hype up its backkk
We are doomed! Is it another vibe-coding disaster?
Now I can switch everything off and go home. We are not using CF at our site, but a CF error is a good reason to have a day off.
Puup but, how about yous guys
My site https://mediamistrz.pl/ not working
Once again vindicated by running my own CDN and not living with the irrational belief that everything needs cloudflare.
supabase is down too
it comes back up now
africa down as well
Who is laughing now, Elon?
Aw man, how dare this affect me personally? :P (Tried to get to openstreetmap.org which is behind cloudflare.)
Cloudflare is still down
Seemingly nobody cares about being in two different availability zones. Or is this a deeper problem?
are we cooked :l
Telnyx seems to be down for me. Actually I lied, I think it is working. at least call connected.
claude.ai down too... lots of programmers are gonna have to pretend they code in another way...
Easy: "the site is done, it is fantastic, but cloudflare is down so you cant see it"
It's working fine for me
Yep, seems to work cause my nearby colleague started copy/pasting massive chunks of code again.
I guess claude is more important than your average site :)
For me right now, Claude.ai is down, but Claude Code (terminal, extension) seems to be up and happy. Suggests that API is probably up.
are we cooked?
singapore down
I can't log in to Cloudflare itself because it is itself affected; holy BS.
Now is as good a time as ever to look at moving our eggs into some other baskets
x.com is down
Almost like centralizing everything on a single service has consequences.
What?! No way! Keen to see the post mortem on this. Its always DNS.
Checks Cloudflare Status - yeah, everything's hunky dory bro.
poland down
A good reminder for advancing decentralization and p2p networks!
I am trying to open this page https://cryptoquip.net/ but it is showing a Cloudflare error.
aisuru?
its back !
yess ddos
yep
We're doomed. Is this another vibe-coding disaster?
I thought it was problem with my network lol XD
Just now it was acknowledged on the status page.
Yes. At least in Germany and Spain. Intermittent.
Cloudflare has always been a shit company.
Reminder that this is not the web we want.
akxeder.eth.ac is not down
facing cloudflare downtime
I'm betting on another DNS failure
Cloudflare down in India.
Aaaaand it's down again.
Dude what's up with X???
why does no one use google cloud? literally never has any issues lol
Expensive, hard to use, a million dashboards
because no one uses it
Bluesky still chugging along.
Just saying.
They are decentralized with servers all on the East coast that they self host. They do have points of failure that can take down the whole network, however.
What part of "single point of failure" do people not understand?
Stop. Using. Cloudflare.
how are we still down? it's been 2 hours lol
Ha ha ha hahahahahaaaa hahahahahaaaahahaha, fuck 'em.
Welcome to AOL.
in Japan, now, it's alive.
how about your location?
Just happy I wasnt IP banned lol
Bro what's up with X???
Bro no way bro what's going on bro? BRO?! Bro this is crazy bro like one of your cartoons that you jerk it to every day bro
Cloudflare fucking sucks
lol
I'm weary of the broader internet having SPOFs like AWS and Cloudflare. You can't change routing or DNS horizons to get around it. Things are just broken in ways that are not only opaque, but destructive, due to so much relying on fragile synced state.
Will my Spelling Bee QBABM count today, or will it fail and tomorrow I find out that last MA(4) didn't register, ruining my streak? Society cannot function like this! /s
West's Great Firewall
/s
The top black bar is appropriate /s
Bruh I was watching porn on Twitter, I thought the FBI got my ahh.
username fits...
Who the fuck talks like this
Are any other users facing the Cloudflare outage?!
AWS, Azure, now Cloudflare, all within a month, are hit with configuration errors that are definitely neither signs of more surveillance gear being added by government agencies nor attacks by hostile powers. It's a shame that these fine services that everyone apparently needs and that worked so well for so long without a problem suddenly all have problems at the same time.
Most or all of these lost significant institutional knowledge through layoff after layoff and jobs being moved to lower-cost countries.
Maybe a coincidence or maybe not.
AWS was not a configuration error; it was a race condition in their load balancer's automated DNS record attribution that caused empty DNS records. As that issue was being fixed, it cascaded into further, more complex issues overloading EC2 instance provisioning.
I don't get how you jumped to that conclusion...
Gemini is up, I asked it to explain what's going on in cave man speak:
YOU: Ask cave-chief for fire.
CAVE-CHIEF (Cloudflare): Big strong rock wall around many other cave fires (other websites). Good, fast wall!
MANY CAVE-PEOPLE: Shout at rock wall to get fire.
ROCK WALL: Suddenly… CRACK! Wall forgets which cave has which fire! Too many shouts!
RESULT:
BIG PROBLEM: Big strong wall broke. Nobody gets fire fast. Wall chief must fix strong rock fast!