Comment by tempest_
7 hours ago
Unfortunately the internet sucks in 2025.
If you have a site with valuable content the LLM crawlers hound you to no end. CF is basically a protection racket at this point for many sites. It doesn't even stop the more determined ones, but it keeps some away.
Yep for anyone unaware of how awful things truly are, look up what a "residential proxy" is. Back in my day we called that a botnet.
Oh, they're still botnets. We just look the other way because they're useful.
And they're pretty tame as far as computer fraud goes - if my device gets compromised I'd much rather deal with it being used for fake YouTube views than ransomware or a banking trojan.
You can make a little bit of cash on the side letting companies use your bandwidth a bit for proxying. You won’t even notice. $50/month. Times are tough!
Of course the risk here being whatever nefarious or illegal shit is flowing through your pipes, which you consented to and even received consideration for.
CF would be a protection racket only if CF is the cause of the problem CF is charging money to solve.
And yet half the HN front page every day is promoting LLM stuff.
"The internet sucks", yes, but we're doing it to ourselves.
Would you rather not have LLMs?
Absolutely. They have dramatically worsened the world, with little to no net positive impact. Nearly every positive impact (if not all of them) has an associated negative that dwarfs it.
LLMs aren't going anywhere, but the world would be a better place if they hadn't been developed. Even if they had more positive impacts, those would not outweigh the massive environmental degradation they are causing or the massive disincentive they created against researching other, more useful forms of AI.
IMO LLMs have been a net negative on society, including my life. But I'm merely pointing out the stark contrast on this website, and the fact that we can choose to live differently.
Hard yes. All of the technical discussion aside, the constant advertising deluge of every company touting AI is mind-numbing.
It's helped me learn some things quicker, but I definitely prefer the old days.
Good lord yes. No question.
Absolutely. And while we're at it, let's do away with social media.
Yes.
A solid secondary option is making LLM scraping for training opt-in, and/or compensating sites that were/are scraped for training data. Hell, maybe then you could avoid knocking websites over, which is what incentivizes them to use Cloudflare in the first place.
But that means LLM researchers have to respect other people's IP which hasn't been high on their todo lists as yet.
bUt ThAT dOeSn'T sCaLe - not my fuckin problem chief. If you as an LLM developer are finding your IP banned or you as a web user are sick of doing "prove you're human" challenges, it isn't the website's fault. They're trying to control costs being arbitrarily put onto them by a disinterested 3rd party who feels entitled to their content, which it costs them money to deliver. Blame the asshole scraping sites left and right.
Edit: and you wouldn't even need to go THAT far. I scrape a whole bunch of sites for some tools I built and a homemade news aggregator. My IP has never been flagged because I keep the number of requests down wherever possible, and rate-limit them so it's more in line with human-like browsing. Like so much of this could be solved with basic fucking courtesy.
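For anyone wondering what that kind of courtesy might look like in practice, here is a rough Python sketch, not the setup described above. The names `PoliteThrottle` and `allowed_by_robots` and the 5-second default are made up for illustration; the only real library calls are `urllib.parse.urlparse` and `urllib.robotparser.RobotFileParser` from the standard library. It spaces out requests per host and checks robots.txt rules before fetching anything.

```python
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class PoliteThrottle:
    """Enforce a minimum delay between requests to the same host.

    Hypothetical helper: min_interval is an assumed, arbitrary default.
    clock and sleep are injectable so the logic can be tested offline.
    """

    def __init__(self, min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> time of the most recent request

    def wait(self, url):
        """Block until it is polite to hit this URL's host; return seconds slept."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last_request.get(host)
        delay = 0.0
        if last is not None:
            # Sleep only for whatever remains of the minimum interval.
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last_request[host] = now + delay
        return delay


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

In a real scraper you would call `allowed_by_robots(...)` once per site with the downloaded robots.txt, then `throttle.wait(url)` right before each request. Caching responses so you don't re-fetch unchanged pages helps keep the request count down too.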
Can I raise that to no LLMs or SEO?
Yes
Not to speak for the other poster, but... That's not a good-faith question.
Most of the problems on the internet in 2025 aren't because of one particular technology. They're because the modern web was based on gentleman's agreements and handshakes, and since those things have now gotten in the way of exponential profit increases on behalf of a few Stanford dropouts, they're being ignored writ large.
CF being down wouldn't be nearly as big of a deal if their service wasn't one of the main ways to protect against LLM crawlers that blatantly ignore robots.txt and other long-established means to control automated extraction of web content. But, well, it is one of the main ways.
Would it be one of the main ways to protect against LLM web scraping if we investigated one of the LLM startups for what is arguably a violation of the Computer Fraud and Abuse Act, arrested their C-suite, and sent each member to a medium-security federal prison (I don't know, maybe Leavenworth?) for multiple years after a fair trial?
Probably not.
Yes.
Yes.
Yes
Yes, they are terrible and more a negative force than a positive one in every way imaginable. I would take no LLMs all day every day.
Unfortunately the problem isn't just "the internet sucks", it's "the internet sucks, and everyone uses it" - meaning people are not doing stuff offline, and a lot of our lives require us to be online.
The Internet is humming along beautifully
It is the Web that is being degraded