Comment by zlagen
1 year ago
I'm using Chrome on Linux and noticed that this year Cloudflare is very aggressive in showing the "Verify you are a human" box. Now a lot of sites that use Cloudflare show it, and once you solve the challenge it shows it again after 30 minutes!
What are you protecting, Cloudflare?
Also they show those captchas when going to robots.txt... unbelievable.
Cloudflare has been even worse for me on Linux + Firefox. On a number of sites I get the "Verify" challenge and, after solving it, immediately get a "You have been blocked" message every time. Clearing cookies, disabling uBlock Origin, and other changes make no difference. Reporting the issue to them does nothing.
This hostility to normal browsing behavior makes me extremely reluctant to ever use Cloudflare on any projects.
I'm a Cloudflare customer, and even their own dashboard does not work with Linux + a slightly older Firefox. I mean, one click and it's "oops, please report the error", straight to /dev/null.
At least you can get past the challenge. For me, every single time it is an endless loop of "select all bikes/cars/trains". I've given up even trying to solve the challenge and just close the page when it shows up.
That's not Cloudflare; they stopped doing pictures years ago. You can tell because Cloudflare always puts their brand name on their page.
Cloudflare just blocks you without recourse nowadays.
I run a few Linux desktop VMs, and Cloudflare's Turnstile verification (their automatic, non-input-based verification) fails for the couple of sites I've tried that use it for logins, on the latest Chromium and Firefox browsers. It doesn't matter that I'm even connecting from the same IP.
I'd presumed it was just the VM they're heuristically detecting but sounds like some are experiencing issues on Linux in general.
I guess it’s time to update our user agent strings like I did with Konqueror 20 years ago.
Looks like there’s a plugin for that https://chromewebstore.google.com/detail/user-agent-switcher...
Check that you are allowing web worker scripts; that did the trick for me. I still have issues on slower computers (Raspberry Pis and the like), though, as they seem to be too slow to do whatever Cloudflare wants as verification in the allotted time.
Sounds like my experience browsing the internet while connected to the VPN provided by my employer: tons of captchas, and everything defaults to German (the IP is from Frankfurt).
The problem is that you are not performing "normal browsing behavior". The vast majority of the population (at least ~70% don't use ad-blockers) have no extensions and change no settings, so they are 100% fingerprintable every time, which lets them through immediately.
Linux + Firefox. Not sure what happened to me yesterday, but the challenge/response thing was borked, and when I finally got through it all it said I was a robot anyway. This was while trying to sign up for a Skype account, so it could have been an MS issue and not necessarily Cloudflare. I think the solution is to just not use obstructive software. Thanks to this issue I discovered Jitsi, and that seems more than enough for my purposes.
Yeah, Lego and Etsy are two sites I can now only visit with Safari. It sucks. With Firefox on the same machine it claims I'm a bot or a crawler (not even on Linux, on a Mac).
Does it still apply if you change the UA to something more common (Chrome on Windows or something)?
Fwiw, I was getting cloudflare blocked for a long time on Firefox+Linux and the only thing that fixed it was completely disabling the UA adjuster browser extension I had installed.
Yeah, same here. I've already avoided it for most of my customers for that very reason.
I have Firefox and Brave set to always clear cookies and everything when I close the browser... it is a nightmare when I come back, with the number of captchas everywhere.
It is either that or keep sending data back to the Meta and Co. overlords, despite me not being a Facebook, Instagram, or WhatsApp user...
You don't need to clear cookies to avoid sending that data back. Just use a browser that properly isolates third party/Facebook cookies.
You don't even need to use a different browser - Firefox has an official "Multi-account containers" extension that lets you assign certain sites to open in their own sandbox so you can have a sandbox for Google, another for Facebook, etc.
I wonder if browsers have a future.
I don't bother with sites that have Cloudflare Turnstile. Web developers supposedly know the importance of page load time, but even worse than a slow-loading page is waiting for Cloudflare's gatekeeper before I can even see the page.
That's not Turnstile, that's a Managed Challenge.
Turnstile is the in-page captcha option, which, you're right, does affect page load. But they force a defer on the loading of that JS as best they can.
Also, Turnstile is a proof-of-work check, meant to slow down and verify would-be attack vectors. Turnstile should only be used on things like login, email change, "place order", etc.
Managed challenges actually come from the same "challenges" platform, which includes Turnstile; the only difference being that Turnstile is something that you can embed yourself on a webpage, and managed challenge is Cloudflare serving the same "challenge" on an interstitial web page.
Also, Turnstile is definitely not a simple proof-of-work check; it performs browser fingerprinting and checks for web APIs. You can easily verify this by changing your browser's user agent at the header level while leaving it as-is at the JavaScript level (navigator.userAgent); the mismatch puts Turnstile into an infinite loop.
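To make that concrete, here is a toy sketch in plain Python of the kind of consistency check described above. This is not Cloudflare's actual logic, and the signal names are invented; it just shows why a spoofer that rewrites only the HTTP header keeps failing and getting re-challenged.

```python
# Toy sketch only, not Cloudflare code. The dict keys are hypothetical names
# for signals an in-page script could collect (navigator.userAgent, presence
# of engine-specific APIs) and report back with the challenge result.
def ua_looks_consistent(header_ua: str, js_signals: dict) -> bool:
    # The HTTP header and the value JavaScript sees should agree.
    if header_ua != js_signals.get("navigator_user_agent"):
        return False
    # A header claiming Chrome should come with Chrome-only APIs, and so on.
    if "Chrome/" in header_ua and not js_signals.get("has_window_chrome"):
        return False
    if "Firefox/" in header_ua and not js_signals.get("has_gecko_apis"):
        return False
    return True

# Header rewritten to look like Chrome on Windows, while the page's JS
# still sees an unmodified Firefox on Linux:
print(ua_looks_consistent(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0.0.0 Safari/537.36",
    {
        "navigator_user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:126.0) "
                                "Gecko/20100101 Firefox/126.0",
        "has_window_chrome": False,
        "has_gecko_apis": True,
    },
))  # False -> the check fails and the challenge is served again
```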
The captcha on robots.txt is a misconfiguration on the website's side. CF has lots of issues, but this one is on their customer. Also, they detect Google and other bots, so those may be getting through anyway.
Sure, but sensible defaults ought to be in place. There are certain "well-known" URLs that are intended for machine consumption. CF should permit (and perhaps rate-limit?) those by default, unless the user overrides them.
Putting a CAPTCHA in front of robots.txt in particular is harmful. If a web crawler fetches robots.txt and receives an HTML response that isn’t a valid robots.txt file, then it will continue to crawl the website when the real robots.txt might’ve forbidden it from doing so.
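A minimal sketch with Python's standard-library robots.txt parser illustrates that failure mode (the domain, bot name, and file contents below are made up): fed a real robots.txt, the rule is honored; fed an HTML challenge page, the parser finds no rules at all and treats everything as crawlable.

```python
# Minimal sketch, Python stdlib only. Feeding an HTML challenge page to a
# robots.txt parser yields zero rules, i.e. "crawl everything".
from urllib.robotparser import RobotFileParser

real_robots = """\
User-agent: *
Disallow: /private/
"""

challenge_html = "<!DOCTYPE html><html><body>Verify you are a human</body></html>"

def allowed(robots_body: str, path: str) -> bool:
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch("ExampleBot", "https://example.com" + path)

print(allowed(real_robots, "/private/secret"))      # False: the rule is honored
print(allowed(challenge_html, "/private/secret"))   # True: HTML parses to "no rules"
```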
Using Pale Moon, I don't even get a captcha that I could solve. Just a spinning wheel, and the site reloads over and over. This makes it impossible to use e.g. anything hosted on sourceforge.net, as they're behind the clownflare "Great Firewall of the West" too.
See if changing user agent to Chrome/Firefox helps
Whoever configures the Cloudflare rules should be turning off the firewall for things like robots.txt and sitemap.xml. You can still use caching for those resources to prevent them becoming a front door to DDoS.
It seems like common cases like this should be handled correctly by default. These are cacheable requests intended for robots. Sure, it would be nice if webmasters configured it, but I suspect only a tiny minority do.
For example, even Cloudflare hasn't configured their official blog's RSS feed properly. My feed reader (running in a DigitalOcean datacenter) hasn't been able to access it since 2021 (403 every time, even after backing off to checking weekly). This is a cacheable endpoint with public data intended for robots. If they can't configure their own product correctly for their official blog, how can they expect other sites to?
I agree, but I also somewhat understand. Some people will actually pay more per month for Cloudflare than for their own hosting; the Cloudflare Pro plan is $20/month USD. Some sites wouldn't be able to handle the constant requests for robots.txt, because bots don't necessarily respect cache headers (if those are even configured for robots.txt), and the bots that fetch robots.txt while ignoring caching headers are simply too numerous.
If you are writing some kind of malicious crawler that doesn't care about rate limiting and wants to scan as many sites as possible to put together a list of the most vulnerable to hack, you will scan robots.txt, because that is the file that tells robots NOT to index these pages. I never use robots.txt for any kind of security through obscurity. I've only ever bothered with it to make SEO easier: to control a virtual subdirectory of a site, to block repeated content with alternative layouts (avoiding duplicate-content issues), or to get discontinued sections of a website to drop out of SERPs.
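As a small illustration of that point (the robots.txt content below is invented): to a crawler that ignores the rules, the file is just a free list of paths the owner would rather not have visited.

```python
# Illustration only; the sample robots.txt content is made up.
sample_robots = """\
User-agent: *
Disallow: /admin/
Disallow: /internal-reports/
Disallow: /old-checkout/
"""

# A crawler that doesn't care about the rules can trivially harvest the
# "please don't look here" paths instead of respecting them.
flagged_paths = [
    line.split(":", 1)[1].strip()
    for line in sample_robots.splitlines()
    if line.lower().startswith("disallow:")
]
print(flagged_paths)  # ['/admin/', '/internal-reports/', '/old-checkout/']
```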
The best part is when you get the "box" on an XHR request. Of course no site handles that properly, and it just breaks. Happens regularly on ChatGPT.
Cloudflare is security theatre.
I scrape hundreds of Cloudflare-protected sites every 15 minutes without ever having any issues, using a simple headless browser and a mobile connection; meanwhile, real users get interstitial pages.
It's almost like Cloudflare is deliberately showing the challenge to real users just to show that they exist and are doing "something".
Just wanted to mention that the time between challenges is set by the site, not CF. Perhaps if you mention it, the site(s) will update the setting?
Same. I'm consistently getting a captcha and some nonsense about a Ray ID multiple times a day.
It's not just Linux, I'm using Chrome on my macOS Catalina MBP and I can't even get past the "Verify you are a human" box. It just shows another captcha, and another, and yet another... No amount of clearing cookies/disabling adblockers/connecting from a different WiFi does it. And that's on most random sites (like ones from HN links), I also don't recall ever doing anything "suspicious" (web scraping etc.) on that device/IP.
Somehow, Safari passes it the first time. WTF?
> What are you protecting, Cloudflare?
A cheeky response is "their profit margins", but I don't think that's quite right, considering that their earnings per share are -$0.28.
I've not looked into Cloudflare much and have never needed their services, so I'm not totally sure what all their revenue streams are. I have heard that small websites are not paying much, if anything at all [1]. With that preface out of the way: I think we see challenges on sites that perhaps don't need them as a form of advertising, to ensure that Cloudflare's name is ever-present. Maybe they don't need this form of advertising, or maybe they do.
[1] https://www.cloudflare.com/en-gb/plans/
If you log in to the CF dashboard every 3 months or so, you will see pretty clearly that they are slowly trying to become a cloud provider like Azure or AWS. Every time I log in there is a whole new slew of services that have equivalents on the other cloud providers. They are using the CDN portion of the business as a loss leader.
They usually protect the whole DNS record so it makes sense it would cover robots.txt as well, even if it's a bit silly.
They run their own DNS infra so that when you set the SOA for your zone to their servers they can decide what to resolve to. If you have protection set on a specific record then it resolves to a fleet of nginx servers with a bunch of special sauce that does the reverse proxying that allows for WAF, caching, anti-DDoS, etc. It's entirely feasible for them to exempt specific requests like this one since they aren't "protect[ing] the whole DNS" so much as using it to facilitate control of the entire HTTP request/response.
I run a honeypot and I can say with reasonable confidence many (most?) bots and scrapers use a Chrome on Linux user-agent. It's a fairly good indication of malicious traffic. In fact I would say it probably outweighs legitimate traffic with that user agent.
It's also a pretty safe assumption that Cloudflare is not run by morons, and they have access to more data than we do, by virtue of being the strip club bouncer for half the Internet.
User-agent might be a useful signal but treating it as an absolute flag is sloppy. For one thing it's trivial for malicious actors to change their user-agent. Cloudflare could use many other signals to drastically cut down on false positives that block normal users, but it seems like they don't care enough to be bothered. If they cared more about technical and privacy-conscious users they would do better.
> For one thing it's trivial for malicious actors to change their user-agent.
Absolutely true. But the programmers of these bots are lazy and often don't. So if Cloudflare has access to other data that can positively identify bots, and there is a high correlation with a particular user agent, well then it's a good first-pass indication despite collateral damage from false positives.
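For what it's worth, changing the string when a bot author does bother takes one line. Here is a minimal sketch using the third-party requests library, a placeholder URL, and a plausible Chrome-on-Windows user agent string.

```python
# Minimal sketch: the User-Agent header is just a string the client chooses.
# Uses the third-party `requests` library; example.com is a placeholder.
import requests

headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
}
response = requests.get("https://example.com/", headers=headers, timeout=10)
print(response.status_code)
```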
I mean, do we need to replace user agent with some kind of 'browser signing'?
Sure, but does that mean that we, Linux users, can't go on the web anymore? It's way easier for spammers and bots to move to another user agent/system than it is for legitimate users. So whatever causes this is not a great solution to the problem. You can do better, CF.
I'm a Linux user as well, but I'm not sure what Cloudflare is supposed to be doing here that makes everybody happy. Removing the most obvious signals of botting because some real users look like that too may be better for that individual user, but it doesn't make for a good answer for legitimate users as a whole. Spam, DoS, phishing, credential stuffing, scraping, click fraud, API abuse, and more are problems which impact real users just as extra checks and false-positive blocks do.
If you really do have a better way to make all legitimate users of sites happy with bot protections, then by all means, there is a massive market for it. Unfortunately you're probably more like me: stuck between a rock and a hard place, with no good solution and just annoyance at the way things are.
Many or most bots use a Chrome on Linux user agent, so you think it's OK to block Chrome on Linux user agents? That's very broken thinking.
So it's OK for them to do shitty things without explaining themselves because they "have access to more data than we do"? Big companies can be mysterious and non-transparent because they're big?
What a take!
Can't the user agent be spoofed anyway?
I usually notice an increase in those when connecting to sites over VPN and especially Tor. Could that be it?
We're on Chrome on Linux; mostly we don't see those.
Excuse my ignorance, but what exactly are these stupid checkboxes supposed to accomplish? Surely they do not represent a serious obstacle.