Comment by zlagen
1 year ago
I'm using Chrome on Linux and noticed that this year Cloudflare is very aggressive in showing the "Verify you are a human" box. Now a lot of sites that use Cloudflare show it, and once you solve the challenge it shows it again after 30 minutes!
What are you protecting, Cloudflare?
Also they show those captchas when going to robots.txt... unbelievable.
Cloudflare has been even worse for me on Linux + Firefox. On a number of sites I get the "Verify" challenge and, after solving it, immediately get a "You have been blocked" message every time. Clearing cookies, disabling uBlock Origin, and other changes make no difference. Reporting the issue to them does nothing.
This hostility to normal browsing behavior makes me extremely reluctant to ever use Cloudflare on any projects.
I'm a Cloudflare customer, and even their own dashboard does not work with Linux + a slightly older Firefox. I mean, one click and it's "oops, please report the error", straight to /dev/null.
At least you can get past the challenge. For me, every single time it is an endless loop of "select all bikes/cars/trains". I've given up even trying to solve the challenge and just close the page when it shows up.
That's not Cloudflare; they stopped doing pictures years ago. You can tell because Cloudflare always puts their brand name on their page.
Cloudflare just blocks you without recourse nowadays.
I run a few Linux desktop VMs, and Cloudflare's Turnstile verification (their automatic, non-input-based verification) fails for the couple of sites I've tried that use it for logins, on the latest Chromium and Firefox browsers. It doesn't matter that I'm even connecting from the same IP.
I'd presumed it was just the VM they're heuristically detecting but sounds like some are experiencing issues on Linux in general.
I guess it’s time to update our user agent strings like I did with Konqueror 20 years ago.
Looks like there’s a plugin for that https://chromewebstore.google.com/detail/user-agent-switcher...
Check that you are allowing web worker scripts; that did the trick for me. I still have issues on slower computers (Raspberry Pis and the like), though, as they seem to be too slow to do whatever Cloudflare wants as verification in the allotted time.
Sounds like my experience browsing the internet while connected to the VPN provided by my employer: tons of captchas, and everything defaults to German (the IP is from Frankfurt).
The problem is that you are not performing "normal browsing behavior". The vast majority of the population (at least ~70% don't use ad-blockers) have no extensions and change no settings, so they are 100% fingerprintable every time, which lets them through immediately.
Linux + Firefox. Not sure what happened to me yesterday, but the challenge/response thing was borked, and when I finally got through it all it said I was a robot anyway. This was while trying to sign up for a Skype account, so it could have been an MS issue and not necessarily Cloudflare. I think the solution is to just not use obstructive software. Thanks to this issue I discovered Jitsi, and that seems more than enough for my purposes.
Yeah, Lego and Etsy are two sites I can now only visit with Safari. It sucks. With Firefox on the same machine it claims I'm a bot or a crawler (not even on Linux, on a Mac).
Does it still apply if you change the UA to something more common (Chrome on Windows or something)?
Fwiw, I was getting cloudflare blocked for a long time on Firefox+Linux and the only thing that fixed it was completely disabling the UA adjuster browser extension I had installed.
Yeah, same here. I've already avoided it for most of my customers for that very reason.
I have Firefox and Brave set to always clear cookies and everything when I close the browser... it is a nightmare when I come back, with the number of captchas everywhere.
It is either that or keep sending data back to the Meta and Co. overlords, despite me not being a Facebook, Instagram, or WhatsApp user...
You don't need to clear cookies to avoid sending that data back. Just use a browser that properly isolates third party/Facebook cookies.
You don't even need to use a different browser - Firefox has an official "Multi-account containers" extension that lets you assign certain sites to open in their own sandbox so you can have a sandbox for Google, another for Facebook, etc.
I wonder if browsers have a future.
I don't bother with sites that have Cloudflare Turnstile. Web developers supposedly know the importance of page load time, but even worse than a slow-loading page is waiting for Cloudflare's gatekeeper before I can even see the page.
That's not Turnstile, that's a Managed Challenge.
Turnstile is the in-page captcha option, which, you're right, does affect page load. But they force a defer on the loading of that JS as best they can.
Also, Turnstile is a proof-of-work check, meant to slow down and verify would-be attack vectors. Turnstile should only be used on things like login, email change, "place order", etc.
Managed challenges actually come from the same "challenges" platform, which includes Turnstile; the only difference being that Turnstile is something that you can embed yourself on a webpage, and managed challenge is Cloudflare serving the same "challenge" on an interstitial web page.
Also, Turnstile is definitely not a simple proof-of-work check; it performs browser fingerprinting and checks for web APIs. You can easily verify this by changing your browser's user agent at the header level while leaving it as-is at the JavaScript level (navigator.userAgent); the mismatch puts Turnstile into an infinite loop.
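To make that concrete, here is a toy sketch in plain Python of the kind of consistency check described above. This is not Cloudflare's actual logic, and the signal names are invented; it just shows why a spoofer that rewrites only the HTTP header keeps failing and getting re-challenged.

```python
# Toy sketch only, not Cloudflare code. The dict keys are hypothetical names
# for signals an in-page script could collect (navigator.userAgent, presence
# of engine-specific APIs) and report back with the challenge result.
def ua_looks_consistent(header_ua: str, js_signals: dict) -> bool:
    # The HTTP header and the value JavaScript sees should agree.
    if header_ua != js_signals.get("navigator_user_agent"):
        return False
    # A header claiming Chrome should come with Chrome-only APIs, and so on.
    if "Chrome/" in header_ua and not js_signals.get("has_window_chrome"):
        return False
    if "Firefox/" in header_ua and not js_signals.get("has_gecko_apis"):
        return False
    return True

# Header rewritten to look like Chrome on Windows, while the page's JS
# still sees an unmodified Firefox on Linux:
print(ua_looks_consistent(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0.0.0 Safari/537.36",
    {
        "navigator_user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:126.0) "
                                "Gecko/20100101 Firefox/126.0",
        "has_window_chrome": False,
        "has_gecko_apis": True,
    },
))  # False -> the check fails and the challenge is served again
```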
The captcha on robots.txt is a misconfiguration on the website's side. CF has lots of issues, but this one is on their customer. Also, they detect Google and other bots, so those may be getting through anyway.
Sure, but sensible defaults ought to be in place. There are certain "well-known" URLs that are intended for machine consumption. CF should permit (and perhaps rate-limit?) those by default, unless the user overrides them.
Putting a CAPTCHA in front of robots.txt in particular is harmful. If a web crawler fetches robots.txt and receives an HTML response that isn’t a valid robots.txt file, then it will continue to crawl the website when the real robots.txt might’ve forbidden it from doing so.
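A minimal sketch with Python's standard-library robots.txt parser illustrates that failure mode (the domain, bot name, and file contents below are made up): fed a real robots.txt, the rule is honored; fed an HTML challenge page, the parser finds no rules at all and treats everything as crawlable.

```python
# Minimal sketch, Python stdlib only. Feeding an HTML challenge page to a
# robots.txt parser yields zero rules, i.e. "crawl everything".
from urllib.robotparser import RobotFileParser

real_robots = """\
User-agent: *
Disallow: /private/
"""

challenge_html = "<!DOCTYPE html><html><body>Verify you are a human</body></html>"

def allowed(robots_body: str, path: str) -> bool:
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch("ExampleBot", "https://example.com" + path)

print(allowed(real_robots, "/private/secret"))      # False: the rule is honored
print(allowed(challenge_html, "/private/secret"))   # True: HTML parses to "no rules"
```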
Using Pale Moon, I don't even get a captcha that I could solve. Just a spinning wheel, and the site reloads over and over. This makes it impossible to use e.g. anything hosted on sourceforge.net, as they're behind the clownflare "Great Firewall of the West" too.
See if changing user agent to Chrome/Firefox helps
Whoever configures the Cloudflare rules should be turning off the firewall for things like robots.txt and sitemap.xml. You can still use caching for those resources to prevent them becoming a front door to DDoS.
It seems like common cases like this should be handled correctly by default. These are cacheable requests intended for robots. Sure, it would be nice if webmasters configured it, but I suspect only a tiny minority do.
For example, even Cloudflare hasn't configured their official blog's RSS feed properly. My feed reader (running in a DigitalOcean datacenter) hasn't been able to access it since 2021 (403 every time, even after backing off to checking weekly). This is a cacheable endpoint with public data intended for robots. If they can't configure their own product correctly for their official blog, how can they expect other sites to?
I agree, but I also somewhat understand. Some people will actually pay more per month for Cloudflare than for their own hosting; the Cloudflare Pro plan is $20/month USD. Some sites wouldn't be able to handle the constant requests for robots.txt, because bots don't necessarily respect cache headers (if those are even configured for robots.txt), and the bots that fetch robots.txt while ignoring caching headers are simply too numerous.
If you are writing some kind of malicious crawler that doesn't care about rate limiting and wants to scan as many sites as possible to put together a list of the most vulnerable to hack, you will scan robots.txt, because that is the file that tells robots NOT to index these pages. I never use robots.txt for any kind of security through obscurity. I've only ever bothered with it to make SEO easier: to control a virtual subdirectory of a site, to block repeated content with alternative layouts (avoiding duplicate-content issues), or to get discontinued sections of a website to drop out of SERPs.
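As a small illustration of that point (the robots.txt content below is invented): to a crawler that ignores the rules, the file is just a free list of paths the owner would rather not have visited.

```python
# Illustration only; the sample robots.txt content is made up.
sample_robots = """\
User-agent: *
Disallow: /admin/
Disallow: /internal-reports/
Disallow: /old-checkout/
"""

# A crawler that doesn't care about the rules can trivially harvest the
# "please don't look here" paths instead of respecting them.
flagged_paths = [
    line.split(":", 1)[1].strip()
    for line in sample_robots.splitlines()
    if line.lower().startswith("disallow:")
]
print(flagged_paths)  # ['/admin/', '/internal-reports/', '/old-checkout/']
```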
The best part is when you get the "box" on an XHR request. Of course no site handles that properly, and it just breaks. Happens regularly on ChatGPT.
Cloudflare is security theatre.
I scrape hundreds of Cloudflare-protected sites every 15 minutes without ever having any issues, using a simple headless browser and a mobile connection; meanwhile, real users get interstitial pages.
It's almost like Cloudflare is deliberately showing the challenge to real users just to show that they exist and are doing "something".
Just wanted to mention that the time between challenges is set by the site, not CF. Perhaps if you mention it, the site(s) will update the setting?
Same. I'm consistently getting a captcha and some nonsense about a Ray ID multiple times a day.
It's not just Linux, I'm using Chrome on my macOS Catalina MBP and I can't even get past the "Verify you are a human" box. It just shows another captcha, and another, and yet another... No amount of clearing cookies/disabling adblockers/connecting from a different WiFi does it. And that's on most random sites (like ones from HN links), I also don't recall ever doing anything "suspicious" (web scraping etc.) on that device/IP.
Somehow, Safari passes it the first time. WTF?
> What are you protecting, Cloudflare?
A cheeky response is "their profit margins", but I don't think that's quite right, considering that their earnings per share are -$0.28.
I've not looked into Cloudflare much and have never needed their services, so I'm not totally sure what all their revenue streams are. I have heard that small websites are not paying much, if anything at all [1]. With that preface out of the way: I think we see challenges on sites that perhaps don't need them as a form of advertising, to ensure that Cloudflare's name is ever-present. Maybe they don't need this form of advertising, or maybe they do.
[1] https://www.cloudflare.com/en-gb/plans/
If you log in to the CF dashboard every 3 months or so, you will see pretty clearly that they are slowly trying to become a cloud provider like Azure or AWS. Every time I log in there is a whole new slew of services that have equivalents on the other cloud providers. They are using the CDN portion of the business as a loss leader.
They usually protect the whole DNS record so it makes sense it would cover robots.txt as well, even if it's a bit silly.
They run their own DNS infra so that when you set the SOA for your zone to their servers they can decide what to resolve to. If you have protection set on a specific record then it resolves to a fleet of nginx servers with a bunch of special sauce that does the reverse proxying that allows for WAF, caching, anti-DDoS, etc. It's entirely feasible for them to exempt specific requests like this one since they aren't "protect[ing] the whole DNS" so much as using it to facilitate control of the entire HTTP request/response.
I run a honeypot and I can say with reasonable confidence many (most?) bots and scrapers use a Chrome on Linux user-agent. It's a fairly good indication of malicious traffic. In fact I would say it probably outweighs legitimate traffic with that user agent.
It's also a pretty safe assumption that Cloudflare is not run by morons, and they have access to more data than we do, by virtue of being the strip club bouncer for half the Internet.
User-agent might be a useful signal but treating it as an absolute flag is sloppy. For one thing it's trivial for malicious actors to change their user-agent. Cloudflare could use many other signals to drastically cut down on false positives that block normal users, but it seems like they don't care enough to be bothered. If they cared more about technical and privacy-conscious users they would do better.
> For one thing it's trivial for malicious actors to change their user-agent.
Absolutely true. But the programmers of these bots are lazy and often don't. So if Cloudflare has access to other data that can positively identify bots, and there is a high correlation with a particular user agent, well then it's a good first-pass indication despite collateral damage from false positives.
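For what it's worth, changing the string when a bot author does bother takes one line. Here is a minimal sketch using the third-party requests library, a placeholder URL, and a plausible Chrome-on-Windows user agent string.

```python
# Minimal sketch: the User-Agent header is just a string the client chooses.
# Uses the third-party `requests` library; example.com is a placeholder.
import requests

headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
}
response = requests.get("https://example.com/", headers=headers, timeout=10)
print(response.status_code)
```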
I mean, do we need to replace user agent with some kind of 'browser signing'?
Sure, but does that mean that we, Linux users, can't go on the web anymore? It's way easier for spammers and bots to move to another user agent/system than it is for legitimate users. So whatever causes this is not a great solution to the problem. You can do better, CF.
I'm a Linux user as well, but I'm not sure what Cloudflare is supposed to be doing here that makes everybody happy. Removing the most obvious signals of botting because some real users look like that too may be better for that individual user, but it doesn't make for a good answer for legitimate users as a whole. Spam, DoS, phishing, credential stuffing, scraping, click fraud, API abuse, and more are problems which impact real users just as extra checks and false-positive blocks do.
If you really do have a better way to make all legitimate users of sites happy with bot protections, then by all means, there is a massive market for it. Unfortunately you're probably more like me: stuck between a rock and a hard place, with no good solution and just annoyance at the way things are.
Many or most bots use a Chrome on Linux user agent, so you think it's OK to block Chrome on Linux user agents? That's very broken thinking.
So it's OK for them to do shitty things without explaining themselves because they "have access to more data than we do"? Big companies can be mysterious and non-transparent because they're big?
What a take!
Can't the user agent be spoofed anyway?
I usually notice an increase in those when connecting to sites over VPN and especially Tor. Could that be it?
We're on Chrome on Linux; mostly we don't see those.
Excuse my ignorance, but what exactly are these stupid checkboxes supposed to accomplish? Surely they do not represent a serious obstacle.