
Comment by userbinator

3 years ago

I run a MITM proxy for adblocking/general filtering, and over the past little while I've noticed that Cloudflare and other "bot protection" gets me blocked from more and more of the sites I come across in search results, so this will be very useful for fixing that.

However, I should caution that in this era of companies being particularly user-hostile and authoritarian, especially Big Tech, I would be more careful with sharing stuff like this. Being forced to run JS is bad enough; profiling users based on other traits, and essentially determining if they are using "approved" software, is a dystopia we should fight strongly against. Stallman's "The Right to Read" comes to mind as a very relevant cautionary tale.

Cloudflare is likely one of the worst things that has happened to the internet in recent history.

Like, I get the need for some protective mechanisms for interactive content/posting/etc, but there should be zero cases where a simple GET returning HTTP 200 requires JavaScript/client-side crap. If they serve me a slightly stale version of the remote resource (5 minutes/whatnot) that's fine.

They've effectively just turned into a Google protection racket. Small/special-purpose search/archive tools are simply stonewalled.

  • You can't turn it off as a Cloudflare customer either.

    The best you can get is "essentially off", and that wording is deliberate: even with everything disabled there are still edge cases where their security will force a JS challenge or CAPTCHA.

    • At least on their basic plan there is also little to no indication of how often this is triggering, so you have no idea what the various settings are actually doing.

  • Not to be too dismissive of this, but for companies just trying to run a service while getting constantly bombarded by things like DDoS attacks, Cloudflare and its ilk let them serve a large portion of "legitimate" users, compared to none.

    I don't really know how you resolve that absent just like... putting everything behind logins, though.

  • > If they serve me a slightly stale version of the remote resource (5 minutes/whatnot) that's fine.

    Not all sites are configured to do this. Some pages are expensive to render and have no cache layer.

    • I get that; my point is that this is the problem.

      They solve the DDoS issue by requiring JS CAPTCHAs (which fundamentally breaks the way the internet should work), rather than serving a cached copy of the page to reduce load on the real host.

      Requiring JS doesn't disambiguate between well-behaved automated or headless user agents (I use a custom proxy for a lot of my content browsing) and malicious ones; it breaks /all/ of them.

    • Some people shoot themselves in the foot, yes. There is no reason not to have some amount of microcaching: even a very short TTL puts an upper limit on the request rate per resource behind the caching layer (a sketch of the idea follows below).
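
      A minimal sketch of what that could look like, assuming an in-process Python cache rather than a real proxy layer; the class name, the 5-second TTL, and the numbers in the comments are illustrative only, not anything Cloudflare or a particular stack provides:

          import time
          import threading

          class MicroCache:
              # Tiny in-memory microcache: on a hit, serve the stored copy;
              # on a miss, re-render and cache for `ttl` seconds. Each URL is
              # therefore re-rendered at most roughly once per TTL window no
              # matter how many requests arrive. (Concurrent misses can still
              # render more than once; a per-key lock would tighten that.)
              def __init__(self, ttl_seconds=5.0):
                  self.ttl = ttl_seconds
                  self._lock = threading.Lock()
                  self._entries = {}  # url -> (expires_at, body)

              def get(self, url, render):
                  now = time.monotonic()
                  with self._lock:
                      entry = self._entries.get(url)
                      if entry and entry[0] > now:
                          return entry[1]  # serve the slightly stale copy
                  body = render()          # the expensive origin render
                  with self._lock:
                      self._entries[url] = (now + self.ttl, body)
                  return body

          # At 1000 req/s for one URL, the origin still only renders it a
          # handful of times per 5-second window instead of 1000 times/s.
          cache = MicroCache(ttl_seconds=5.0)
          html = cache.get("/front-page", render=lambda: "<html>expensive page</html>")

      A real deployment would do this at the reverse-proxy or CDN layer rather than in-process, but the rate-limiting effect on the origin is the same.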

I've noticed even GitHub now has a login wall for comments on open-source projects. They truncate them if you aren't logged in, similar to Reddit on mobile, Instagram, Twitter, etc. Hopefully the mobile version doesn't start pushing you to install some crappy app where you can't use features like tabbed browsing, tab sync with another machine, etc.

  • The reasoning behind that might be the myriad of scrape-and-publish SEO spam pages with GitHub content.

    • Not sure I'm buying that excuse. I think they want to nudge people into making accounts and logging in. Really shady in the case of GitHub and many other sites that are successful because of user content, in my opinion.


> profiling users based on other traits, and essentially determining if they are using "approved" software, is a dystopia we should fight strongly against. Stallman's "The Right to Read" comes to mind as a very relevant cautionary tale.

Right to Read indeed... fanfiction.net has become really annoying over the last few months. Especially at night, when you have the FFN UI set to dark and a bright white Cloudflare page appears out of nowhere. Or the way the Cloudflare "anti-bot" protection leads to an endless loop when the browser is the Android WebView inside a third-party Reddit client.

Maybe I'm just a techno-optimist, but I suspect big tech companies don't give a hoot about you running "unapproved" software; they care about their services being abused, and "unapproved" software is just a useful signal that only misfires on a tiny percentage of legitimate users.

  • You are a lot more charitable than I am. I believe the big tech companies use dark patterns to get us to sign up, improve their metrics and hoover up our data.

  • Just trying to keep services operational is a fine goal for an operator to pursue, but forcing users through narrow inbound funnels is detrimental too. There needs to be better research into letting simpler modes of access keep working.

    The browser is becoming a universal agent in itself, but many people (maybe increasingly) use the terminal to access resources, and stonewalling those paths is never OK in my book.