Comment by Hackbraten

5 hours ago

Cloudflare: let's give the bots their own accounts so they can scrape harder.

Also Cloudflare: let's send normal humans who are just trying to go about their daily lives into endless Turnstile spinner loops, with absolutely zero recourse, grievance process, or support infrastructure.

I just sat through 10 rounds of buses, motorcycles and fire hydrants with Google before deciding I didn't actually want to see that page that badly. So, unfortunately, Cloudflare is not the only offender here.

I had similar thoughts: "let's convince everyone to outsource the decision on who can access their websites to us, because BOTS BOTS BOTS" and "let's make life easier for bots to do things".

I’d love a Chrome extension that measures how many times per day I run into a Turnstile.

Turnstiles per minute.

If all bots are subject to a rate limit, then the system works as designed, especially if site operators can block bot accounts. Requiring accounts is one of the easiest solutions to that problem. One of the biggest issues with scrapers is that they pretend to be normal visitors who have never been to your site before, because any bot that kept its cookies would immediately be rate-limited by a basic configuration.
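A minimal sketch of that "basic config" idea, assuming a token bucket keyed by an account ID or session cookie value. The class names, limits and IDs are hypothetical, not any particular server's configuration. A bot that keeps its cookie drains its own bucket; one that discards it shows up as a brand-new client with a full bucket every time, which is exactly the evasion described above.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float  # maximum burst size
    rate: float      # tokens refilled per second
    tokens: float    # tokens currently available
    updated: float   # timestamp of the last refill

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerClientLimiter:
    """Hypothetical limiter keyed by account ID or session cookie value."""

    def __init__(self, capacity: float = 10, rate: float = 1.0) -> None:
        self.capacity = capacity
        self.rate = rate
        self.buckets: dict[str, TokenBucket] = {}
        self.blocked: set[str] = set()

    def block(self, client_id: str) -> None:
        # Site operators can ban a bot account outright.
        self.blocked.add(client_id)

    def allow(self, client_id: str, now: float) -> bool:
        if client_id in self.blocked:
            return False
        bucket = self.buckets.get(client_id)
        if bucket is None:
            # A client never seen before starts with a full bucket.
            bucket = TokenBucket(self.capacity, self.rate, self.capacity, now)
            self.buckets[client_id] = bucket
        return bucket.allow(now)


limiter = PerClientLimiter(capacity=3, rate=0.5)
print([limiter.allow("bot-account-42", now=0.0) for _ in range(5)])
# prints [True, True, True, False, False]
```

Calling `limiter.allow("fresh-cookie", now=0.0)` right afterwards returns `True`, because a new ID gets a fresh bucket; that is why requiring accounts (or persistent cookies) matters.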

Turnstile isn't something Cloudflare put up to annoy you. It's what the website owners decided to put up, for many different reasons.

In the same vein, Anubis has a default configuration that lets honest scrapers and crawlers through, because those can easily be rejected by basic web server configurations. Only scrapers pretending to be browsers need to solve the proof-of-work puzzle. You can disable that feature, of course.

Cloudflare could play this smart: force bots to pay for access, take a 30% cut, and give the rest to the website owners. That way, websites get paid when the AI slop machine digests their content, normal visitors still get in for free, and the scraper hellscape turns into a sustainable model. Bonus points for letting websites set their own rates (pre-declared to scrapers, of course) to dissuade all but the most interested scrapers.

Are you still reading webpages yourself instead of telling your AI to do it?

  • Maybe your agent could have a little blog where it keeps a diary of cool pages it read for you? And then you subscribe to that?

    • Interesting idea. It doesn't fit my style of content consumption, but it might fit yours.

      I went for a personal newsfeed: an agent pulls news from ~100 feeds related to my interests, reads all the articles for me, and orders them by how interesting they are likely to be to me. I specifically asked for vector embeddings, up/down votes (-2..+2), visited status, and LLM content evaluation; there are probably other mechanisms I didn't even bother to check. It's a work in progress, but I can see myself replacing most of my news reading with it. For many news items, the AI summary, which contains the main idea behind the item, is enough for me. As a bonus, it sees through clickbait and is quite good at it. Also, no ads, ever. I definitely need to implement some grouping, because when a popular story breaks I get many stories about the same thing with mostly overlapping details. Having the AI merge them would be quite cool.

      I also asked the AI to extract my interests from the browsing/watching histories of all my accounts. V2 of my newsfeed might use that somehow for better results.

      The silly thing is that I made it in one afternoon, and my only motivation was being slightly more annoyed with the web than usual that day.

I'm sure they don't want humans to have that experience; the issue is that those humans' behavior looks very bot-like. This is usually only experienced by people whose setup is peculiar.