Comment by SparkyMcUnicorn

11 days ago

How can you tell the difference, in a way that can't be spoofed?

This is a genuine question, since I see you work at CF. I'm very curious what the distinction will be between a user and a crawler. Is trust involved, making spoofing a non-issue?

kentonv 11 days ago

I don't personally work on bot detection, and I don't know exactly what techniques they use.

But if you think about it: crawlers are probably not hard to identify, as they systematically download your entire web site as well as every other site on the internet (a significant fraction of which is on Cloudflare). This traffic pattern is obviously going to look extremely different from a web browser operated by a human. Honestly, this is probably one of the easiest kinds of bots to detect.
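To make the traffic-pattern argument concrete, here is a minimal sketch of the kind of heuristic it implies. This is not Cloudflare's method (which isn't described here); the function names, the `(client_id, site, path)` log format, and the thresholds are all hypothetical, chosen only for illustration. It scores each client by how many distinct sites it touches and how much of each site it downloads.

```python
from collections import defaultdict


def crawler_scores(requests, site_sizes):
    """Score clients by cross-site breadth and per-site coverage.

    requests:   iterable of (client_id, site, path) tuples (hypothetical log format).
    site_sizes: dict mapping site -> number of known pages on that site.
    Returns {client_id: (distinct_sites, mean_coverage)}.
    """
    # client -> site -> set of distinct paths fetched
    seen = defaultdict(lambda: defaultdict(set))
    for client, site, path in requests:
        seen[client][site].add(path)

    scores = {}
    for client, sites in seen.items():
        # Fraction of each site's pages this client has fetched
        coverage = [len(paths) / site_sizes[site] for site, paths in sites.items()]
        scores[client] = (len(sites), sum(coverage) / len(coverage))
    return scores


def looks_like_crawler(score, min_sites=3, min_coverage=0.8):
    """Flag clients that fetch most pages on many sites (thresholds are arbitrary)."""
    distinct_sites, mean_coverage = score
    return distinct_sites >= min_sites and mean_coverage >= min_coverage
```

A human reading one article on one site scores low on both axes, while a client that systematically fetches every page across many sites scores high on both. A real system would presumably combine many more signals than this.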

SparkyMcUnicorn 10 days ago

Cross-site traffic was completely overlooked in my previous comment. Good point, and it definitely changes the perspective a bit.

jeroenhd 11 days ago

Not OP of course, but I think there's a clear way forward.

An LLM accessibility browser is a bot, so bot detection sounds like the wrong approach to me. What's more important than bot detection is "actual real user" detection, of which bot detection is only part.

If the control software runs on a user's local device, things like TPMs can offer a device-bound signature for remote attestation. Virtual TPMs don't have root certificates signed by TPM/CPU makers, so they're not useful for building trust. A CPU shared among hundreds of VMs somewhere in a cloud can't provide unique TPM verification, so AI scrapers would have to switch to having botnets do the scraping itself rather than just using them as proxies. Even then, they couldn't get away with hacked routers, which lack TPMs.

There's a huge downside to this, of course, and that's basically handing control over who gets to use the internet to a few TPM companies that can lock you out whenever they please. If there's any way to tie this remote attestation system to you as a person, this puts tremendous power in the hands of the US government (see what happened to the ICC judge investigating the genocide over in Gaza) as they can force American companies to banish you.

I don't think the internet should develop in this direction, but with CAPTCHA failing to block bots and with AI scrapers ruining the internet, I don't see things going any other way.

Now that Cloudflare is putting a monetary value on bypassing its blocks for shitty AI scrapers, you can bet that there will be an industry of underpaid IT workers figuring out how to bypass CF's bot detection for a competitive market rate.

kentonv 11 days ago

> An LLM accessibility browser is a bot
I don't agree with this. A browser, operated by a human user, is not a bot. Adding LLM-powered accessibility features to a browser does not make it a bot.