Comment by FloatArtifact
11 days ago
What if somebody uses an AI crawler as an accessibility tool to help them navigate the web, for example by enabling UI automation?
That kind of automation already triggers a lot of... uh... troublesome verifications.
The site owner can allow such crawlers. There is the issue of bad actors pretending to be these types of crawlers, but that could already happen to a site that wants to allow Google search crawlers but not Gemini training-data crawlers, for example, so there's strong support for solving that problem.
How would an individual user use a "crawler" to navigate the web exactly? A browser that uses AI is not automatically a "crawler"... a "crawler" is something that mass harvests entire web sites to store for later processing...
How can you tell the difference, in a way that can't be spoofed?
This is a genuine question, since I see you work at CF. I'm very curious what the distinction will be between a user and a crawler. Is trust involved, making spoofing a non-issue?
I don't personally work on bot detection, and I don't know exactly what techniques they use.
But if you think about it: crawlers are probably not hard to identify, as they systematically download your entire web site as well as every other site on the internet (a significant fraction of which is on Cloudflare). This traffic pattern is obviously going to look extremely different from a web browser operated by a human. Honestly, this is probably one of the easiest kinds of bots to detect.
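To make the traffic-pattern argument concrete, here is a minimal sketch of one possible heuristic: measure what fraction of a site's known pages each client has requested. This is purely illustrative and is not how Cloudflare's detection works (the comment above says those techniques are unknown to the commenter). The function names, the `(client_id, path)` log format, and the 0.8 threshold are all assumptions made up for this example.

```python
from collections import defaultdict


def coverage_by_client(requests, site_paths):
    """Return {client_id: fraction of known site paths that client requested}.

    `requests` is an iterable of (client_id, path) pairs (hypothetical log format);
    `site_paths` is the set of all known page paths on the site.
    """
    total = len(site_paths)
    if total == 0:
        return {}
    seen = defaultdict(set)
    for client_id, path in requests:
        if path in site_paths:
            seen[client_id].add(path)
    return {client: len(paths) / total for client, paths in seen.items()}


def likely_crawlers(requests, site_paths, threshold=0.8):
    """Flag clients whose coverage meets an (arbitrary, assumed) threshold."""
    coverage = coverage_by_client(requests, site_paths)
    return sorted(c for c, frac in coverage.items() if frac >= threshold)


if __name__ == "__main__":
    site = {"/", "/a", "/b", "/c", "/d"}
    log = [
        ("human", "/"), ("human", "/a"),
        ("bot", "/"), ("bot", "/a"), ("bot", "/b"), ("bot", "/c"), ("bot", "/d"),
    ]
    print(likely_crawlers(log, site))
```

A human (or an AI-assisted browser acting for one person) touches a handful of pages, while a client that walks nearly the whole site stands out. A real system would presumably combine many more signals than this single ratio.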
Not OP of course, but I think there's a clear way forward.
An LLM accessibility browser is a bot, so bot detection sounds like the wrong approach to me. What's more important than bot detection is "actual real user" detection, of which bot detection is only part.
If the control software runs on a user's local device, things like TPMs can offer a device-bound signature for remote attestation. Virtual TPMs don't have root certificates signed by TPM/CPU makers, so they're not useful for building trust. A CPU shared between hundreds of VMs somewhere in a cloud can't provide a unique TPM verification for each of them. AI scrapers would therefore have to have botnets do the actual scraping rather than just using them as proxies, and even then they couldn't get away with hacked routers (which lack TPMs).
There's a huge downside to this, of course, and that's basically handing control over who gets to use the internet to a few TPM companies that can lock you out whenever they please. If there's any way to tie this remote attestation system to you as a person, this puts tremendous power in the hands of the US government (see what happened to the ICC judge investigating the genocide over in Gaza) as they can force American companies to banish you.
I don't think the internet should develop in this direction, but with CAPTCHA failing to block bots and with AI scrapers ruining the internet, I don't see things going any other way.
Now that Cloudflare is putting a monetary value on bypassing its blocks for shitty AI scrapers, you can bet that there will be an industry of underpaid IT workers figuring out how to bypass CF's bot detection for a competitive market rate.
We already have ARIA, which is far more deterministic and should already be present on all major sites. AI should not be used, or necessary, as an accessibility tool.
If only site authors would actually use ARIA. Not everything is a div, italic text is not for spawning emoji… the web is not in good shape for semantic content or ARIA right now. AI should not be necessary, but it is.
There are plenty of people who don't bother with ARIA and likely never will, so it's good to have tools that can attempt to help the user understand what's on screen. The scraping restrictions wouldn't be a problem in this scenario, though, because the user's browser can be the one to pull down the page and then provide it to the AI for analysis.
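As a rough illustration of that last point, the sketch below works only on HTML the browser has already fetched, so nothing in it makes a network request. It collects whatever `role` and `aria-label` hints the page does provide, which an assistive tool could hand to a model along with the rest of the page. The `AriaOutline` class and `aria_outline` function are hypothetical names invented for this example; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class AriaOutline(HTMLParser):
    """Collect (tag, role, aria-label) for elements that carry either attribute."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        role = attributes.get("role")
        label = attributes.get("aria-label")
        if role or label:
            self.items.append((tag, role, label))


def aria_outline(html):
    """Return the ARIA hints found in an already-downloaded HTML string."""
    parser = AriaOutline()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    page = '<nav aria-label="Main"><div role="button">Go</div><div>plain</div></nav>'
    for item in aria_outline(page):
        print(item)
```

On a page with no ARIA markup at all the outline comes back empty, which is exactly the case where a tool would have to fall back on the AI guessing at the page's structure.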