Comment by hombre_fatal
2 days ago
It depends what your goal is.
Having to use a browser to crawl your site will slow down naive crawlers at scale.
But it wouldn't do much against individuals typing "what is a kumquat" into their local LLM tool, which then issues 20 requests to answer the question. They won't care or even notice whether the tool had to use a Playwright instance instead of curl.
Yet according to Cloudflare, that use case is responsible for ~all of my AI bot traffic, which is 30x the traffic from direct human users. Since my site is a forum, it made more sense to just block the traffic.
Maybe a stupid question, but how can Cloudflare detect what portion of traffic is coming from LLM agents? Do agents identify themselves when they make requests? Or are you just assuming that all Playwright traffic originated from an agent?
That is what Cloudflare's bot metrics dashboard told me before I enabled their "Super Bot Fight Mode", which brought traffic back down to its pre-bot levels.
I assume most traffic comes from hosted LLM chats (e.g. chatgpt.com) where the provider (e.g. OpenAI) is making the requests from their own servers.
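If that assumption holds and the provider labels its own requests, a rough first pass at the "what portion is LLM traffic" question could be a User-Agent check over your access logs. This is only a sketch: the token list is illustrative and is my assumption, not something from Cloudflare or from this thread, and the function names are made up. It also won't catch a headless browser sending a normal browser User-Agent, which is presumably where Cloudflare's own heuristics come in.

```python
# Hypothetical token list: illustrative only, not exhaustive or authoritative.
# Check each provider's documentation for the strings it actually sends.
AI_AGENT_TOKENS = ("chatgpt-user", "gptbot", "oai-searchbot")


def is_self_identified_ai_agent(user_agent: str) -> bool:
    """Return True if the User-Agent contains one of the assumed tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in AI_AGENT_TOKENS)


def ai_traffic_share(user_agents: list[str]) -> float:
    """Fraction of requests whose User-Agent self-identifies as an AI agent."""
    if not user_agents:
        return 0.0
    hits = sum(is_self_identified_ai_agent(ua) for ua in user_agents)
    return hits / len(user_agents)
```

Feed `ai_traffic_share` the User-Agent column from a log sample and you get a lower bound on AI traffic, since anything that spoofs a browser is counted as human.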