Comment by jsheard
2 days ago
For the "good" bots, which at least respect robots.txt, you can use this list to get ahead of them before they pummel your site.
https://github.com/ai-robots-txt/ai.robots.txt
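If it helps, here's a rough sketch of turning a list of crawler names into a robots.txt block and sanity-checking it with Python's standard-library parser. The three agent names and the helper names (`build_robots_txt`, `is_allowed`) are just placeholders I picked for illustration; the actual list to use is the one in the repo above, and none of this does anything about bots that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample of crawler names, for illustration only.
# The maintained list lives in the ai.robots.txt repo linked above.
AI_AGENTS = ["GPTBot", "CCBot", "ClaudeBot"]


def build_robots_txt(agents):
    """Return a robots.txt body that disallows every path for the given agents."""
    # Several User-agent lines followed by one rule form a single group.
    lines = [f"User-agent: {name}" for name in agents]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, user_agent, url="https://example.com/"):
    """Check whether a well-behaved crawler with this name may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    body = build_robots_txt(AI_AGENTS)
    print(body)
    print("GPTBot allowed:", is_allowed(body, "GPTBot"))
    print("SomeOtherBot allowed:", is_allowed(body, "SomeOtherBot"))
```

This only tells you what a compliant crawler would conclude from the file, so it's a check on your config rather than any kind of enforcement.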
There's no easy solution for bad bots that ignore robots.txt and spoof their UA, though.
Such as OpenAI, which apparently will ignore robots.txt and change its user agent to evade blocks[1]
1: https://www.reddit.com/r/selfhosted/comments/1i154h7/openai_...
For those looking, this is the best I've found: https://blog.cloudflare.com/declaring-your-aindependence-blo...
This seemed to work for a while after it came out, but IME it no longer does.
Thanks, will look into that!