Comment by andrethegiant
2 months ago
Thanks for the feedback. It’s mentioned in the platform FAQ, but I should make it more prominent in the docs. The UA will always be prefixed with the string `Crawlspace`. May I ask why you’d want to block it, even if it crawls respectfully?
The bot having "Crawlspace" in its UA doesn't necessarily mean it honors "Crawlspace" directives in robots.txt. Would it bail out if it saw this robots.txt?
> May I ask why you’d want to block it, even if it crawls respectfully?
The main audience for the product seems to be AI companies, and some people just aren't interested in feeding that beast. Lots of sites block Common Crawl even though their bot is usually polite.
https://originality.ai/ai-bot-blocking
> Would it bail out if it saw this robots.txt?
Yes, it should. I use the library below, and it should split the UA at the slash character, treating the part before the slash as a prefix match, per spec.
https://github.com/samclarke/robots-parser/blob/master/Robot...
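For anyone following along, here is a rough sketch of the matching being described: take the part of the UA before the first slash, then look for a robots.txt group with that name. This is not the linked library's code, just a minimal Python illustration. The UA string, the `EXAMPLE_ROBOTS` contents, and all function names are made up for the example, and it only handles `User-agent` and `Disallow` lines.

```python
# Hypothetical robots.txt, not the one posted earlier in the thread.
EXAMPLE_ROBOTS = """\
User-agent: Crawlspace
Disallow: /

User-agent: *
Disallow: /private
"""


def product_token(user_agent: str) -> str:
    """Return the lowercased part of the UA before the first '/'."""
    return user_agent.split("/", 1)[0].strip().lower()


def parse_groups(robots_txt: str) -> dict[str, list[str]]:
    """Map each lowercased User-agent name to its Disallow path prefixes."""
    groups: dict[str, list[str]] = {}
    current: list[str] = []  # agents named by the group being read
    in_rules = False         # True once the group has at least one rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current = []
                in_rules = False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            in_rules = True
            if value:  # an empty Disallow blocks nothing
                for agent in current:
                    groups[agent].append(value)
    return groups


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Check path against the group for this UA, falling back to '*'."""
    groups = parse_groups(robots_txt)
    rules = groups.get(product_token(user_agent), groups.get("*", []))
    return not any(path.startswith(prefix) for prefix in rules)
```

With this sketch, a UA such as `Crawlspace/1.0` (version invented here) reduces to `crawlspace`, matches the first group, and is refused every path, while an unrelated bot falls through to the `*` group.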