Comment by andrethegiant

2 months ago

Thanks for the feedback, it’s mentioned in the platform FAQ but I should make it more prominent in the docs. The UA will always be prefixed with the string `Crawlspace`. May I ask why you’d want to block it, even if it crawls respectfully?

The bot having "Crawlspace" in its UA doesn't necessarily mean it honors "Crawlspace" directives in robots.txt. Would it bail out if it saw this robots.txt?

  User-agent: Crawlspace
  Disallow: /
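
For reference, here is a minimal sketch of what a compliant crawler would conclude from that file, using Python's standard `urllib.robotparser`. It says nothing about what Crawlspace actually does. The `is_allowed` helper, the `example.com` URL, and the `SomeOtherBot` name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the comment above.
ROBOTS_TXT = """\
User-agent: Crawlspace
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper: it parses the rules from a string instead of
    fetching them over the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler identifying as "Crawlspace" is disallowed everywhere;
# an unrelated bot has no matching rule, so it is allowed.
print(is_allowed("Crawlspace", "https://example.com/page"))    # False
print(is_allowed("SomeOtherBot", "https://example.com/page"))  # True
```

A crawler that honors the directive would skip every URL on the site once this check returns False.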

> May I ask why you’d want to block it, even if it crawls respectfully?

The main audience for the product seems to be AI companies, and some people just aren't interested in feeding that beast. Lots of sites block Common Crawl even though their bot is usually polite.

https://originality.ai/ai-bot-blocking