Comment by jsheard

2 months ago

The bot having "Crawlspace" in its UA doesn't necessarily mean it honors "Crawlspace" directives in robots.txt. Would it bail out if it saw this robots.txt?

  User-agent: Crawlspace
  Disallow: /
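
For reference, here is roughly how a crawler that does honor that rule would evaluate it. This is a minimal sketch using Python's standard `urllib.robotparser`. The `is_allowed` helper, the `SomeOtherBot` name, and the example.com URL are made up for illustration, and it says nothing about what the Crawlspace bot actually does.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the comment above.
ROBOTS_TXT = """\
User-agent: Crawlspace
Disallow: /
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: would a compliant bot with this token fetch the URL?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# A bot that identifies with the "Crawlspace" token should back off entirely.
crawlspace_allowed = is_allowed(ROBOTS_TXT, "Crawlspace", "https://example.com/page")

# A bot with an unrelated token (hypothetical name) is not covered by the rule.
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/page")

print(crawlspace_allowed, other_allowed)
```

The point is that matching happens on whatever token the bot chooses to look up, so the rule only works if the bot checks for "Crawlspace" specifically.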

> May I ask why you’d want to block it, even if it crawls respectfully?

The main audience for the product seems to be AI companies, and some people just aren't interested in feeding that beast. Lots of sites block Common Crawl even though their bot is usually polite.
