
Comment by jasonjayr

1 year ago

Sure; but sensible defaults ought to be in place. There are certain "well-known" URLs that are intended for machine consumption. CF should permit those by default (perhaps with rate limiting?) unless the user overrides them.

Putting a CAPTCHA in front of robots.txt in particular is harmful. If a web crawler fetches robots.txt and receives an HTML response instead of a valid robots.txt file, it will continue to crawl the website even though the real robots.txt might have forbidden it from doing so.
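A minimal sketch of this failure mode, using Python's stdlib `urllib.robotparser` as a stand-in for a crawler's robots.txt parser (the hostname and bot name are made up for illustration): an HTML CAPTCHA page contains no robots.txt directives, so the parser sees an empty rule set and defaults to allowing everything, even if the real robots.txt disallowed all crawling.

```python
from urllib.robotparser import RobotFileParser

# The site's actual robots.txt: forbid all crawling.
real = RobotFileParser()
real.parse(["User-agent: *", "Disallow: /"])

# What the crawler sees when a CAPTCHA interstitial is served instead:
# an HTML document containing no robots.txt directives at all.
captcha = RobotFileParser()
captcha.parse([
    "<!DOCTYPE html>",
    "<html><body>Please verify you are human</body></html>",
])

# With the real file, crawling is refused; with the CAPTCHA page,
# no rules were parsed, so the parser falls back to "allow all".
print(real.can_fetch("ExampleBot", "https://example.com/private"))     # False
print(captcha.can_fetch("ExampleBot", "https://example.com/private"))  # True
```

The same logic applies to most real crawlers: an unparseable robots.txt body is treated as "no restrictions", which is the opposite of what the site operator intended.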