Comment by viraptor
1 year ago
The captcha on robots is a misconfiguration in the website. CF has lots of issues, but this one is on their costumer. Also they detect Google and other bots, so those may be going through anyway.
1 year ago
The captcha on robots is a misconfiguration in the website. CF has lots of issues, but this one is on their costumer. Also they detect Google and other bots, so those may be going through anyway.
Sure; but sensible defaults ought to be in place. There are certain "well known" urls that are intended for machine consuption. CF should permit (and perhaps rate limit?) those by default, unless the user overrides them.
Putting a CAPTCHA in front of robots.txt in particular is harmful. If a web crawler fetches robots.txt and receives an HTML response that isn’t a valid robots.txt file, then it will continue to crawl the website when the real robots.txt might’ve forbidden it from doing so.