Comment by 1gn15, 8 days ago

One solution is to not expose expensive endpoints in the first place. Serve everything statically, or use heavy caching.
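As a rough illustration (nothing from any specific site, and with a hypothetical `public/` directory), here is what "serve it statically with heavy caching" can look like: a plain file server that stamps every response with a long-lived Cache-Control header, so repeat crawler hits get absorbed by browser and CDN caches instead of a dynamic backend.

```python
# Minimal sketch: a static file server that adds a long-lived
# Cache-Control header to every response. The "public" directory
# and the one-day max-age are arbitrary choices for illustration.
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler

class CachedStaticHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Let browsers and intermediate caches hold responses for a day,
        # so repeat crawler traffic rarely reaches this process at all.
        self.send_header("Cache-Control", "public, max-age=86400")
        super().end_headers()

if __name__ == "__main__":
    handler = functools.partial(CachedStaticHandler, directory="public")
    HTTPServer(("", 8000), handler).serve_forever()
```

In practice you'd get the same effect by putting a CDN or reverse-proxy cache in front of whatever does the real work; the point is just that an anonymous GET should land on something cheap.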

> Precisely one reason comes to mind to have ROBOTS.TXT, and it is, incidentally, stupid - to prevent robots from triggering processes on the website that should not be run automatically. A dumb spider or crawler will hit every URL linked, and if a site allows users to activate a link that causes resource hogging or otherwise deletes/adds data, then a ROBOTS.TXT exclusion makes perfect sense while you fix your broken and idiotic configuration.

Source: https://wiki.archiveteam.org/index.php/Robots.txt

Several years ago, GitHub started moving certain features, like "code search on public repos", behind login, likely due to issues like this: requiring an account makes rate limits much easier to enforce. And that was before the era of LLMs going wild.

(And it led to outrage from people for whom requiring an account was some kind of insult.)