← Back to context

Comment by grishka

2 hours ago

In my particular case, I don't mind the crawling. It's a fediverse server. There is nothing secret there. All content is available via ActivityPub anyway for anyone to grab. However, these bots specifically violated both robots.txt and rel="nofollow" while hitting endpoints like "log in to like this post" pages tens of times per second. They were just wasting my server's resources for nothing.

The modern problem is naive content harvesting bots, for AI training. My advice is to make sure you have a very efficient code path for login pages. 10 pages per second is nothing if you don’t have to perform any database queries (because you don’t have any authentication token to validate).