Comment by mschuster91
21 hours ago
A global tarpit is the solution. It makes sense anyway, even without taking AI crawlers into account. Back when I had to implement that, I went the semi-manual route: parse the access log, and any IP address averaging more than X hits a second on /api gets a -j TARPIT with iptables [1].
Not sure how to implement it in the cloud though, never had the need for that there yet.
[1] https://gist.github.com/flaviovs/103a0dbf62c67ff371ff75fc62f...
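For anyone who wants to try the semi-manual route described above, here is a rough sketch in Python. It is not the linked gist's method (that link is truncated here), just one way to do it. Assumptions: the access log is in common/combined log format, the thresholds `max_rate` and `min_hits` are made-up values standing in for "X", and the TARPIT target is available (it ships with xtables-addons, not stock iptables, and only applies to TCP). The script only builds the rule strings; it does not run them.

```python
import re
from collections import defaultdict
from datetime import datetime

# Common/combined log format: client IP, [timestamp], "METHOD path ..."
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:\S+) (\S+)')


def offenders(lines, prefix="/api", max_rate=5.0, min_hits=10):
    """Return IPs whose average hit rate on `prefix` exceeds `max_rate` per second.

    `max_rate` and `min_hits` are hypothetical thresholds; tune them to your traffic.
    """
    hits = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        ip, ts, path = m.groups()
        if path.startswith(prefix):
            hits[ip].append(datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z"))

    result = []
    for ip, times in hits.items():
        if len(times) < min_hits:
            continue  # too few requests to judge
        # Average over the observed window, with a 1-second floor
        span = max((max(times) - min(times)).total_seconds(), 1.0)
        if len(times) / span > max_rate:
            result.append(ip)
    return sorted(result)


def tarpit_rules(ips):
    """Build (not execute) one iptables rule per offending IP."""
    return [f"iptables -A INPUT -s {ip} -p tcp -j TARPIT" for ip in ips]
```

You would feed it the open log file, e.g. `tarpit_rules(offenders(open("access.log")))`, review the output, and only then apply the rules as root. The log path is a placeholder.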
One such tarpit (Nepenthes) was just recently mentioned on Hacker News: https://web.archive.org/web/20250117030633/https://zadzmo.or...
Quixotic[0] (my content obfuscator) includes a tarpit component, but for something like this, I think the main quixotic tool would be better - you run it against your content once, and it generates a pre-obfuscated version of it. It takes a lot less of your resources to serve than dynamically generating the tarpit links and content.
0 - https://marcusb.org/hacks/quixotic.html
How do you know their site is down? You probably just hit their tarpit. :)
I would think public outcry by influencers on social media (such as this thread) is a better deterrent, and it also establishes a public data point and exhibit for future reference, since the tarpit is hard to scale.
This doesn't work with the kind of highly distributed crawling that is the problem now.
Don't we have intellectual property law for this tho?