Comment by lmm
2 days ago
The kind of crawlers/scrapers who DDoS a site like this aren't going to bother checking common crawl or tarballs. You vastly overestimate the intelligence and prosociality of what bursty crawler requests tend to look like. (Anyone who is smart or prosocial will set up their crawler to not overwhelm a site with requests in the first place - yet any site with any kind of popularity gets flooded with these requests sooner or later)
If they don’t have the intelligence to go after the more efficient data collection method then they likely won’t have the intelligence or willpower to work around the second part I mentioned (keeping something like Anubis). The only problem is when you put Anubis in the way of determined, intelligent crawlers without giving them a choice that doesn’t involve breaking Anubis.