Comment by davidclark

3 days ago

The OP shows that the cost to scrape an Anubis-protected site is essentially zero: the PoW algorithm is simple enough that a scraper can solve it trivially, adding basically no compute time or cost for a crawler run out of a data center. How does that force rethinking?
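
For a sense of scale, here's a minimal Go sketch of what a scraper has to do, assuming an Anubis-style challenge (SHA-256 over a challenge string plus a nonce, requiring some number of leading zero hex digits; the challenge value and difficulty below are made up). At difficulty 4 that's ~65k hashes on average, which native code chews through in milliseconds:

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "strconv"
        "strings"
        "time"
    )

    // Find a nonce such that sha256(challenge + nonce) starts with
    // `difficulty` hex zeroes. Challenge and difficulty are illustrative.
    func solve(challenge string, difficulty int) (int, string) {
        prefix := strings.Repeat("0", difficulty)
        for nonce := 0; ; nonce++ {
            sum := sha256.Sum256([]byte(challenge + strconv.Itoa(nonce)))
            hash := hex.EncodeToString(sum[:])
            if strings.HasPrefix(hash, prefix) {
                return nonce, hash
            }
        }
    }

    func main() {
        start := time.Now()
        nonce, hash := solve("example-challenge", 4) // 16^4 ≈ 65k hashes on average
        fmt.Printf("nonce=%d hash=%s in %v\n", nonce, hash, time.Since(start))
    }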

The cookie will be invalidated if shared between IPs, and it's my understanding that most Anubis deployments are paired with per-IP rate limits, which should reduce overall volume by limiting how many independent requests can be made at any given time.
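
To be concrete about the IP binding, here's a hypothetical sketch (not Anubis's actual scheme; the names are mine): MAC the client IP into the token at mint time, so the same token presented from a different IP fails verification:

    package main

    import (
        "crypto/hmac"
        "crypto/sha256"
        "encoding/hex"
        "fmt"
    )

    // Illustrative only: bind a pass-token to the client IP by MACing
    // the IP with a server secret. Reuse from another IP won't verify.
    var key = []byte("server-secret")

    func mint(ip string) string {
        mac := hmac.New(sha256.New, key)
        mac.Write([]byte(ip))
        return hex.EncodeToString(mac.Sum(nil))
    }

    func verify(token, ip string) bool {
        return hmac.Equal([]byte(token), []byte(mint(ip)))
    }

    func main() {
        tok := mint("198.51.100.10")
        fmt.Println(verify(tok, "198.51.100.10")) // true
        fmt.Println(verify(tok, "203.0.113.7"))   // false: shared cookie rejected
    }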

That said, I agree with you that there are ways around this for a dedicated adversary, and that it's unlikely to be a long-term solution as-is. My hope is that having to circumvent Anubis at scale will prompt some introspection (do you really need to be rescraping every website constantly?), but that may be wishful thinking.

  • > do you really need to be rescraping every website constantly?

    Yes, because if you believe you out-resource your competition, by doing this you deny them training material.

The problem with crawlers is that they're functionally indistinguishable in behavior from your average malware botnet. If you saw a bunch of traffic from residential IPs all using the same token, that's a big tell.
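
That tell is cheap to check, too. A rough sketch of the heuristic (the threshold is invented): count distinct IPs per token and flag tokens that spread too widely:

    package main

    import "fmt"

    // Track how many distinct IPs present each token; a token seen from
    // many residential IPs at once looks like botnet-style reuse.
    type tokenTracker struct {
        ipsPerToken map[string]map[string]struct{}
    }

    func newTracker() *tokenTracker {
        return &tokenTracker{ipsPerToken: map[string]map[string]struct{}{}}
    }

    // observe records a request and reports whether the token now looks shared.
    func (t *tokenTracker) observe(token, ip string) bool {
        ips, ok := t.ipsPerToken[token]
        if !ok {
            ips = map[string]struct{}{}
            t.ipsPerToken[token] = ips
        }
        ips[ip] = struct{}{}
        return len(ips) > 5 // invented threshold: >5 distinct IPs on one token
    }

    func main() {
        tr := newTracker()
        for _, ip := range []string{"198.51.100.1", "198.51.100.2", "203.0.113.3",
            "203.0.113.4", "192.0.2.5", "192.0.2.6"} {
            fmt.Println(ip, "flagged:", tr.observe("token-abc", ip))
        }
    }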