Speaking from the scraper’s perspective, I like proof of work; a ten-year-old 96-core server will cost a couple of quid to run for a few hours and will grab an absurd number of pages thanks to the access granted by repeatedly solving proofs of work. Small, slick codebases too!
There's also the Anubis idea where your PoW is persistent until your IP address or session cookie changes, so you get to skip PoW in exchange for making yourself identifiable, which means the PoW can then be ramped up to take a couple of minutes.
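A rough sketch of that persistence idea, not Anubis's actual code: the client brute-forces a nonce once, and the server then hands back a signed pass bound to the IP, so later requests skip the work until the IP changes. The SHA-256 leading-zero-bits scheme, the function names (`solve`, `verify`, `issue_pass`, `check_pass`) and the `SECRET` key are all assumptions made up for illustration.

```python
import hashlib
import hmac
import os

SECRET = os.urandom(32)  # hypothetical server-side signing key


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(challenge: bytes, nonce: int, bits: int) -> bool:
    """Server side: one hash to check the client's answer."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= bits


def solve(challenge: bytes, bits: int) -> int:
    """Client side: try nonces until the hash has enough leading zero bits."""
    nonce = 0
    while not verify(challenge, nonce, bits):
        nonce += 1
    return nonce


def issue_pass(ip: str) -> str:
    """After a valid solution, sign the IP so the work is not repeated."""
    return hmac.new(SECRET, ip.encode(), hashlib.sha256).hexdigest()


def check_pass(ip: str, token: str) -> bool:
    """A pass is only good for the IP it was issued to."""
    return hmac.compare_digest(issue_pass(ip), token)
```

Each extra bit of difficulty doubles the expected number of hashes, which is how the one-off cost could be pushed up to minutes while verification stays a single hash.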
I don't use Anubis though. I just make sure my site doesn't take five seconds to render a page, since that is what lets bots overload it so easily. It's not actually that hard.
It would be more profitable to mine bitcoin.
PoW doesn't stop bots. It's an annoyance at most: a rate limiter and nothing more.
PoW difficulty can be scaled, e.g. every cookie costs 1s of work, but a 2nd cookie from the same IP might cost 2s.
Ideally one would pick something a bit more forgiving than a linear function, to avoid penalizing users connecting from behind CGNAT too much.
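One way to get "more forgiving than linear" is a logarithmic curve. This is only a sketch: `work_seconds`, the 1s base and the 120s cap are hypothetical values picked to match the numbers in the comments above, not anything a real deployment uses.

```python
import math


def work_seconds(prior_cookies: int, base: float = 1.0, cap: float = 120.0) -> float:
    """Target solve time for the next cookie from an IP that already holds
    `prior_cookies` cookies. Grows with log2 instead of linearly, and is capped."""
    if prior_cookies < 0:
        raise ValueError("prior_cookies must be non-negative")
    return min(cap, base * (1.0 + math.log2(1 + prior_cookies)))


print(work_seconds(0))     # first cookie from this IP
print(work_seconds(1))     # second cookie
print(work_seconds(1023))  # a busy CGNAT address
```

The first cookie costs 1s and the second 2s, same as the linear version, but the 1024th cookie from a shared CGNAT address costs 11s rather than 1024s.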
I think we're talking about 2 different things. PoW is annoying for basic scrapers but it really doesn't affect enterprise grade bot operations with access to unlimited residential proxies.
Can this be repurposed as some kind of distributed cryptocurrency mining mechanism? Pay websites by mining some Monero in order to access them?
How does proof of work stop bots?
Because it destroys the economics of scraping. With proof of work, scraping becomes too expensive, or at least less economically viable.
Depends on what type of scraping you're trying to stop. For the dumb scrapers that would try to scrape every page on a git forge (of which there are a bazillion even for a modest project, because of how the site works), yeah, it might deter them enough to stop. For anything high value (e.g. Reddit comments or retail prices), 10s of CPU time isn't going to stop them.
5W load for 2 seconds is about 0.003Wh (10J), I think we'll be fine.
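For anyone who wants to check the arithmetic, a tiny helper (the name `watt_hours` is just for illustration): power times time gives joules, and 3600 joules make one watt-hour.

```python
def watt_hours(watts: float, seconds: float) -> float:
    """Energy in Wh for a constant load: W * s = J, and 1 Wh = 3600 J."""
    return watts * seconds / 3600.0


per_challenge = watt_hours(5, 2)              # one 2-second solve at 5W
per_million = watt_hours(5, 2) * 1_000_000    # a million solves
print(f"{per_challenge:.4f} Wh per challenge, {per_million / 1000:.1f} kWh per million")
```

That comes to roughly 0.0028 Wh per challenge, or about 2.8 kWh per million challenges.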
Except it doesn't
If it gets too expensive/time-consuming to scrape then it won't happen at scale (as much)?