Comment by pilif
3 days ago
> This was obviously dumb when it launched:
Yes. Obviously dumb but also nearly 100% successful at the current point in time.
And likely going to stay successful as the non-protected internet still provides enough information to dumb crawlers that it’s not financially worth it to even vibe-code a workaround.
Or in other words: Anubis may be dumb, but the average crawler that completely exhausts some sites' resources is even dumber.
And so it all works out.
And so the question remains: how dumb was it exactly, when it worked so well and continues to do so?
> Yes. Obviously dumb but also nearly 100% successful at the current point in time.
Only if you don't care about negatively affecting real users.
I understand this as an argument that it’s better to be down for everyone than have a minority of users switch browsers.
I’m not convinced that makes sense.
Now ideally you would have the resources to serve all users and all the AI bots without performance degradation, but for some projects that’s not feasible.
In the end it’s all a compromise.
Does it work well? I run Chromium controlled by Playwright for scraping and typically have Gemini implement the script because it's not worth my time otherwise. But I'm not crawling the internet generally (which I think there is very little financial incentive to do; it's a very expensive process even ignoring Anubis et al.); it's always that I want something specific and am sufficiently annoyed by the lack of an API.
Regarding the authentication mentioned elsewhere, passing cookies is no big deal.
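For that targeted kind of scrape, the whole thing is usually just a few lines of Playwright. A minimal sketch in Python (the URL, selector, and cookie values are placeholders, not from any real site):

    # Minimal sketch of a targeted scrape with Playwright-driven Chromium.
    # URL, selector, and cookie values below are placeholders.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context()
        # Reuse a session cookie copied from a normal browser session.
        context.add_cookies([{
            "name": "session",
            "value": "copied-from-real-browser",
            "domain": "example.com",
            "path": "/",
        }])
        page = context.new_page()
        page.goto("https://example.com/some/listing")
        # Extraction is site-specific; a CSS selector is enough here.
        print(page.locator("article h2").all_inner_texts())
        browser.close()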
Anubis is not meant to stop a single endpoint from scraping; it's meant to make things harder for massive AI scrapers. The problematic ones evade rate limiting by using many different IP addresses and make scraping cheaper for themselves by running headless. Anubis is specifically built to make that kind of scraping harder, as I understand it.
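The mechanism, as far as I can tell, is a small in-browser proof of work: the client has to find a nonce whose SHA-256 hash over the server's challenge starts with enough zeroes before it gets a pass cookie. Roughly this, as a sketch (the exact challenge format and difficulty encoding are my assumptions, not taken from Anubis's source):

    # Sketch of an Anubis-style proof-of-work check; the concrete challenge
    # format and difficulty encoding here are assumptions for illustration.
    import hashlib

    def solve(challenge: str, difficulty: int) -> int:
        """Find a nonce so that sha256(challenge + nonce) starts with
        `difficulty` zero hex digits."""
        target = "0" * difficulty
        nonce = 0
        while True:
            digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
            if digest.startswith(target):
                return nonce
            nonce += 1

    print(solve("example-challenge", 4))

The idea being that a real browser pays this cost once and keeps the cookie, while a scraper spreading requests across many IPs without session state pays it over and over.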
Does it actually? I don't think I've seen a case study with hard numbers.
Here’s one study:
https://dukespace.lib.duke.edu/server/api/core/bitstreams/81...
And of all the high-profile projects implementing it, like the LKML archives, none have backed down yet, so I'm assuming the initial improvement in numbers must have held up, or it would have been removed by now.
I run a service under the protection of go-away[0], which is similar to Anubis, and can attest that it still works very well. Went from constant outages due to ridiculous volumes of requests to good load times for real users, with no bad crawlers coming through.
[0]: https://git.gammaspectra.live/git/go-away
Great, thanks for the link.
The workaround is literally just running a headless browser, and that's pretty much the default nowadays.
If you want to save some $$$, you can spend like 30 minutes making a cracker like the one in the article: just make it multithreaded, add a queue, and boom, your scraper nodes can go back to their cheap configuration. Or, since these are AI orgs we're talking about, write a GPU cracker and laugh as it solves challenges far faster than any user could.
Custom solutions aren't worth it for individual sites, but with how widespread Anubis has become, it's now worth the effort.
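A sketch of what a parallel solver could look like (processes rather than threads, since CPython's GIL would otherwise serialize the hashing; same assumed SHA-256 leading-zeros scheme as above, nothing taken from the article's code):

    # Parallel version of the assumed SHA-256 leading-zeros solver: each
    # worker strides through a disjoint slice of the nonce space and the
    # first solution found wins.
    import hashlib
    from multiprocessing import Pool

    def search(args):
        challenge, difficulty, start, step = args
        target = "0" * difficulty
        nonce = start
        while True:
            digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
            if digest.startswith(target):
                return nonce
            nonce += step

    def solve_parallel(challenge: str, difficulty: int, workers: int = 8) -> int:
        jobs = [(challenge, difficulty, i, workers) for i in range(workers)]
        with Pool(workers) as pool:
            # Take the first nonce any worker finds; leaving the block
            # terminates the pool and stops the remaining workers.
            return next(pool.imap_unordered(search, jobs))

    if __name__ == "__main__":
        print(solve_parallel("example-challenge", 5))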