Comment by xena

7 months ago

I did actually try zip bombs at first. They didn't work because of how Amazon's scraper is architected: it just retried the requests.

Amazon's scraper has been sending multiple requests per second to my servers for 6+ weeks, and every request has been answered with a 429.

Amazon's scraper doesn't back off. Meta, Google, and most of the others with identifiable user agents do.
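For the curious, here's roughly what that looks like: a minimal Node/TypeScript sketch, with the "Amazonbot" match (the user agent Amazon's crawler advertises) and the Retry-After value as stand-ins for whatever a real setup would use.

```typescript
import http from "node:http";

// Amazon's crawler advertises "Amazonbot" in its user agent string.
const SCRAPER_UA = /amazonbot/i;

const server = http.createServer((req, res) => {
  const ua = req.headers["user-agent"] ?? "";
  if (SCRAPER_UA.test(ua)) {
    // A well-behaved client honors this header and backs off.
    res.writeHead(429, { "Retry-After": "3600" });
    res.end("Too Many Requests\n");
    return;
  }
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("hello\n");
});

server.listen(8080);
```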

So first, let me preface this by saying I generally don't accept cookies from websites I haven't explicitly allowed, my reasoning being "why am I granting disk read/write access to [mostly] shady actors to let them track me?"

(I don't think your blog qualifies as shady … but you're not in my allowlist, either.)

So if I visit https://anubis.techaro.lol/ (from the "Anubis" link), I get an infinite anime cat girl refresh loop — which honestly isn't the worst thing ever?

But if I go to https://xeiaso.net/blog/2025/anubis/ and click "To test Anubis, click here." … that one loads just fine.

Neither xeserv.us nor techaro.lol is in my allowlist. Curious that one seems to pass. IDK.

The blog post does have that lovely graph … but I suspect I'll loop around the "no cookie" loop in it, so the infinite cat girls are somewhat expected.

I was working on an extension that would store cookies very ephemerally for the more malicious instances of this, but I think its design would work here too. (An in-RAM cookie jar that burns them after, say, 30s: persisted just long enough to load the page.)
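Something like this, as a WebExtension background script. Note this only approximates an in-RAM jar, since the cookies do hit the normal store before being removed; the 30s TTL and the empty allowlist are placeholders.

```typescript
// Manifest needs the "cookies" permission plus host permissions
// for the sites whose cookies it should manage.
const TTL_MS = 30_000; // "burn after, say, 30s"
const ALLOWLIST = new Set<string>(); // domains whose cookies persist

browser.cookies.onChanged.addListener(({ removed, cookie }) => {
  if (removed) return; // only schedule burns when a cookie is set
  const domain = cookie.domain.replace(/^\./, "");
  if (ALLOWLIST.has(domain)) return;
  const url = `${cookie.secure ? "https" : "http"}://${domain}${cookie.path}`;
  // Long enough for the page (and an Anubis-style challenge) to finish loading.
  setTimeout(() => {
    browser.cookies.remove({ url, name: cookie.name, storeId: cookie.storeId });
  }, TTL_MS);
});
```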

  • You're seeing an experiment in progress. It seems to be working, but I have yet to get enough data to know if it's ultimately successful or not.

  • Just FYI, Temporary Containers (a Firefox extension) seems to be the solution you're looking for. It essentially generates a new container for every tab you open (subtabs can either get new containers or share the same one). Once the tab is closed it destroys the container and deletes all browsing data (including cookies). You can still whitelist some domains to specific persistent containers. (There's a rough sketch of the per-tab container idea after this list.)

    I used cookie blockers for a long time, but always ended up having to whitelist some sites even though I didn't want their cookies, because the sites would misbehave without them. Now I've just stopped worrying.

  • > Neither xeserv.us nor techaro.lol is in my allowlist. Curious that one seems to pass. IDK.

    Is your browser passing a referrer?
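The per-tab container idea from the Temporary Containers comment above, sketched with Firefox's contextualIdentities API (the manifest needs the "contextualIdentities" and "cookies" permissions). The real extension handles far more edge cases.

```typescript
// Firefox-only sketch: one throwaway container per tab.
const tabContainers = new Map<number, string>(); // tabId -> cookieStoreId

async function openInFreshContainer(url: string): Promise<void> {
  const identity = await browser.contextualIdentities.create({
    name: `tmp-${Date.now()}`, // disposable, uniquely named container
    color: "toolbar",
    icon: "circle",
  });
  const tab = await browser.tabs.create({
    url,
    cookieStoreId: identity.cookieStoreId,
  });
  if (tab.id !== undefined) tabContainers.set(tab.id, identity.cookieStoreId);
}

// Closing the tab destroys the container, and its cookies die with it.
browser.tabs.onRemoved.addListener((tabId) => {
  const storeId = tabContainers.get(tabId);
  if (storeId === undefined) return;
  tabContainers.delete(tabId);
  void browser.contextualIdentities.remove(storeId);
});
```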

Did you also try Transfer-Encoding: chunked, or things like HTTP request smuggling, to serve different content to web browsers than to scrapers?
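To make the chunked half of that concrete: Node emits Transfer-Encoding: chunked whenever no Content-Length is set, so a handler can keep a response open indefinitely. Actual request smuggling needs a front-end and back-end that parse message boundaries differently, which a single server can't demonstrate; the user-agent check below is purely illustrative, and the drip tactic is a tarpit rather than smuggling.

```typescript
import http from "node:http";

http.createServer((req, res) => {
  // No Content-Length is set, so Node sends Transfer-Encoding: chunked.
  res.writeHead(200, { "Content-Type": "text/html" });
  res.write("<!doctype html><p>real content for browsers</p>");
  if (/bot|crawler|spider/i.test(req.headers["user-agent"] ?? "")) {
    // Never finish the scraper's response; drip comment chunks instead.
    const drip = setInterval(() => res.write("<!-- -->"), 1000);
    req.on("close", () => clearInterval(drip));
    return;
  }
  res.end();
}).listen(8080);
```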