Comment by spiderfarmer

5 days ago

My platform has 24M pages on 8 domains and these NASTY crawlers insist on visiting every single one of them. For every 1 real visitor there are at least 300 requests from residential proxies. And that's after I blocked entire countries like Russia, China, Taiwan and Singapore.

Even Cloudflare's bot filter only blocks some of them.

I'm using honeypot URLs right now to block all crawlers that ignore rel="nofollow", but they appear to have many millions of devices. I wouldn't be surprised if there are a gazillion residential routers, webcams and phones that have been hacked to function as simple doorways.

Things are really getting out of hand.
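For anyone curious what the honeypot approach above might look like, here is a minimal sketch, not the poster's actual setup. It assumes the trap URLs are only reachable through links marked rel="nofollow", so anything requesting them is treated as a crawler and its IP is blocked for a while. The class name, the trap path and the block duration are all made up for illustration.

```python
import time


class HoneypotBlocker:
    """Blocks client IPs that request a honeypot URL (hypothetical helper)."""

    def __init__(self, honeypot_paths, block_seconds, clock=time.time):
        self._honeypot_paths = set(honeypot_paths)
        self._block_seconds = block_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._blocked = {}           # client IP -> time the block expires

    def allow(self, client_ip, path):
        """Return True if the request should be served, False if rejected."""
        now = self._clock()
        expiry = self._blocked.get(client_ip)
        if expiry is not None:
            if now < expiry:
                return False
            del self._blocked[client_ip]  # block has expired, forget it

        if path in self._honeypot_paths:
            # Only a client that ignores rel="nofollow" should end up here.
            self._blocked[client_ip] = now + self._block_seconds
            return False
        return True


# Assumed values: one trap path and a 24 hour block.
blocker = HoneypotBlocker({"/trap/do-not-follow"}, block_seconds=24 * 3600)
```

A per-IP block like this only goes so far when the crawler rotates through millions of residential addresses, which matches the complaint above.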

Sounds pretty bad. Have you considered reCAPTCHA v2 or similar? Proof of work might slow them down. It would be great if Cloudflare, DataDome, etc. were doing this for you and thus banning these devices for everyone.
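To make the proof-of-work idea concrete, here is a rough hashcash-style sketch. It is an assumption about how such a challenge could work, not how any particular vendor does it: the server hands out a random challenge, the client has to find a nonce whose SHA-256 hash starts with enough zero bits, and the server checks the answer with a single hash. The function names and the difficulty value are hypothetical.

```python
import hashlib
import secrets


def make_challenge():
    """Server side: a random challenge string, issued once per client."""
    return secrets.token_hex(16)


def leading_zero_bits(digest):
    """Count the zero bits at the start of a bytes digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge, nonce, difficulty_bits):
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge, difficulty_bits):
    """Client side: brute-force a nonce, about 2**difficulty_bits hashes on average."""
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce


challenge = make_challenge()
nonce = solve(challenge, difficulty_bits=12)  # assumed difficulty, cheap for a demo
print(verify(challenge, nonce, 12))
```

The cost is tiny for one real visitor but adds up across 24M pages. Whether it actually deters a crawler running on hijacked devices, where someone else pays for the CPU, is an open question.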

What crawlers are using residential proxies?

  • Now if they identified themselves, I could block them.

    I'd put my money on Chinese AI model makers, but I don't trust any company that is in desperate need of fresh data.