
Comment by maelito

2 days ago

I'm getting lots of connections every day from Singapore. It's now the main country, despite the whole website being French-only. AI crawlers, for sure.

Thanks for this tip.

Amazonbot does this despite my efforts in robots.txt to help it out. I look at all the Singapore requests and they’re Amazonbot trying to get various variants of the Special:RecentChanges page. You’re wasting your time, Amazonbot. I’m trying to help you.
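For anyone who wants to sanity-check their own rules: a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `example.org` host below are made up for illustration (the actual file isn't shown in this thread). It only tells you what a well-behaved crawler should conclude from the rules, not what Amazonbot actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; the real robots.txt is not shown here.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /w/Special:RecentChanges
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/w/Special:RecentChanges", "/w/A_Wedding_at_City_Hall"):
        print(path, is_allowed("Amazonbot", "https://example.org" + path))
```

If this says a URL is disallowed and the bot keeps requesting it anyway, the problem is on the crawler's side, not in the rules.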

  • Did you check the IP address of this UA?

    • Yeah, a while ago. They're all from Singapore and report themselves as Amazonbot. Here is an example request:

          "GET /w/A_Wedding_at_City_Hall HTTP/1.1" 200 9677 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36"
      

      The actual IP is in X-Forwarded-For and I didn't keep that.
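If it helps, here is a rough Python sketch for pulling the user agent out of a log fragment shaped like the one quoted above and flagging requests that claim to be Amazonbot. The regex assumes exactly that layout (request, status, size, referer, user agent), and the names `parse_fragment` and `claims_amazonbot` are my own. A user-agent string is trivially spoofable, so this only shows what the client claims; it says nothing about the real source IP.

```python
import re

# Assumes the layout quoted above: "REQUEST" STATUS SIZE "REFERER" "USER-AGENT"
LOG_FRAGMENT = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_fragment(line):
    """Return the named fields as a dict, or None if the line doesn't match."""
    match = LOG_FRAGMENT.search(line)
    return match.groupdict() if match else None


def claims_amazonbot(user_agent):
    """True if the user-agent string contains an Amazonbot product token."""
    return "amazonbot/" in user_agent.lower()


SAMPLE = (
    '"GET /w/A_Wedding_at_City_Hall HTTP/1.1" 200 9677 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; '
    'Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) '
    'Chrome/119.0.6045.214 Safari/537.36"'
)

if __name__ == "__main__":
    fields = parse_fragment(SAMPLE)
    print(fields["path"], fields["status"], claims_amazonbot(fields["user_agent"]))
```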


Fun fact: you don't get rid of them even when you put a captcha on all visitors from Singapore. I still see a spike in traffic that perfectly matches the spike in served captchas, but this time it's geographically distributed between places like Iraq, Bangladesh and Brazil.

Hopefully it at least costs them a little bit more.

  • Usually, there are multiple layers of counter-protection measures. If you block by country, they shift to different IP ranges; if you block by IP, they might use a new IP for every request. They escalate further depending on the bot owner and your actions.

Yeah same for my Gitea instance. These were all ByteDance and Tencent ASNs from some AWS-equivalent. Blocked the whole subnet belonging to them in my server's ufw and haven't had any problems since then. Same for Vultr and Google Cloud.
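To see how much of your traffic a subnet block like this would cover before adding the firewall rule, something along these lines works with Python's standard `ipaddress` module. It's a sketch, not what ufw does internally. The CIDR ranges are reserved documentation prefixes used as placeholders, not real ByteDance, Tencent, Vultr or Google Cloud ranges, and `is_blocked` is just a name I picked.

```python
import ipaddress

# Placeholder documentation ranges; substitute the subnets you actually block.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_blocked(client_ip, networks=BLOCKED_NETWORKS):
    """Return True if client_ip falls inside any of the given networks."""
    address = ipaddress.ip_address(client_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed lists are safe to check in one pass.
    return any(address in network for network in networks)


if __name__ == "__main__":
    for ip in ("192.0.2.77", "203.0.113.5", "2001:db8::1"):
        print(ip, is_blocked(ip))
```

Run it over the client IPs in your access log and count the hits to estimate what the block would catch.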