Comment by Aurornis
15 hours ago
I don’t think I’d assume this is actually Amazon. The author is seeing requests from rotating residential IPs and changing user agent strings:
> It's futile to block AI crawler bots because they lie, change their user agent, use residential IP addresses as proxies, and more.
Impersonating crawlers from big companies is a common technique for people trying to blend in. The fact that requests are coming from residential IPs is a big red flag that something else is going on.
I work for Amazon, but not directly on web crawling.
Based on the internal information I have been able to gather, it is highly unlikely this is actually Amazon. Amazonbot is supposed to respect robots.txt and should always come from an Amazon-owned IP address (you can see verification steps here: https://developer.amazon.com/en/amazonbot).
I've forwarded this internally just in case there is some crazy internal team I'm not aware of pulling this stunt, but I would strongly suggest the author treat this traffic as malicious and lying about its user agent.
Randomly selected IPs from my logs show that 80% of them pass forward-confirmed reverse DNS for the matching domain. The most aggressive ones were from the amazonbot domain.
Believe what you want though. Search for `xeiaso.net` in ticketing if you want proof.
So you said the IPs are residential IPs, but their reverse DNS points to an amazonbot domain? Does that even make sense?
Reverse DNS alone doesn't mean much, since whoever controls the IP can set it to anything. Can you forward-match them to any Amazon domain?
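For anyone who wants to run the forward match themselves, here is a minimal sketch of a forward-confirmed reverse DNS check in Python. It assumes the crawler's hostnames live under a known suffix. The `crawl.amazonbot.amazon` suffix in the usage note is my assumption and should be checked against Amazon's verification page linked above. The function and parameter names are made up for illustration, and the resolvers are injectable so the logic can be exercised without network access.

```python
import socket


def default_reverse(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def default_forward(hostname):
    """Return the set of addresses hostname resolves to (empty on failure)."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    except OSError:
        return set()


def is_forward_confirmed(ip, allowed_suffixes,
                         reverse=default_reverse, forward=default_forward):
    """True only if ip -> PTR hostname -> ip round-trips within an allowed domain."""
    hostname = reverse(ip)
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    # The PTR record is controlled by the IP owner, so it proves nothing alone.
    in_domain = any(
        hostname == suffix or hostname.endswith("." + suffix)
        for suffix in allowed_suffixes
    )
    if not in_domain:
        return False
    # The forward lookup is controlled by the domain owner; it must map back.
    # Addresses are compared as plain strings, which is enough for a sketch.
    return ip in forward(hostname)
```

Usage would look something like `is_forward_confirmed(client_ip, ["crawl.amazonbot.amazon"])`. A spoofed PTR record fails the second step, because only the real domain owner can make the hostname resolve back to that IP.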
> The author is seeing requests from rotating residential IPs and changing user agent strings
This type of thing is commercially available as a service[1]. Hundreds of millions of networks are backdoored and used as crawlers/scrapers because of an included library somewhere, and it is ostensibly legal because somewhere in some ToS there was a generic line that could plausibly be extended to using you as a patsy for quasi-legal activities.
[1] https://brightdata.com/proxy-types/residential-proxies
Yes, we know, but the accusation is that Amazon is the source of the traffic.
If the traffic is coming from residential IPs then it’s most likely someone using these services and putting “AmazonBot” as a user agent to trick people.
I wouldn't put it past any company these days doing crawling in an aggressive manner to use proxy networks.
With the number of "if cloud IP then block" rules in place for many things (to weed out streaming VPNs and "potential" DDoS-ing) I wouldn't doubt that at all.