Comment by jclulow
3 months ago
Yes, LLM-era scrapers are frequently making use of large numbers of IP addresses from all over the place. Some of them seem to be botnets, but based on IP subnet ownership it also seems, pretty frequently, to be cloud companies, many of them outside the US. In addition to fanning out to different IPs, many of the scrapers appear to use User-Agent strings that are randomised, or perhaps in some cases themselves generated by the slop factory. It's pretty fucking bleak out there, to be honest.
Sounds like a violation of the Computer Fraud and Abuse Act. If a big company training an LLM is doing that, it should be possible to find them and have them prosecuted.