Comment by lysace
25 days ago
At some point they must become more cost efficient through pure market economics. That implies less load on sites. Much of the scraping that I see is still very dumb/repetitive. Like Googlebot circa 2001.
(Blocking Chinese IP ranges with the help of a geoip db helps a lot in the short term. Azure as a whole is the second-largest source of pure idiocy.)
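For what it's worth, a minimal sketch of that kind of country-level filter, assuming the MaxMind GeoLite2-Country database and the Python geoip2 package; the file path and blocklist here are just illustrative:

    # Country-level IP filtering against a local GeoIP database.
    # Assumes GeoLite2-Country.mmdb on disk and `pip install geoip2`.
    import geoip2.database
    import geoip2.errors

    BLOCKED_COUNTRIES = {"CN"}  # ISO country codes to drop

    reader = geoip2.database.Reader("/var/lib/geoip/GeoLite2-Country.mmdb")

    def should_block(client_ip: str) -> bool:
        """Return True if the client IP resolves to a blocked country."""
        try:
            iso = reader.country(client_ip).country.iso_code
        except geoip2.errors.AddressNotFoundError:
            return False  # unknown/unlisted ranges pass through
        return iso in BLOCKED_COUNTRIES

In practice you would do this at the edge (firewall ipset or reverse-proxy geoip module) rather than in application code, but the lookup logic is the same.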
They seem to have so much bubble money at the moment that the cost of scraping is probably a rounding error in their pocket change.
So the cost of caching should be a rounding error as well. If the Internet Archive can afford to cache vast swathes of the web, then surely the big AI companies can do so.
Exactly.