
Comment by lysace

1 day ago

At some point they must become more cost-efficient through pure market economics. That implies less load on sites. Much of the scraping I see is still very dumb/repetitive. Like Googlebot in 2001.

(Blocking Chinese IP ranges with the help of some geoip db helps a lot in the short term. Azure as a whole is the second largest source of pure idiocy.)
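For the app side the check itself is only a few lines. A rough sketch, assuming the MaxMind GeoLite2 country database and the geoip2 Python package; the DB path and blocked-country list are just placeholders:

    # Rough sketch: per-request country blocking against a local MaxMind
    # GeoLite2-Country database via the geoip2 package. The path and the
    # blocked country set are placeholders for illustration.
    import geoip2.database
    import geoip2.errors

    BLOCKED_COUNTRIES = {"CN"}  # example only
    reader = geoip2.database.Reader("/var/lib/geoip/GeoLite2-Country.mmdb")

    def is_blocked(client_ip: str) -> bool:
        """Return True if the client IP resolves to a blocked country."""
        try:
            iso = reader.country(client_ip).country.iso_code
        except geoip2.errors.AddressNotFoundError:
            return False  # unknown/unroutable IPs pass through
        return iso in BLOCKED_COUNTRIES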

They seem to have so much bubble money at the moment that the cost of scraping is probably a rounding error in their pocket change.

  • So the cost of caching should be a rounding error as well. If The Internet Archive can afford to cache vast swathes of the web, then surely the big AI companies can do so.
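    A conditional-request cache on the crawler side is not much code either. Rough sketch only, using the Python requests library; the in-memory dict stands in for whatever store a real crawler would use:

        # Sketch: re-crawl a URL but only transfer the body if it changed,
        # using ETag / Last-Modified conditional requests.
        import requests

        cache = {}  # url -> {"etag": ..., "last_modified": ..., "body": ...}

        def fetch_cached(url: str) -> bytes:
            headers = {}
            entry = cache.get(url)
            if entry:
                if entry.get("etag"):
                    headers["If-None-Match"] = entry["etag"]
                if entry.get("last_modified"):
                    headers["If-Modified-Since"] = entry["last_modified"]
            resp = requests.get(url, headers=headers, timeout=30)
            if resp.status_code == 304 and entry:
                return entry["body"]  # unchanged: reuse the cached copy
            cache[url] = {
                "etag": resp.headers.get("ETag"),
                "last_modified": resp.headers.get("Last-Modified"),
                "body": resp.content,
            }
            return resp.content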