Comment by deepfriedbits

3 days ago

It's not about the paywall in this case. It's to prevent AI companies from scraping a publication's archives for training data. If AI companies want that data, they can compensate publishers, not extract it for free from the Internet Archive.

Yes, it's probably cheaper to just download the newspaper articles from Internet Archive than to buy them directly from newspapers. Training costs minimization, or should we call it stealing?