Comment by hungryhobbit
3 days ago
There's an incredibly simple fix: block the archive for a week. No one is paying after a week, so you let the Archive access it after that.
I don't see why every news outlet doesn't just do this.
Good idea, but only if the article can't be edited during that week. What's worth preserving is the version the audience actually read. Articles routinely get ninja-edited after publication, sometimes repeatedly. Changelogs should be mandatory but they're useless if we can't keep them honest.
Block public access, not archival.
The reason they're blocking archives is that people can go to the archive instead of the news site to bypass paywalls and avoid targeted adverts. It's also to prevent AI scrapers from harvesting articles.
I'd rather let the Archive block access to that specific article for a while, but still archive it from the start.
In effect, robots.txt should have an "embargo" directive?
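To make the idea concrete, here is a minimal sketch of how an archive might honor such a directive. No "Embargo" directive exists in robots.txt today; the `Embargo: <days>` line, the function names, and the rule that the archive captures immediately but displays only after the embargo are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_embargo_days(robots_txt: str) -> int:
    """Return the day count from a hypothetical 'Embargo: <days>' line, or 0 if absent."""
    for line in robots_txt.splitlines():
        # Strip trailing comments and whitespace before splitting key from value.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "embargo":
            value = value.strip()
            if value.isdecimal():
                return int(value)
    return 0


def is_displayable(published_at: datetime, now: datetime, embargo_days: int) -> bool:
    """The snapshot is stored immediately; it becomes public once the embargo has elapsed."""
    return now - published_at >= timedelta(days=embargo_days)


# Hypothetical robots.txt content; "Embargo" is not a real directive.
robots = "User-agent: *\nEmbargo: 7  # days before public display\n"
days = parse_embargo_days(robots)
published = datetime(2024, 1, 1, tzinfo=timezone.utc)
early = is_displayable(published, datetime(2024, 1, 5, tzinfo=timezone.utc), days)
later = is_displayable(published, datetime(2024, 1, 8, tzinfo=timezone.utc), days)
```

Under these assumptions the article is captured on day one, so the version the audience actually read is preserved, but it only becomes visible after the week is up.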
I don't know if this is still the case, but when I told IA via robots.txt not to archive my site, it would still crawl and archive it, just not display it until I shut the site down. Once robots.txt was no longer reachable, they would display the archived content. The only way to stop that was to start the site back up, making robots.txt reachable again, and wait for them to recrawl it.
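The behavior described above boils down to a small decision rule. This is only a sketch of the commenter's recollection, not confirmed Internet Archive policy; the `RobotsStatus` enum and `should_display` function are hypothetical names.

```python
from enum import Enum


class RobotsStatus(Enum):
    ALLOWS = "allows"            # robots.txt reachable and permits archiving
    DISALLOWS = "disallows"      # robots.txt reachable and forbids archiving
    UNREACHABLE = "unreachable"  # site down, so robots.txt cannot be fetched


def should_display(status: RobotsStatus) -> bool:
    """Snapshots are stored in every case; they are hidden only while a
    reachable robots.txt disallows archiving (as recalled in the comment)."""
    return status is not RobotsStatus.DISALLOWS


site_up_blocking = should_display(RobotsStatus.DISALLOWS)
site_shut_down = should_display(RobotsStatus.UNREACHABLE)
```

The surprising part is the last case: taking the site offline removes the only signal that was keeping the snapshots hidden.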
I like that idea.
Do these major publications charge per article? They should, but they don't. So their whole pitch is that in aggregate (access to everything, including old articles) they are worth paying for monthly.
In which case the archive is a major drain on revenue.
How would the archive not be a revenue drain if there were pay-per-read articles? I would think the incentive to try to find a free version would increase, not decrease, especially for a wide class of articles that are basically, “I’m curious but not that curious,” which in aggregate I might pay money for (they add value to my subscription) but individually feel wasteful (do I really want to pay to satisfy this curiosity?)
It's not about the paywall in this case. It's to prevent AI companies from scraping a publication's archives for training data. If AI companies want that data, they can compensate publishers, not extract it for free from the Internet Archive.
Yes, it's probably cheaper to just download the newspaper articles from the Internet Archive than to buy them directly from newspapers. Training cost minimization, or should we call it stealing?
The article is about AI companies using the Internet Archive to source training data, not about people using it to avoid paywalls. AI companies don't care that the data is one week old.
The Internet Archive can keep it escrowed until the AI training kerfuffle blows over.
Greed and spite.
You people need to stop saying this. You're being greedy when you buy groceries from a cheaper supermarket. You're being greedy when you negotiate your salary or choose a job based on pay, or anything where you're trying to get more stuff for yourself. Those things are all perfectly good behaviors, they make the world more productive, so everyone wins overall. Greed isn't a problem.
Spite? No evidence of that. They probably just don't want to lose the money from paying customers and ads. You're just making up a fantasy. Perhaps projecting your own spite.
> Greed isn't a problem.
you listed
1. buying the cheapest groceries you can reasonably find
2. trying to get the highest salary you can
3. literally any time you try to get more for yourself
that's a weak list from which to conclude that greed isn't a problem, especially since in the case of 1. and 2. someone's making money off you, the person who's supposedly greedy in these scenarios.
It's clear that people place some non-zero value on archival content. It should be unsurprising that news outlets also place some non-zero value on it. Given that they place some non-zero value on it, it is unsurprising that they do not give it away for zero. Disagreeing with their estimation of the value is understandable, but surely it's easy to see why most news outlets do what they do.