Comment by basch
2 days ago
It seems a lot of people haven't heard of it, but I think it's worth plugging https://perma.cc/ which is really the appropriate tool for something like Wikipedia to be using to archive pages.
It costs money beyond 10 links, which means either a paid subscription or institutional affiliation. This is problematic for an encyclopedia anyone can edit, like Wikipedia.
This is assuming they can't work out something with Wikipedia to offer it for free (via a wikiforge tool, or bot) in exchange for the exposure of being the most common archive provider, or for putting a "used by Wikimedia" logo on their website.
The major reason archive.today was being used is that it also bypassed paywalls, and I don't think perma.cc does that normally.
Wikimedia could pay; they have an endowment of ~$144M [1] (as of June 30, 2024). Perma.cc has Archive.org and Cloudflare as supporting partners, and their mission is aligned with Wikimedia's [2]. It is a natural complementary fit in the preservation ecosystem. For comparison, you have to pay for DOIs too [3] (starting at $275/year and $1/identifier [4] [5]).
With all of this context shared, the Internet Archive is likely meeting this need without issue, to the best of my knowledge.
[1] https://meta.wikimedia.org/wiki/Wikimedia_Endowment
[2] https://perma.cc/about ("Perma.cc was built by Harvard’s Library Innovation Lab and is backed by the power of libraries. We’re both in the forever business: libraries already look after physical and digital materials — now we can do the same for links.")
[3] https://community.crossref.org/t/how-to-get-doi-for-our-jour...
[4] https://www.crossref.org/fees/#annual-membership-fees
[5] https://www.crossref.org/fees/#content-registration-fees
(no affiliation with any entity in scope for this thread)
> Organizations that do not qualify for free usage can contact our team to learn about creating a subscription for providing Perma.cc to their users. Pricing is based on the number of users in an organization and the expected volume of link creation.
If pricing is so high that you have to have a call with the marketing team to get a quote, I think it would be a poor use of WMF funds.
Especially because the volume of links and number of users that Wikimedia would bring is probably at least double their entire existing userbase.
Ultimately we are mostly talking about a largely static web host, with legal issues being perhaps the biggest concern. It would probably make more sense for WMF to create their own than to become a perma.cc subscriber.
However, for the most part, partnering with archive.org seems to be going well, and it already has some software integration with Wikipedia.
If the WMF had a dollar for every proposal to spend Endowment-derived funds, their Endowment would double and they could hire one additional grant-writer
Does Wikipedia really need to outsource this? They already do basically everything else in-house, even running their own CDN on bare metal. I'm sure they could spin up an archiver that could be implicitly trusted. Bypassing paywalls would be playing with fire, though.
> Does Wikipedia really need to outsource this?
I hope so. Archiving is a legal landmine.
Archive.org is the archiver; a bot replaces rotted links with Archive.org links.
https://meta.wikimedia.org/wiki/InternetArchiveBot
https://github.com/internetarchive/internetarchivebot
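For a rough idea of what finding a replacement link can look like, here is a minimal sketch against the Wayback Machine availability endpoint. This is not InternetArchiveBot's actual code; the function names are made up for illustration, and the response shape is assumed to be the `archived_snapshots` / `closest` JSON that endpoint returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Machine availability endpoint (public, no key needed).
WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_AVAILABILITY + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return the closest snapshot URL from a parsed response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_replacement(url, timestamp=None):
    """Hypothetical helper: ask the endpoint for a snapshot of a rotted link."""
    with urlopen(availability_query(url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

Passing the citation's access date as `timestamp` should bias the lookup toward a snapshot close to what the editor actually saw. A `None` result means no snapshot was reported, which is the "hoping IA grabbed it" failure case.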
Yeah, for historical links it makes sense to fall back on IA's existing archives, but going forward Wikipedia could take their own snapshots of cited pages and substitute them in if/when the original rots. It would be more reliable than hoping IA grabbed it.
Archive.org are left wing activists that will agree to censor anything other left wing activists or large companies don't want online.
Of course they do. If Wikipedia did it themselves they'd immediately get DMCA'd and sued into oblivion.
> Bypassing paywalls would be playing with fire though.
That's the only reason archive.today was used. For non-paywalled stuff you can use the Wayback Machine.
I switched to Perma.cc earlier this week and have had a mixed experience, to say the least. I think image-heavy pages just error out completely while still charging me, such as:
https://www.in.gov/nircc/planning/highway/traffic-data/inter...
Reddit also seems to block their agent. It is open source, though.
[dead]
The 3 listed alternatives there seem to have nothing to do with digital archiving. Here's a better alternative to g2 that doesn't login-wall you:
https://alternativeto.net/software/freezepage/