Comment by nkurz

10 years ago

Maybe add a link to 'cached' in addition to 'web'? You could either do your own caching when the link is submitted, or use a service. The link will be useful immediately if the site goes down under load, and keeping an archive will keep the comments comprehensible for future readers if/when the link erodes.

I really like this idea from a historical preservation perspective.

HN comments contain a lot of value in aggregate, and it would be a shame to lose the context needed to exploit this value due to simple link-rot.

However, I can see issues arising with paywalled links. The HN cache would likely display a rather useless paywall for many of the most popular stories. Navigating around the paywall by technical means may present an IP issue.

  • What do you think about HN auto-submitting to archive.org?

    Archive.org does follow some rules (e.g. robots.txt and passwords as you mentioned), so not everything would be guaranteed.

    It could be a good start for a lot of the content here, though.

    EDIT: Forgot to add that I'm not sure how much it'd help with sites going down from the HN attention. I could easily see archive.org moving a bit slower than HN's users.
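To make the auto-submit idea concrete, here is a minimal sketch of what the submission side might look like. It only builds the request URLs and checks a robots.txt body offline; nothing here is HN's code. The function names are hypothetical, the `ia_archiver` agent string is an assumption about which user agent to check against, and the two endpoints are the publicly known Wayback Machine "save" and "available" URLs as I understand them.

```python
from urllib.parse import quote, urlencode
from urllib.robotparser import RobotFileParser

# Publicly known Wayback Machine endpoints (assumed still current).
WAYBACK_SAVE = "https://web.archive.org/save/"
WAYBACK_AVAILABLE = "https://archive.org/wayback/available"


def allowed_by_robots(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if the given robots.txt body permits `agent` to fetch `url`.

    The default agent name is an assumption for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def save_request_url(story_url: str) -> str:
    """Build the URL that asks the archive to capture `story_url`."""
    # Keep URL punctuation intact, but escape spaces and other unsafe characters.
    return WAYBACK_SAVE + quote(story_url, safe=":/?&=%")


def availability_url(story_url: str) -> str:
    """Build the URL that asks whether a snapshot of `story_url` already exists."""
    return WAYBACK_AVAILABLE + "?" + urlencode({"url": story_url})


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    story = "https://example.com/post?id=1"
    if allowed_by_robots(robots, story):
        print(save_request_url(story))
        print(availability_url(story))
```

A submission hook could check `availability_url` first and only hit the save endpoint when no recent snapshot exists, which would also limit how much extra load HN pushes onto the archive.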

We worked on something like this for a while last year (code name "the archivist") with the intention of making Readability-style versions of stories with plain text, major images and no cruft. The purpose of the experiment was to see if it would speed up moderation. If we kept it, we hoped to share it with everybody (where by "hoped" I mean "would have unless we couldn't"). In the end, we didn't keep it because it didn't speed up moderation and it is one of those problems that turns out to be increasingly nontrivial the closer you get.

If we did anything like it again, I'd still hope to share it with everybody, but perhaps not by adding a third link. I already feel bad for adding two.

  • Sorry to see you've not kept up with this.

    I access HN on a few devices, including some whose rendering of "modern" (that is: broken) site designs is at best poor. Frequently no content is visible, either due to text not appearing at all, or being completely obscured by other elements. A Readability view, stripped of cruft, would be excellent for this. I'm aware of issues with site referrals, copyright, etc., but really, it would be helpful.

    Otherwise: Internet Archive and Coral Cache are both existing systems which can and do cache some content, on request. IA seems to like having hot stuff fed to them; CC has been quite spotty in reliability over the past year or two (both not properly caching content, and simply not responding).

    • I'm open to working on it again. Qua user, I would love to be able to view Readability-style versions of stories quickly. And think of all the analytics people could do on a near-complete archive of all HN stories.

      But it's a matter of priorities. Had it sped up moderation it would have both paid for itself and made certain campers happier. But it didn't turn out that way. Beyond that, technically it's a nontrivial problem to get working on the full range of content, and then there are the nontechnical obstacles. We wouldn't do it without being sure we could release it.

      Sending requests to Internet Archive might be an option if they'd be ok with it, but that of course would only help with caching, not decrufting.
