Comment by jkimmel

10 years ago

I really like this idea from a historical preservation perspective.

HN comments contain a lot of value in aggregate, and it would be a shame to lose the context needed to make use of that value to simple link rot.

However, I can see issues arising with paywalled links. For many of the most popular stories, the HN cache would likely just show a useless paywall page. Getting around the paywall by technical means could raise intellectual property (IP) issues.

What do you think about HN auto-submitting to archive.org?

Archive.org does follow some rules (e.g. robots.txt and passwords, as you mentioned), so not every page would be guaranteed to get archived.

It could be a good start for a lot of the content here, though.

EDIT: Forgot to add that I'm not sure how much it would help with sites that go down under the load of HN traffic. I could easily see archive.org reaching a page a bit later than HN's users do, after the site has already fallen over.

  • > HN auto-submitting to archive.org

    Can you describe this in more detail?

    • Not the same person, but I assume they're referring to the "Save Page Now" function of the Wayback Machine (https://archive.org/web/).

      This tells Archive.org's crawler to immediately process a page and add it to the Wayback Machine's cache. Unfortunately there's no public API for this, but it is possible to programmatically submit a request to their endpoint and scrape out the resulting archive link (and I have code that does this, if that would help).
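The code mentioned above isn't shown, so here is a minimal sketch of the same idea in Python, using only the standard library. Several things are assumptions, not a documented API: that a plain GET to `https://web.archive.org/save/<url>` triggers a capture, that the snapshot path comes back in a `Content-Location` header, and that otherwise a `/web/<14-digit timestamp>/<url>` link can be scraped out of the returned HTML. The function names and the User-Agent string are hypothetical.

```python
import re
import urllib.request
from typing import Mapping, Optional

# Assumption: "Save Page Now" accepts a plain GET to this prefix + target URL.
SAVE_ENDPOINT = "https://web.archive.org/save/"
WAYBACK_HOST = "https://web.archive.org"

# Matches a snapshot path such as /web/20150101000000/https://example.com/
SNAPSHOT_RE = re.compile(r"/web/\d{14}/[^\s\"'<>]+")


def build_save_url(target: str) -> str:
    """Return the (assumed) Save Page Now URL for a page."""
    return SAVE_ENDPOINT + target


def extract_archive_url(headers: Mapping[str, str], body: str = "") -> Optional[str]:
    """Find the snapshot link in the response headers, else scrape it from the body."""
    # Assumption: the snapshot path is reported in a Content-Location header.
    for name, value in headers.items():
        if name.lower() == "content-location" and value.startswith("/web/"):
            return WAYBACK_HOST + value
    # Fallback: scrape the first snapshot path out of the returned HTML.
    match = SNAPSHOT_RE.search(body)
    return WAYBACK_HOST + match.group(0) if match else None


def save_page(target: str, timeout: float = 60.0) -> Optional[str]:
    """Ask the Wayback Machine to archive `target`; return the snapshot URL if found."""
    request = urllib.request.Request(
        build_save_url(target),
        headers={"User-Agent": "hn-archiver-sketch/0.1"},  # hypothetical UA string
    )
    # urlopen follows redirects, so these are the final response's headers.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        headers = dict(response.headers.items())
        body = response.read().decode("utf-8", errors="replace")
    return extract_archive_url(headers, body)
```

An HN-side job could call `save_page(story_url)` when a story is submitted and store the returned link next to the original. Keeping the parsing separate from the network call makes it easy to adjust if the endpoint's response format changes, which is a real risk with an unofficial interface.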