Comment by jkimmel
10 years ago
I really like this idea from a historical preservation perspective.
HN comments contain a lot of value in aggregate, and it would be a shame to lose the context needed to make use of that value to simple link rot.
However, I can see issues arising with paywalled links. The HN cache would likely display a rather useless paywall for many of the most popular stories. Navigating around the paywall by technical means may present an intellectual property issue.
What do you think about HN auto-submitting to archive.org?
Archive.org does follow some rules (e.g. robots.txt and passwords as you mentioned), so not everything would be guaranteed.
It could be a good start for a lot of the content here, though.
EDIT: Forgot to add that I'm not sure how much it'd help with sites going down from the HN attention. I could easily see archive.org moving a bit slower than HN's users.
> HN auto-submitting to archive.org
Can you describe this in more detail?
Not the same person, but I assume they're referring to the "Save Page Now" function of the Wayback Machine (https://archive.org/web/).
This tells Archive.org's crawler to immediately process a page and add it to the Wayback Machine's cache. Unfortunately there's no public API for this, but it is possible to programmatically submit a request to their endpoint and scrape out the resulting archive link (and I have code that does this, if that would help).
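For illustration only (this is not the code mentioned above), here is a minimal sketch of what such a submission could look like. It assumes the Save Page Now form can be triggered with a plain GET to https://web.archive.org/save/<url>, and that the resulting snapshot path shows up in a Content-Location or Location response header or in the final redirected URL. Both are assumptions about undocumented behavior that may change. The names build_save_url, extract_archive_link, and save_page_now are hypothetical.

```python
import urllib.request
from typing import Mapping, Optional

# Assumption: base host of the Wayback Machine; not a documented public API.
WAYBACK_BASE = "https://web.archive.org"


def build_save_url(page_url: str) -> str:
    """Build the (assumed) Save Page Now URL for a page."""
    return f"{WAYBACK_BASE}/save/{page_url}"


def extract_archive_link(headers: Mapping[str, str], final_url: str = "") -> Optional[str]:
    """Pull a snapshot link out of the response, if one can be found.

    Assumption: the snapshot path looks like /web/<timestamp>/<url> and is
    exposed via a Content-Location or Location header, or via the final URL.
    """
    for name in ("Content-Location", "Location"):
        value = headers.get(name)
        if value and "/web/" in value:
            # Header values may be site-relative; make them absolute.
            return WAYBACK_BASE + value if value.startswith("/") else value
    if "/web/" in final_url:
        return final_url
    return None


def save_page_now(page_url: str, timeout: float = 30.0) -> Optional[str]:
    """Ask the Wayback Machine to archive page_url; return the snapshot link or None."""
    request = urllib.request.Request(
        build_save_url(page_url),
        headers={"User-Agent": "hn-archiver-sketch/0.1"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # response.headers supports case-insensitive .get() lookups.
        return extract_archive_link(response.headers, response.geturl())
```

Calling save_page_now("http://example.com/") would make one network request per submitted story. A real integration would also need rate limiting and handling for pages that Archive.org declines to crawl.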
This would be an excellent start IMO.