
Comment by remus

3 days ago

That's a real shame. I am involved with some history-related projects and the number of websites which go offline is huge, and the Wayback Machine is incredibly helpful for unearthing these dead sites.

It is not hard to imagine a future in 50 years' time where a huge percentage of this content is lost forever, or at best incredibly hard to find.

This future is here already; policy makers have it locked up. Anyone who remembers what microfiche is understands the magnitude of this problem of not having a trustworthy public record. If we extended public policy from the library era, the Library of Congress itself would be the Internet Archive.

  • > If we extended public policy from

    Similarly and tangentially, when the US Constitution was made in an era of horseback/carriages, it explicitly authorized the creation of a public national postal service (USPS).

    If we extended that older public policy with today's technological context, they would have authorized a national Internet Service Provider. (And, like with USPS, specialized private competitors would exist.)

    • That's at least better than other countries who have essentially privatized most of their existing public infrastructure.

Some countries have national archives that all published material must by law be submitted to, including material published online. I know at least Sweden and the UK have that. This material is available to researchers, though usually you have to physically travel to the archive to access the data, so it is not as convenient as IA.

(It is worth noting that, at least in Sweden, "published" here has a very specific meaning that doesn't include personal websites etc., but does include news outlets.)

In the walls of the cubicle there were three orifices. To the right of the speakwrite, a small pneumatic tube for written messages, to the left, a larger one for newspapers; and in the side wall, within easy reach of Winston's arm, a large oblong slit protected by a wire grating. This last was for the disposal of waste paper. Similar slits existed in thousands or tens of thousands throughout the building, not only in every room but at short intervals in every corridor. For some reason they were nicknamed memory holes. When one knew that any document was due for destruction, or even when one saw a scrap of waste paper lying about, it was an automatic action to lift the flap of the nearest memory hole and drop it in, whereupon it would be whirled away on a current of warm air to the enormous furnaces which were hidden somewhere in the recesses of the building.

I gave a talk about this when I worked for The Archive. There was an article in Scientific American about how the average lifetime of a page on the net before it 404s is about 100 days. That article is offline now and we accessed it through the Wayback Machine.

My own last project before I left was to ingest records from crawl dumps from the defunct cuil.com website. About 200 TB of stuff that brought back 60 billion URLs.

The nature of the internet has changed and it's become an ephemeral place for many people where you just throw things in and others mine it as "data".

I wonder what’s the motivation though. Subscriptions? At a minimum they could limit based on recency. It’s “news” after all so having a number of days of delay for archival would solve most issues, I assume.

Unfortunately, the IA itself has become way less usable: their aggressive anti-bot protection means that actually doing any kind of (manual) exploratory research, as opposed to pulling an individual website that you already know the exact URL of, is more than likely to get you temp banned.