Comment by noufalibrahim

2 days ago

I gave a talk about this when I worked for The Archive. There was an article in Scientific American about how the average lifetime of a page on the net before it 404s is about 100 days. That article is itself offline now, and we had to access it through the Wayback Machine.

My own last project before I left was to ingest records from crawl dumps from the defunct cuil.com website. About 200 TB of stuff that brought back 60 billion URLs.

The nature of the internet has changed, and it's become an ephemeral place for many people, where you just throw things in and others mine it as "data".