Comment by hlandau

3 years ago

To add, thinking a bit more about it: designing formats to be understandable by future civilizations reduces, to a surprising degree, to the same set of problems that METI (messaging extraterrestrial intelligence) has to face. That is, sending signals designed to be intelligible to extraterrestrials, as in Carl Sagan's Contact, etc.

Even if you write an ASCII message directly to a tape, that data is obviously going to be encoded before being written to the tape, and you have no idea if anyone will be able to figure out that encoding in future. Trouble.

What makes this particularly pernicious is the fact that LTO nowadays is a proprietary format(!!). I believe the spec for the first generation or two of LTO might be available, but last I checked, it's been proprietary for some time. The spec is only available to the (very small) consortium of companies which make the drives and media. And the number of companies which make the drives is now... two, I think? (They're often rebadged.) Wouldn't surprise me to see it drop to one in the future.

This seems to make LTO a very untrustworthy format for archiving, which is deeply unfortunate.

The best format for archiving is many formats.

Make an LTO tape... But also make a Blu-ray... And also store it on some hard drives... And also upload it to a web archive...

The same for the actual file format... Upload PDFs... But also upload Word documents... And also ASCII...

And same for the location... Try to get diversity of continents... Diversity of geopolitics (i.e. some in the USA, some in Russia). Diversity of custodians (friends, businesses, charities).

Even ASCII itself is a strange encoding that could be lost with enough time and need to be recovered through cryptographic analysis and signals processing. That doesn't look at all likely today given UTF-8's promised and mostly accomplished ubiquity and its permanent grandfathering of ASCII. But ASCII is still only one of a number of potential encoding schemes, and it isn't necessarily obvious from first principles.

Past generations thought EBCDIC would last longer than it did.

Again, not that there are any indications now that ASCII won't survive nearly as long as the English language does at this point, just that when we're talking about sending signals to the future, even assuming ASCII encoding is an assumption to question.
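
To make the grandfathering point concrete, here is a minimal Python sketch. It assumes Python's built-in cp037 codec as a stand-in for "EBCDIC" (there were many EBCDIC code pages; this is just one of them), and the helper name encodings_of is made up for illustration.

```python
def encodings_of(text):
    """Encode the same text under ASCII, UTF-8 and one EBCDIC code page."""
    # cp037 is only one of many EBCDIC code pages; picked here as an assumption.
    return {name: text.encode(name) for name in ("ascii", "utf-8", "cp037")}


if __name__ == "__main__":
    enc = encodings_of("HELLO")
    for name, raw in enc.items():
        # Show the raw byte values so the difference is visible.
        print(f"{name:>6}: {raw.hex(' ')}")
    # ASCII text is byte-for-byte identical under UTF-8 ...
    print("ascii == utf-8:", enc["ascii"] == enc["utf-8"])
    # ... but the EBCDIC bytes for the same letters are entirely different.
    print("ascii == cp037:", enc["ascii"] == enc["cp037"])
```

The same five letters come out as 48 45 4c 4c 4f in ASCII and UTF-8 but c8 c5 d3 d3 d6 in this EBCDIC code page, so a reader who only knows one table sees noise when handed the other.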

  • Baby's first cryptographic analysis, sure. Mapping letters to bits is easy, and the 8-bit repeating pattern is also easy.

    The thing that might make it hard is if people have forgotten English itself, and in that case ASCII is one of the smallest barriers.

    EBCDIC would also be fine.
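
As a rough illustration of how the 8-bit repeating pattern could be spotted in a raw bit stream, here is a small Python sketch. The function name guess_symbol_width is hypothetical, and the shortcut it uses is an assumption that only holds for 7-bit ASCII: the top bit of every byte is zero, so one bit column is constant at the right width. An encoding without such a fixed bit (EBCDIC, for instance) would need a statistical measure instead.

```python
def guess_symbol_width(data, max_width=16):
    """Return the smallest width at which some bit column never changes."""
    # Flatten the bytes into a list of bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    for width in range(2, max_width + 1):
        for offset in range(width):
            # Every width-th bit, starting at this offset.
            column = bits[offset::width]
            if len(set(column)) == 1:
                # A bit that never changes suggests a fixed symbol width.
                return width
    raise ValueError("no constant bit column found")


if __name__ == "__main__":
    sample = b"The best format for archiving is many formats."
    print(guess_symbol_width(sample))  # prints 8
```

This only recovers the framing; working out which 8-bit value is which letter is then ordinary frequency analysis, which is the part that depends on still knowing English.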

These things make more sense because LTO is used for backup, not archival. Companies don't want to be able to read the tape data in 50 years; they want to be able to read it tomorrow, after the entire business campus burns down.