Comment by boomlinde

11 years ago

"Universal" is a strong word for data adhering to an arbitary standard that has only existed for a tiny fraction of human history.

As for not being able to mount physical media, how is that problem not exactly the same regardless of the text encoding?

Indeed - ASCII is not "universal". It's just common.

Hand someone who has no knowledge of ASCII some ASCII bytes, and no compliant editors, and see how hard it is to figure out they're dealing with the English alphabet.

ASCII mostly benefits from being a single-byte format - if a tool parses ASCII, then it's very easy for humans to look at the output and go "right, that's English".

But this would be just as true if we had tools which handled Unicode (and most editors do exactly that now). Or if we had some other type of common binary standard which also encoded units.

It's all about the metadata - which isn't necessarily always in-band.

  • It wouldn't be that hard. You can think of ASCII as a simple substitution cipher - practically a Caesar cipher, really - and simple frequency analysis makes cracking such a thing literally child's play. It might take a bit longer if the data were presented as a bitstream and you had to work out what the character width was, but even that wouldn't be too hard, as there's a lot of repetitive structure in ASCII bits.

  • ”Hand someone who has no knowledge of ASCII some ASCII bytes, and no compliant editors, and see how hard it is to figure out they're dealing with the English alphabet.”

    It would not be impossible for them to determine that the bit patterns 01100101 and 01110100 are a lot more common than, say, 01111010. But probably harder if they didn’t know that the bits are supposed to be divided into groups of 8.
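The frequency-analysis idea in these replies is easy to demonstrate. A minimal sketch (the sample text below is my own, purely for illustration): treat a chunk of English ASCII as raw bytes, count each bit pattern, and the patterns for 'e' and 't' stand out over rare ones like 'z' - exactly the toehold an analyst with no knowledge of the encoding would need.

```python
from collections import Counter

# Illustrative English sample, viewed purely as bytes. Someone who didn't
# know ASCII could still tally how often each 8-bit pattern occurs.
sample = (
    b"the quick brown fox jumps over the lazy dog while the other dogs "
    b"watch the fox trot past the old oak tree near the quiet creek"
)

counts = Counter(sample)

# Print the bit patterns mentioned above alongside their frequencies:
# 01100101 ('e') and 01110100 ('t') should dwarf 01111010 ('z').
for ch in b"etz":
    print(f"{ch:08b} ({chr(ch)!r}): {counts[ch]}")
```

With a longer corpus the skew becomes even sharper, which is what makes frequency analysis against a fixed-width single-byte code so effective.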