Comment by Analemma_

5 months ago

It's always dangerous to stick one's neck out and say "[this many bits] ought to be enough for anybody", but I think it's very unlikely we'll ever run out of UTF-8 sequences. UTF-8 can represent about 1.1 million code points, of which we've assigned about 160,000 to actual characters, plus another ~140,000 reserved for the Private Use Areas, which won't expand. And that's after covering nearly all of the world's known writing systems: the last several Unicode updates have added a few thousand characters here and there for very obscure and/or ancient writing systems, but those won't go on forever (and things like emojis usually only get a handful of new code points per update, because most new emojis are sequences of existing code points joined with modifiers or zero-width joiners).
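
For anyone who wants to check the numbers above, here is a rough sketch of the arithmetic. The code space boundaries and Private Use ranges are the standard ones; the 160,000 figure is just the approximation from this comment, not an exact count, and the variable names are made up for illustration.

```python
# Unicode code space: U+0000..U+10FFFF, i.e. 17 planes of 65,536 code points.
TOTAL_CODE_POINTS = 0x10FFFF + 1                 # 1,114,112

# Surrogates (U+D800..U+DFFF) exist only for UTF-16 and cannot be encoded in UTF-8.
SURROGATES = 0xDFFF - 0xD800 + 1                 # 2,048
SCALAR_VALUES = TOTAL_CODE_POINTS - SURROGATES   # what UTF-8 can actually represent

# Private Use: the BMP block plus planes 15 and 16
# (each plane loses its last two code points, which are noncharacters).
PRIVATE_USE = (
    (0xF8FF - 0xE000 + 1)
    + (0xFFFFD - 0xF0000 + 1)
    + (0x10FFFD - 0x100000 + 1)
)

ASSIGNED_APPROX = 160_000  # assumption: rough figure quoted in the comment

remaining = SCALAR_VALUES - PRIVATE_USE - ASSIGNED_APPROX

print(SCALAR_VALUES)  # 1112064
print(PRIVATE_USE)    # 137468
print(remaining)      # 814596
```

So under these assumptions roughly 800,000 code points are still free, which is about five times everything assigned so far.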

If I had to guess, I'd say we'll run out of IPv6 addresses before we run out of unassigned UTF-8 sequences.

lyu07282 5 months ago

The oldest script in Unicode, Sumerian cuneiform, is ~5,200 years old. If we were to invent new scripts at the same rate, we would hit 1.1 million code points in around 31,000 years. So yeah, nothing to worry about, you are absolutely right. Unless we join some intergalactic federation of planets, although they probably already have their own encoding standards we could just adopt.
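
One way to arrive at a figure near 31,000 years is a plain linear extrapolation. This is only a guess at the reasoning, since the arithmetic isn't shown above; it assumes ~160,000 assigned characters spread evenly over ~5,200 years, and the names are made up for illustration.

```python
SCALAR_VALUES = 1_112_064   # code points UTF-8 can encode (0x110000 minus 2,048 surrogates)
ASSIGNED_APPROX = 160_000   # assumption: rough count of assigned characters
SCRIPT_AGE_YEARS = 5_200    # assumption: approximate age of cuneiform

# Average "invention rate" if all assigned characters appeared evenly over that span.
rate_per_year = ASSIGNED_APPROX / SCRIPT_AGE_YEARS   # about 30.8 characters per year

# Years needed to fill the whole code space at that rate, counted from the start,
# then minus the years that have already passed.
years_to_fill = SCALAR_VALUES / rate_per_year
years_remaining = years_to_fill - SCRIPT_AGE_YEARS

print(round(years_remaining))  # 30942
```

This ignores the Private Use Areas and other reserved ranges, so it is an upper-end estimate; subtracting those would shorten it by a few thousand years without changing the conclusion.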