Comment by rmunn

5 months ago

Probably a good idea, but when UTF-8 was designed the Unicode committee had not yet made the mistake of limiting the character range to 21 bits. (Going into why it's a mistake would make this comment longer than it's worth, so I'll only expound on it if anyone asks me to.) And at this point it would be a bad idea to switch away from the format that is now, finally, used in over 99% of all documents online. The gain would be small (not zero, but small) and the cost would be immense.

Didn't they limit the range to 21 bits because UTF-16 has that limitation?

  • That is indeed why they limited it, but that was a mistake. I want to call UTF-16 a mistake all on its own, but since its predecessor, UCS-2, predated UTF-8, I can't entirely do so. But limiting the Unicode range to only what's allowed in UTF-16 was shortsighted. They should, instead, have allowed UTF-8 to continue to address 31 bits, and if the standard grew past 21 bits, then UTF-16 would be deprecated. (Going into depth would take an essay, and at this point nobody cares about hearing it, so I'll refrain.)

    • I suppose it's still possible to extend to 31 bits in the future, once UTF-16 has become obsolete enough. How big is the need for it right now?
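For anyone who wants to see where the two limits come from, here is a rough Python sketch. It assumes the 1-to-6-byte layout of the original UTF-8 design (as described in RFC 2279), and the helper name `encode_utf8_31bit` is made up for illustration. It does not reject surrogates or do any other validation, so treat it as a demonstration of the bit layout rather than a usable codec.

```python
# Highest code point reachable with a UTF-16 surrogate pair:
# 10 payload bits from the high surrogate, 10 from the low one,
# offset by 0x10000 because the BMP is encoded directly.
UTF16_MAX = 0x10000 + ((0x3FF << 10) | 0x3FF)

# (largest code point, lead-byte prefix) for 1- to 6-byte sequences
# in the original UTF-8 layout (assumed here, per RFC 2279).
_RANGES = [
    (0x7F, 0x00),
    (0x7FF, 0xC0),
    (0xFFFF, 0xE0),
    (0x1FFFFF, 0xF0),
    (0x3FFFFFF, 0xF8),
    (0x7FFFFFFF, 0xFC),
]


def encode_utf8_31bit(cp: int) -> bytes:
    """Hypothetical helper: encode cp with the original 1-6 byte scheme."""
    if cp < 0:
        raise ValueError("code point must be non-negative")
    for length, (limit, prefix) in enumerate(_RANGES, start=1):
        if cp <= limit:
            if length == 1:
                return bytes([cp])
            tail = []
            # Peel off 6 bits at a time for the continuation bytes.
            for _ in range(length - 1):
                tail.append(0x80 | (cp & 0x3F))
                cp >>= 6
            # Whatever is left fits in the lead byte next to its prefix.
            return bytes([prefix | cp] + tail[::-1])
    raise ValueError("code point needs more than 31 bits")


print(hex(UTF16_MAX))
print(encode_utf8_31bit(0x10FFFF).hex(" "))
print(encode_utf8_31bit(0x7FFFFFFF).hex(" "))
```

Up to U+10FFFF the output matches what today's UTF-8 produces; everything above that needs the 5- and 6-byte forms, which is the part that was cut off to match UTF-16.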
