Comment by AnIdiotOnTheNet

4 years ago

UCS-2 seemed like a good(ish) idea at the time when Unicode's scope didn't include every possible human concept represented in icon form and UTF-8 hadn't yet been spec'd on a napkin by the first adults to bother thinking about the problem.

Even in 1989, it should have been clear that 16 bits were not enough to encode all of the Chinese characters, let alone all human scripts. Unicode today encodes 92,865 Chinese characters (https://en.wikipedia.org/wiki/CJK_Unified_Ideographs).
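To put rough numbers on that, here is a minimal Python sketch, not part of the original comment. It compares the 65,536 values a fixed 16-bit unit can hold against the 92,865 figure quoted above, then checks how one character inside and one outside that 16-bit range come out in UTF-16 and UTF-8. The helper names and the two sample characters are illustrative choices only.

```python
# Values addressable by a single fixed 16-bit unit, as in UCS-2.
UCS2_CAPACITY = 2 ** 16  # 65,536

# Count of Chinese characters in Unicode, as quoted in the comment above.
CJK_COUNT = 92_865


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


def utf8_bytes(ch: str) -> int:
    """Number of bytes needed to encode ch in UTF-8."""
    return len(ch.encode("utf-8"))


# Sample characters (illustrative picks, not from the thread).
bmp_char = "\u4e2d"      # U+4E2D, fits in one 16-bit unit
ext_char = "\U00020000"  # U+20000, first code point of CJK Extension B

print(CJK_COUNT > UCS2_CAPACITY)                      # True
print(utf16_units(bmp_char), utf16_units(ext_char))   # 1 2
print(utf8_bytes(bmp_char), utf8_bytes(ext_char))     # 3 4
```

The second character needs a surrogate pair in UTF-16, which is exactly the case a strict UCS-2 implementation cannot represent.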

The only reason anybody would think UCS-2 was a good idea is that they did not consult a single Chinese or Japanese scholar on Chinese characters.

  • Nobody in 1989 expected to encode 92k Chinese characters into Unicode, because none of the existing encodings covered anywhere near that many either. The most common encoding for Chinese, GB2312, only has about 7k characters.

    I recommend reading your own link, specifically the list of sources for the first CJK block to see how many characters were included and where they were sourced from.

Quite true. One of the things Windows got very wrong was UCS-2 and, later, UTF-16. So did JavaScript.

  • And macOS, and Java, and Qt, and ...

    It's almost as if it was universally seen as a good idea at the time. ~

    • Yes. I'm a bit surprised it took so long for someone to come up with something better. But if someone had tried and had come up with anything other than Ken Thompson and Rob Pike's UTF-8, we might still be sad. Sometimes you have to make mistakes before you know that's what they were.
