Comment by moefh
2 days ago
It's not 2 million, it's a little over 1 million.
The exact number is 1112064 = 2^16 - 2048 + 16*2^16. In UTF-16, 2 bytes can encode 2^16 - 2048 code points and 4 bytes can encode 16*2^16. The 2048 surrogates (U+D800 to U+DFFF) are not counted because they can never appear by themselves; they're used purely for UTF-16 encoding.
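If you want to double-check the arithmetic, here's a small Python sketch (the variable and function names are just mine, not from any library). It computes the count from the formula, then cross-checks it by encoding every non-surrogate code point with Python's built-in "utf-16-le" codec and tallying the byte lengths:

```python
# Count the code points UTF-16 can encode, two ways.

BMP_SIZE = 2**16                  # U+0000..U+FFFF, one 16-bit unit each
SURROGATE_COUNT = 2048            # U+D800..U+DFFF, reserved for surrogate pairs
SUPPLEMENTARY_SIZE = 16 * 2**16   # planes 1-16, U+10000..U+10FFFF, two units each

two_byte = BMP_SIZE - SURROGATE_COUNT
four_byte = SUPPLEMENTARY_SIZE
total = two_byte + four_byte


def count_by_utf16_length():
    """Encode every non-surrogate code point and tally the encoded byte lengths."""
    counts = {}
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates can't be encoded; the strict codec raises
        n = len(chr(cp).encode("utf-16-le"))  # "-le" variant emits no BOM
        counts[n] = counts.get(n, 0) + 1
    return counts


if __name__ == "__main__":
    print(two_byte, four_byte, total)   # 63488 1048576 1112064
    print(count_by_utf16_length())      # {2: 63488, 4: 1048576}
```

The brute-force loop runs over all 1114112 values in the code space, so it takes a moment, but it confirms that the two buckets match the formula exactly.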
Even with just 1 million codepoints, why did they feel the need for CJK unification? Was it so it would all fit in UCS-2 or something?
Yes, that was exactly the reason. CJK unification happened during the few years when we were all trying to convince ourselves that 16 bits would be enough. By the time we acknowledged otherwise, it was too late.