Comment by panpog
1 year ago
Can you fit everything into 32 bits? I have no idea, but Hangul and Indic scripts seem like they might have a combinatorial explosion of infrequently used characters.
1 year ago
> Can you fit everything into 32 bits? I have no idea, but Hangul and Indic scripts seem like they might have a combinatorial explosion of infrequently used characters.
But there's no such explosion if you encode only the combinatorial primitives those characters are built from and then apply composition rules, is there?
You still get the combinatorial explosion, but you have more bits to work with. Imagine if you could combine any 9 jamo into a single Hangul syllable block. (The real combinatorics are more complicated, and I don't know if it's this bad.) Encoding just the 24 jamo and a control character requires 25 codepoints. Giving each syllable block its own codepoint would require 24^9 > 2^32 codepoints.
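To put numbers on that hypothetical, here is a quick check of the arithmetic in Python. The 24 jamo and 9 slots per block are the made-up figures from the comment above, not the real Hangul rules, and the variable names are just for illustration.

```python
# Hypothetical figures from the comment above, not the real Hangul rules.
JAMO = 24   # basic jamo
SLOTS = 9   # imagined number of jamo per syllable block

# Primitives only: one codepoint per jamo plus one control character.
primitives = JAMO + 1

# One codepoint for every possible block of 9 jamo.
precomposed = JAMO ** SLOTS

print(primitives)            # 25
print(precomposed)           # 2641807540224
print(2 ** 32)               # 4294967296
print(precomposed > 2 ** 32) # True
```

So the fully precomposed scheme would overshoot a 32-bit space by a factor of roughly 600, while the primitives-only scheme needs 25 codepoints.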
> Giving each syllable block its own codepoint
That's the thing - you wouldn't do that! Only a small subset of frequently used combos would get its own ID; the rest would exist only as composed sequences.
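For what it's worth, Unicode already lets both representations coexist for Hangul: a syllable can be a single precomposed codepoint or a sequence of conjoining jamo, and normalization converts between the two. A minimal sketch using Python's standard `unicodedata` module (the syllable 한 is just an arbitrary example):

```python
import unicodedata

# Three conjoining jamo: leading ᄒ (U+1112), vowel ᅡ (U+1161), trailing ᆫ (U+11AB).
jamo_sequence = "\u1112\u1161\u11ab"

# NFC applies the composition rules and yields one precomposed codepoint.
composed = unicodedata.normalize("NFC", jamo_sequence)

# NFD goes the other way, back to the jamo primitives.
decomposed = unicodedata.normalize("NFD", "\ud55c")

print(len(jamo_sequence), len(composed))  # 3 1
print(hex(ord(composed)))                 # 0xd55c
print(decomposed == jamo_sequence)        # True
```

One caveat: Unicode gives all 11,172 modern syllables their own codepoints rather than only the frequent ones, so this shows the mechanism, not the exact split proposed above.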