Comment by eviks

1 year ago

But don't you avoid that explosion if you only encode the combinatoric primitives those characters are made of and then use composition rules?

You still get the combinatoric explosion, but you have more bits to work with. Imagine if you could combine any 9 jamo into a single hangul syllable block. (The real combinatorics are more complicated, and I don't know if it's this bad.) Encoding just the 24 jamo and a control character requires 25 codepoints. Giving each syllable block its own codepoint would require 24^9 > 2^32 codepoints (24^9 is roughly 2.6 × 10^12).
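To make the arithmetic concrete, here is a quick check of those numbers. It uses the simplified assumption from the comment (exactly 9 jamo per block, any jamo in any slot), which is not how real hangul works; the variable names are just for illustration.

```python
# Simplified model from the comment, not real hangul rules:
# 24 jamo, and a syllable block is any sequence of exactly 9 of them.
JAMO = 24
JAMO_PER_BLOCK = 9  # hypothetical block size

# Composable scheme: one codepoint per jamo plus one control character.
composable_codepoints = JAMO + 1

# Precomposed scheme: one codepoint for every possible block.
precomposed_codepoints = JAMO ** JAMO_PER_BLOCK

print(composable_codepoints)              # 25
print(precomposed_codepoints)             # 2641807540224
print(precomposed_codepoints > 2 ** 32)   # True
```

The explosion does not go away in the composable scheme; it moves from the size of the codepoint table into the length of the encoded sequences.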

  • > Giving each syllable block its own codepoint

    That's the thing - you wouldn't do that! Only a small subset of frequently used combos would get its own ID; the rest would only be composable.
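A minimal sketch of what such a hybrid scheme could look like, purely as a toy: primitives are IDs 0..23, ID 24 is a joiner control character, and a couple of made-up "frequent" blocks get their own IDs from 25 up. The table contents, ID layout, and function names are all assumptions for illustration, not any real encoding.

```python
# Toy hybrid encoding (hypothetical layout, not a real standard).
COMPOSE = 24                              # joiner; primitives are IDs 0..23
FREQUENT = {(0, 1): 25, (0, 1, 2): 26}    # made-up frequent blocks with own IDs
BY_ID = {v: k for k, v in FREQUENT.items()}

def encode(block):
    """Encode one block (a tuple of primitive IDs) as a list of IDs."""
    block = tuple(block)
    if block in FREQUENT:                 # frequent block: a single ID
        return [FREQUENT[block]]
    out = []
    for i, primitive in enumerate(block): # rare block: primitives joined by COMPOSE
        if i:
            out.append(COMPOSE)
        out.append(primitive)
    return out

def decode(ids):
    """Split a list of IDs back into blocks (tuples of primitive IDs)."""
    blocks, i = [], 0
    while i < len(ids):
        if ids[i] in BY_ID:               # precomposed ID expands to its block
            blocks.append(BY_ID[ids[i]])
            i += 1
            continue
        block = [ids[i]]
        i += 1
        while i + 1 < len(ids) and ids[i] == COMPOSE:
            block.append(ids[i + 1])      # joiner pulls in the next primitive
            i += 2
        blocks.append(tuple(block))
    return blocks

print(encode((0, 1)))        # [25]
print(encode((3, 4, 5)))     # [3, 24, 4, 24, 5]
```

The frequent block costs one ID while the rare one costs five, so the table stays small and only uncommon blocks pay the length penalty.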