Comment by mmastrac

1 year ago

Normalization is annoying but understandable: you have common characters that are clearly SOMETHING + MODIFIER, and they are common enough that you want to represent them as a single character to avoid byte explosion. SOMETHING and MODIFIER are also both useful on their own, since each can combine with other, less common characters whose combinations aren't worth encoding as single precomposed characters (infrequent, but still valuable).

If you skip all the modifiers, you end up with an explosion in code space. If you skip all the precomposed characters, you end up with an explosion in bytes.
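To make the trade-off concrete, here's a rough sketch using Python's standard `unicodedata` module. The choice of "é" and the variable names are mine, purely for illustration. It shows the same visible character as one precomposed code point and as base + modifier, how the UTF-8 sizes differ, and how NFC and NFD convert between the two.

```python
import unicodedata

# Illustrative example: "é" in its two canonically equivalent spellings.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT (two code points)

# They render the same, but compare unequal as raw strings.
raw_equal = precomposed == decomposed

# The decomposed spelling costs more bytes in UTF-8.
precomposed_bytes = len(precomposed.encode("utf-8"))
decomposed_bytes = len(decomposed.encode("utf-8"))

# NFC composes where a precomposed character exists; NFD decomposes.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)

print(raw_equal, precomposed_bytes, decomposed_bytes)
print(nfc == precomposed, nfd == decomposed)
```

Comparing strings only after normalizing both sides to the same form is what makes the two spellings interchangeable in practice.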

There's no good solution here, so normalization makes sense. But then the committee says "... and what about this kind of normalization?" and then you end up... here.

Right. But if we had a chance for a do-over, it'd be really nice if we all just agreed on a normalization form and used it from the start in all our software. Seems like a missed opportunity not to.

  • I think NFC is the agreed-upon normalization form, is it not? The only real exception I can think of is HFS+ but that was corrected in APFS (which uses NFC now like the rest of the world).