Comment by toast0

1 year ago

I don't think you can mandate that in this kind of encoding. It only encodes code points, with some design choices that make certain invalid code points impossible to encode.

But normalized forms are about sequences of code points that are semantically equivalent. You can't make the non-normalized code point sequences unencodable in an encoding that only looks at one code point at a time. You wouldn't want to anchor the encoding to any particular version of Unicode either.

Normalization has to happen at another layer. That layer is often omitted for efficiency or because nobody stopped to consider it, but the code point encoding layer isn't the right place for it.
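
To make that concrete, here is a rough Python sketch using UTF-8 as the stand-in for a code point encoding and the standard `unicodedata` module as the separate normalization layer. The helper name `encode_normalized` is made up for illustration; it isn't part of any encoding being discussed.

```python
import unicodedata

composed = "\u00e9"      # "é" as one code point (U+00E9)
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT (U+0301)

# The encoder looks at one code point at a time, so both sequences are
# perfectly encodable and produce different bytes.
assert composed.encode("utf-8") == b"\xc3\xa9"
assert decomposed.encode("utf-8") == b"e\xcc\x81"

# Canonical equivalence only shows up once a normalization step is applied.
assert unicodedata.normalize("NFC", decomposed) == composed


def encode_normalized(text: str, form: str = "NFC") -> bytes:
    """Hypothetical helper: normalize in one layer, then encode in another."""
    return unicodedata.normalize(form, text).encode("utf-8")


assert encode_normalized(composed) == encode_normalized(decomposed)

# Normalization depends on the Unicode data tables this Python build ships
# with, which is the version coupling you'd want to keep out of the encoding.
print(unicodedata.unidata_version)
```

The encoding step never needs to know which sequences are equivalent; only the normalization step does, and only that step changes when the Unicode tables change.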
