Comment by moody__

2 months ago

Normalization implementations must not strip variation selectors, by definition. The "normal" part of normalization means converting a string into either consistently decomposed or consistently composed Unicode, i.e. U+00DC vs U+0055 + U+0308. That same decomposition mapping is also used (abused, maybe) to convert certain "legacy" code points into their non-legacy equivalents. But there is no rune that decomposes to a variation selector (and variation selectors themselves compose into nothing), so normalization must not alter or strip them.
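A quick check of this with Python's stdlib `unicodedata` (just an illustration, not the commenter's from-scratch implementation):

```python
import unicodedata

# U+2764 HEAVY BLACK HEART followed by U+FE0F VARIATION SELECTOR-16,
# which requests emoji presentation.
heart = "\u2764\ufe0f"

# The selector survives every normalization form, including the
# compatibility forms that rewrite "legacy" code points.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert "\ufe0f" in unicodedata.normalize(form, heart)

# The composed/decomposed round trip described above:
assert unicodedata.normalize("NFD", "\u00dc") == "\u0055\u0308"  # U+00DC -> U + combining diaeresis
assert unicodedata.normalize("NFC", "\u0055\u0308") == "\u00dc"  # and back

# A "legacy" compatibility mapping, by contrast, does get rewritten:
assert unicodedata.normalize("NFKC", "\ufb01") == "fi"  # fi-ligature -> "fi"
```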

source: I've implemented Unicode normalization from scratch