Comment by paulgb

5 months ago

I haven’t tried it but I’ve heard that at least some unicode normalizers do not strip sequences of variation selectors.

1 comment

paulgb

By definition, a normalization implementation must not strip variation selectors. The "normal" part of normalization means converting a string into either consistently composed or consistently decomposed Unicode, e.g. U+00DC (composed) vs U+0055 + U+0308 (decomposed). The same decomposition mapping is also used (maybe more like abused) to convert certain "legacy" code points to non-legacy ones. No code point decomposes to variation selectors (and thus variation selectors do not compose into anything), so normalization must not alter or strip them.
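If you want to check this yourself, here is a rough sketch using Python's standard `unicodedata` module. The helper name `survives_normalization` and the sample strings are my own for illustration, and the results depend on the Unicode data bundled with your Python build, so treat it as a quick sanity check rather than a proof.

```python
import unicodedata

# The four standard normalization forms.
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def survives_normalization(text: str) -> bool:
    """Return True if `text` is unchanged by all four normalization forms.

    Hypothetical helper, written only for this illustration.
    """
    return all(unicodedata.normalize(form, text) == text for form in FORMS)


# Composed vs decomposed: U+00DC <-> U+0055 + U+0308.
composed = "\u00dc"
decomposed = "U\u0308"
assert unicodedata.normalize("NFD", composed) == decomposed
assert unicodedata.normalize("NFC", decomposed) == composed

# A "legacy" code point: U+212B ANGSTROM SIGN is mapped to U+00C5.
assert unicodedata.normalize("NFC", "\u212b") == "\u00c5"

# Strings carrying variation selectors, including a run of several.
samples = [
    "\u2764\ufe0f",               # HEAVY BLACK HEART + VS16
    "a\ufe00\ufe01\U000e0100",    # 'a' + VS1 + VS2 + VS17
]
for sample in samples:
    assert survives_normalization(sample)
```

The last loop is the relevant part: the variation selectors come out the other side in every form, even when several are stacked after one base character.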

source: I've implemented Unicode normalization from scratch