Comment by paulgb
2 months ago
I haven’t tried it but I’ve heard that at least some unicode normalizers do not strip sequences of variation selectors.
By definition, normalization implementations must not strip variation selectors. The "normal" part of normalization means converting a string into either consistently decomposed or consistently composed Unicode, i.e. U+00DC vs. U+0055 + U+0308. However, this decomposition mapping is also used (maybe more like abused) to convert certain "legacy" code points to non-legacy code points. No code point decomposes to variation selectors (and thus variation selectors do not compose into anything), so normalization must not alter or strip them.
source: I've implemented Unicode normalization from scratch
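For anyone who wants to check this themselves, here is a minimal sketch using Python's standard `unicodedata` module (not the from-scratch implementation mentioned above). The helper names `is_variation_selector` and `count_selectors` are made up for illustration, and the sample only covers the two main variation selector blocks (U+FE00..U+FE0F and U+E0100..U+E01EF).

```python
import unicodedata

# The four normalization forms supported by unicodedata.normalize.
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def is_variation_selector(ch: str) -> bool:
    """True for VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def count_selectors(text: str) -> int:
    """Count the variation selectors in text."""
    return sum(1 for ch in text if is_variation_selector(ch))


# Hypothetical sample: a base letter followed by VS1..VS16 and VS17.
sample = "a" + "".join(chr(0xFE00 + i) for i in range(16)) + "\U000E0100"

for form in FORMS:
    normalized = unicodedata.normalize(form, sample)
    # The selectors have no decomposition mapping, so the count is unchanged.
    print(form, count_selectors(normalized) == count_selectors(sample))

# The composed/decomposed pair from the comment round-trips as expected.
print(unicodedata.normalize("NFD", "\u00dc") == "U\u0308")
print(unicodedata.normalize("NFC", "U\u0308") == "\u00dc")
```

Note that this only covers the four standard normalization forms. Other transformations, such as case folding for identifiers or application-level sanitizing, are a separate matter and may treat these code points differently.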