Comment by mjevans
14 hours ago
They can happen to _be_ Unicode. Composition operations (for fully terminated Unicode strings) should work, but require eventual normalization.
That's the other benefit of being able to resume UTF-8 strings midway: even combining broken strings still leaves all the good characters present.
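A minimal Python sketch of both points, assuming "composition" here means plain concatenation and "normalization" means something like NFC. The sample strings are made up for illustration:

```python
import unicodedata

# Concatenating two complete strings can leave a base letter next to a
# combining mark; NFC normalization folds the pair into one code point.
left = "cafe"
right = "\u0301 au lait"  # starts with COMBINING ACUTE ACCENT
joined = left + right
normalized = unicodedata.normalize("NFC", joined)

# A UTF-8 byte string cut in the middle of a multi-byte character:
# b"na\xc3" holds the lead byte of "ï" without its continuation byte.
broken = "naïve".encode("utf-8")[:3]
whole = "日本語".encode("utf-8")

# UTF-8 is self-synchronizing, so the decoder can flag the dangling byte
# and pick up again at the next lead byte. Only the damaged character is lost.
recovered = (broken + whole).decode("utf-8", errors="replace")

print(normalized)
print(recovered)
```

With `errors="replace"` the dangling byte becomes U+FFFD and every intact character on either side survives.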
Substring operations are dicier; those should be operating on known strings. In pathological cases they might operate against partial Unicode sequences... but that's as silly as using raw pointers and directly mangling the bytes without any protection or design plans.
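To make the substring hazard concrete, here is a rough Python sketch, with a made-up sample string, contrasting a byte-level slice that lands inside a character with a slice taken on the decoded string:

```python
data = "héllo".encode("utf-8")  # b"h\xc3\xa9llo": "é" takes two bytes

# Slicing raw bytes can cut a character in half.
bad = data[:2]  # b"h\xc3", the lead byte of "é" with no continuation byte
try:
    bad.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Slicing the decoded string works in whole code points instead.
text = data.decode("utf-8")
good = text[:2]

print(strict_failed, good)
```

Slicing by code point still is not the whole story: a base letter followed by a combining mark can be split apart this way, which is one reason to normalize before taking substrings.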