Comment by hgs3
6 months ago
This isn't "odd" behavior. It's a consequence of using a multibyte encoding scheme, where different characters can occupy different numbers of bytes. Also, when dealing with case mapping, you can't assume that the character count will remain constant. In Unicode, full case mappings can map a single character to multiple characters, so you might end up with more characters than you started with, regardless of the encoding used.
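To make both effects concrete, here is a minimal sketch in Python, whose `str.upper()` applies full case mappings. The helper name `case_lengths` is made up for illustration and is not from any library.

```python
def case_lengths(s: str) -> dict:
    """Compare code point and UTF-8 byte counts before and after uppercasing.

    Hypothetical helper for illustration only.
    """
    upper = s.upper()
    return {
        # (before, after) counts of Unicode code points
        "code_points": (len(s), len(upper)),
        # (before, after) counts of UTF-8 bytes
        "utf8_bytes": (len(s.encode("utf-8")), len(upper.encode("utf-8"))),
    }


# U+00DF (sharp s) uppercases to "SS": one code point becomes two.
print(case_lengths("\u00df"))

# U+017F (long s) uppercases to "S": same code point count, fewer UTF-8 bytes.
print(case_lengths("\u017f"))

# U+FB01 (fi ligature) uppercases to "FI": more code points, fewer UTF-8 bytes.
print(case_lengths("\ufb01"))
```

The first case changes the character count in any encoding, while the second only changes the byte count, which is the multibyte-encoding effect on its own.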
That's exactly right. My comment here is related:
https://news.ycombinator.com/item?id=42018937