Comment by GuB-42
7 hours ago
Turns out that I rarely need to know sizes or indices of a UTF8 string in anything other than bytes.
If I write a parser, for instance, what I usually want to know is "what is the sequence of bytes between this sequence of bytes and that sequence of bytes". Whether there are flag emojis or whatever in there doesn't matter, and the way UTF8 works ensures that one character's representation never partially overlaps with another's.
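To illustrate the parser case, here is a minimal Python sketch. The helper name `between` and the sample input are made up for the example. It works purely on byte offsets and assumes the input is valid UTF8 and that both delimiters are present.

```python
def between(data: bytes, start: bytes, end: bytes) -> bytes:
    """Return the bytes between the first `start` and the next `end`.

    Hypothetical helper for illustration; raises ValueError if a
    delimiter is missing.
    """
    i = data.index(start) + len(start)  # byte offset just past `start`
    j = data.index(end, i)              # byte offset of the next `end`
    return data[i:j]


# The value contains a flag emoji (two code points, 8 bytes) and an
# accented letter (2 bytes). The parser never needs to know that.
text = 'name="\U0001F1EB\U0001F1F7 caf\u00e9"; other'.encode("utf-8")
value = between(text, b'="', b'"')
print(len(value), value.decode("utf-8"))
```

The ASCII quote byte can never show up inside a multi-byte sequence, because every byte of a multi-byte UTF8 sequence has its high bit set. So the slice always lands on character boundaries without the code ever counting characters.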
What the byte sequences mean only really matters if you are writing an editor, so that you know how many bytes to remove when you press backspace for instance.
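A rough sketch of the backspace case in Python, assuming the buffer holds valid UTF8. The function name `backspace` is made up for the example. Note that it removes one code point, not one user-perceived character: a flag emoji is two code points, so a real editor would likely need grapheme cluster rules on top of this.

```python
def backspace(buf: bytes) -> bytes:
    """Drop the last UTF-8 encoded code point from `buf`.

    Hypothetical helper for illustration; assumes `buf` is valid UTF-8.
    """
    if not buf:
        return buf
    i = len(buf) - 1
    # Continuation bytes look like 0b10xxxxxx. Step back over them
    # until we reach the lead byte of the last code point.
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return buf[:i]


print(backspace("caf\u00e9".encode("utf-8")))  # removes the 2-byte "é"
```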
Truncation to prevent a buffer overflow seems like a case where it would matter, but not really. An overflow is an error and should be treated as such. Truncation is a safety mechanism, for when having your string truncated is the lesser evil. At that point, having half a flag emoji doesn't really matter.