Comment by kbolino
20 hours ago
Yep, you're right. Those two bytes are forbidden to prevent overlong encodings. A number of multibyte sequences are forbidden for the same reason.
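This is a real flaw of UTF-8 in the long run. The designers should have biased the values of multibyte sequences, so that each sequence length starts counting where the previous one ends, which would remove the redundant encodings.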
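To make the redundancy concrete, here is a rough Python sketch. It assumes "those two bytes" means the lead bytes 0xC0 and 0xC1, which the comment doesn't name. The helper names are made up for illustration, and the "biased" variant is a hypothetical scheme, not real UTF-8.

```python
def raw_two_byte_payload(lead: int, cont: int) -> int:
    """Combine the payload bits of a 2-byte sequence, with no validity checks."""
    assert 0xC0 <= lead <= 0xDF and 0x80 <= cont <= 0xBF
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


def biased_two_byte_value(lead: int, cont: int) -> int:
    """Hypothetical biased scheme (NOT real UTF-8): 2-byte values start at 0x80,
    right after the last 1-byte value, so no 2-byte form duplicates a 1-byte one."""
    return raw_two_byte_payload(lead, cont) + 0x80


def strict_decode_ok(data: bytes) -> bool:
    """Report whether Python's built-in UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# With lead bytes 0xC0 and 0xC1 the unbiased payload never exceeds 0x7F,
# so every such sequence would duplicate a 1-byte (ASCII) encoding.
overlong_max = raw_two_byte_payload(0xC1, 0xBF)
first_valid = raw_two_byte_payload(0xC2, 0x80)
```

Under the biased variant, `biased_two_byte_value(0xC0, 0x80)` would already be 0x80, so no lead byte would be wasted. Real UTF-8 instead leaves the decoder to reject these forms, which is what `strict_decode_ok(b"\xc0\x80")` checks.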