Comment by xigoi
4 hours ago
Pretty sure 0b11000000 and 0b11000001 are also UTF-8’s fault. Good point with the others, I guess. And I agree about UTF-8 being the best, just found it funny.
Yep, you're right. Those two lead bytes are forbidden because they could only ever start overlong two-byte encodings of ASCII characters. A number of longer sequences (e.g. 0xE0 followed by 0x80 to 0x9F) are forbidden for the same reason.
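If anyone wants to see it concretely, here's a rough Python sketch. `naive_decode_2byte` and `strict_rejects` are hypothetical helpers I made up for illustration; the only real API used is the built-in `bytes.decode`. The naive helper does the bit arithmetic without the overlong check, which shows that lead bytes 0xC0 and 0xC1 can only ever reach values 0 to 127.

```python
def naive_decode_2byte(b: bytes) -> int:
    """Decode a two-byte UTF-8-style sequence WITHOUT the overlong check.

    Hypothetical helper for illustration only; a real decoder must reject
    the overlong forms this function happily accepts.
    """
    lead, cont = b[0], b[1]
    if lead & 0b11100000 != 0b11000000:
        raise ValueError("not a two-byte lead byte")
    if cont & 0b11000000 != 0b10000000:
        raise ValueError("not a continuation byte")
    # 5 payload bits from the lead byte, 6 from the continuation byte.
    return ((lead & 0b00011111) << 6) | (cont & 0b00111111)


def strict_rejects(b: bytes) -> bool:
    """Return True if Python's built-in UTF-8 decoder refuses the bytes."""
    try:
        b.decode("utf-8")
        return False
    except UnicodeDecodeError:
        return True


# The largest value reachable with lead byte 0xC1 is still inside ASCII.
largest_with_c1 = naive_decode_2byte(bytes([0xC1, 0xBF]))

# 0xC1 0x81 would be a second, redundant spelling of "A" (U+0041).
overlong_a = naive_decode_2byte(bytes([0xC1, 0x81]))

# The first lead byte that reaches past ASCII is 0xC2.
smallest_with_c2 = naive_decode_2byte(bytes([0xC2, 0x80]))
```

So everything 0xC0 and 0xC1 could introduce already has a one-byte form, and a conforming decoder has to reject them outright.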
That's a true flaw of UTF-8 in the long run. The designers should have biased the values of multibyte sequences, i.e. made each sequence length start counting where the previous one ends, to remove redundant encodings.
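To make the biasing idea concrete, here's a minimal sketch of what a biased two-byte form could look like. To be clear, this is a made-up scheme and not UTF-8 or any real standard; `TWO_BYTE_BIAS`, `biased_encode_2byte` and `biased_decode_2byte` are names I invented. The assumption is simply that the 11 payload bits store the code point minus 0x80, so payload 0 means U+0080 instead of duplicating U+0000.

```python
# Hypothetical "biased" two-byte encoding, NOT real UTF-8.
# The one-byte form already covers 0x00..0x7F, so the two-byte
# form starts counting at 0x80.
TWO_BYTE_BIAS = 0x80
TWO_BYTE_PAYLOADS = 1 << 11  # 5 bits in the lead byte + 6 in the continuation


def biased_encode_2byte(code_point: int) -> bytes:
    """Encode a code point in the hypothetical biased two-byte form."""
    payload = code_point - TWO_BYTE_BIAS
    if not 0 <= payload < TWO_BYTE_PAYLOADS:
        raise ValueError("code point not representable in two biased bytes")
    return bytes([0b11000000 | (payload >> 6), 0b10000000 | (payload & 0b00111111)])


def biased_decode_2byte(b: bytes) -> int:
    """Decode the hypothetical biased two-byte form back to a code point."""
    payload = ((b[0] & 0b00011111) << 6) | (b[1] & 0b00111111)
    return payload + TWO_BYTE_BIAS


# Every lead byte 0xC0..0xDF is now meaningful, and no two-byte
# sequence can decode to something the one-byte form already covers.
first = biased_encode_2byte(0x80)    # uses lead byte 0xC0
last = biased_encode_2byte(0x87F)    # uses lead byte 0xDF
```

Under this scheme the two-byte form would cover U+0080 to U+087F (2048 code points) instead of real UTF-8's U+0080 to U+07FF (1920), and there would be no overlong two-byte sequences to forbid.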