Comment by xigoi

5 months ago

Pretty sure 0b11000000 and 0b11000001 are also UTF-8’s fault. Good point with the others, I guess. And I agree about UTF-8 being the best, just found it funny.


kbolino 5 months ago

Yep, you're right. Those two bytes are forbidden to prevent overlong encodings. A number of multibyte sequences are forbidden for the same reason too.
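To make the overlong point concrete, here is a minimal Python sketch. The helper names `naive_decode_2byte` and `is_rejected` are made up for illustration; the only real API used is `bytes.decode`. It shows that lead bytes 0xC0 and 0xC1 can only ever produce code points below 0x80, which already have a one-byte form, and that a strict decoder rejects them.

```python
def naive_decode_2byte(b0: int, b1: int) -> int:
    """Decode a 2-byte UTF-8 sequence WITHOUT any overlong check (illustration only)."""
    assert b0 & 0xE0 == 0xC0, "not a 2-byte lead byte"
    assert b1 & 0xC0 == 0x80, "not a continuation byte"
    # 5 payload bits from the lead byte, 6 from the continuation byte.
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


def is_rejected(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder refuses the input."""
    try:
        data.decode("utf-8")
        return False
    except UnicodeDecodeError:
        return True


# The largest values reachable with lead bytes 0xC0 and 0xC1:
print(hex(naive_decode_2byte(0xC0, 0xBF)))  # 0x3f
print(hex(naive_decode_2byte(0xC1, 0xBF)))  # 0x7f
# The first lead byte that reaches past ASCII is 0xC2:
print(hex(naive_decode_2byte(0xC2, 0x80)))  # 0x80

print(is_rejected(b"\xc0\x80"))      # True: overlong encoding of U+0000
print(is_rejected(b"\xe0\x80\x80"))  # True: overlong 3-byte sequence
print(is_rejected(b"\xc2\x80"))      # False: the valid encoding of U+0080
```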

This is a true flaw of UTF-8 in the long run. The designers should have biased the values of multibyte sequences, so that each sequence length starts counting where the shorter one leaves off, to remove redundant encodings.
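A rough sketch of what "biased" could mean, in Python. This is a hypothetical variant, not real UTF-8 and not any existing standard; the names `BIAS`, `biased_encode` and `biased_decode` are invented here. It keeps UTF-8's bit layout but adds an offset per sequence length, so every byte sequence of a given length maps to a distinct code point and overlong forms cannot exist. It ignores surrogates and the U+10FFFF ceiling on the decode side.

```python
# Hypothetical "biased" UTF-8-like encoding; NOT compatible with real UTF-8.
PAYLOAD_BITS = {1: 7, 2: 11, 3: 16, 4: 21}
LEAD = {2: 0xC0, 3: 0xE0, 4: 0xF0}

# BIAS[n] = number of code points already covered by shorter sequences.
BIAS = {1: 0}
for _n in (2, 3, 4):
    BIAS[_n] = BIAS[_n - 1] + (1 << PAYLOAD_BITS[_n - 1])


def biased_encode(cp: int) -> bytes:
    """Encode a code point using the shortest length; no other length can represent it."""
    if cp < 0:
        raise ValueError("negative code point")
    for n in (1, 2, 3, 4):
        if cp < BIAS[n] + (1 << PAYLOAD_BITS[n]):
            payload = cp - BIAS[n]
            break
    else:
        raise ValueError("code point too large")
    if n == 1:
        return bytes([payload])
    out = []
    for _ in range(n - 1):
        out.append(0x80 | (payload & 0x3F))  # continuation bytes, low bits first
        payload >>= 6
    out.append(LEAD[n] | payload)            # remaining high bits go in the lead byte
    return bytes(reversed(out))


def biased_decode(data: bytes) -> int:
    """Decode exactly one biased sequence."""
    b0 = data[0]
    if b0 < 0x80:
        n, payload = 1, b0
    elif b0 & 0xE0 == 0xC0:
        n, payload = 2, b0 & 0x1F
    elif b0 & 0xF0 == 0xE0:
        n, payload = 3, b0 & 0x0F
    elif b0 & 0xF8 == 0xF0:
        n, payload = 4, b0 & 0x07
    else:
        raise ValueError("invalid lead byte")
    if len(data) != n:
        raise ValueError("wrong sequence length")
    for b in data[1:]:
        if b & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        payload = (payload << 6) | (b & 0x3F)
    return BIAS[n] + payload


# In this scheme the bytes C0 80 are not overlong: they mean U+0080.
print(biased_encode(0x80).hex())           # c080
print(hex(biased_decode(b"\xc0\x80")))     # 0x80
```

The price is that decoding needs an addition per sequence instead of pure bit shuffling, which is presumably part of why the real format did not go this way.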