Comment by Mikhail_Edoshin

10 hours ago

No, UTF-16 is much simpler in that respect, and its design is no less brilliant. (I've written a state-machine encoder and decoder for both encodings.) If an application works a lot with text, I'd say UTF-16 looks more attractive as the main internal representation.

UTF-16 is simpler most of the time, and that's precisely the problem. Anyone working with UTF-8 knows they will have to deal with multi-byte code points. People working with UTF-16 often forget about surrogate pairs, because characters outside the Basic Multilingual Plane are rare in most major languages, and then end up with bugs when their users put emoji into a text field.
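
To make the failure mode concrete, here is a rough sketch of the kind of bug I mean. The helper names (`utf16_code_units`, `naive_truncate_utf16`, `safe_truncate_utf16`) are made up for illustration and don't come from any library; the scenario is a text field capped at a fixed number of UTF-16 code units.

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def naive_truncate_utf16(text: str, max_units: int) -> bytes:
    """Buggy on purpose: cuts at a code-unit boundary and may split a surrogate pair."""
    return text.encode("utf-16-le")[: max_units * 2]


def safe_truncate_utf16(text: str, max_units: int) -> bytes:
    """Same cut, but drops a trailing high surrogate left without its partner."""
    data = text.encode("utf-16-le")[: max_units * 2]
    if len(data) >= 2:
        last_unit = int.from_bytes(data[-2:], "little")
        # High (lead) surrogates occupy the range 0xD800..0xDBFF.
        if 0xD800 <= last_unit <= 0xDBFF:
            data = data[:-2]
    return data


if __name__ == "__main__":
    sample = "hi\U0001F600"  # "hi" followed by one emoji (U+1F600)

    # Three code points, but four UTF-16 code units: the emoji needs a pair.
    print(len(sample), utf16_code_units(sample))

    # Cutting at three units leaves a lone high surrogate, which is invalid UTF-16.
    try:
        naive_truncate_utf16(sample, 3).decode("utf-16-le")
    except UnicodeDecodeError as exc:
        print("naive cut produced invalid UTF-16:", exc.reason)

    # The safe version backs off by one unit and keeps the text valid.
    print(safe_truncate_utf16(sample, 3).decode("utf-16-le"))
```

With plain ASCII or most European and CJK text the naive version works, so the bug tends to stay hidden until someone types an emoji right at the length limit.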