Comment by wvenable

5 hours ago

Blame the Unicode Consortium for not coming up with UTF-8 first (or, really, at all). And for assuming that 65536 code points would be enough for everyone.

So many problems could be solved with a time machine.

4 comments

kstrauser 4 hours ago

The first draft of Unicode was in 1988. Thompson and Pike came up with UTF-8 in 1992, made an RFC in 1998. UTF-16 came along in 1996, made an RFC in 2000.

The time machine would've involved Microsoft saying "it's clear now that UCS-2 was a bad idea, so let's start migrating to something genuinely better".

wvenable 22 minutes ago

I don't think it was clear at the time that UTF-8 would take off. UCS-2 and then UTF-16 were well established by 2000, both in Microsoft technologies and elsewhere (like Java). Linux, despite the existence of UTF-8, would still take years to get acceptable internationalization support. Developing good and secure internationalization is a hard problem -- it took a long time for everyone.

It's now 2026; everything looks different in hindsight.

gpvos 2 hours ago

MS could easily have added proper UTF-8 support in the early 2000s instead of the late 2010s.

kstrauser 1 hour ago

Yep. It would've been a better landing pad than UTF-16 since they had to migrate off UCS-2 anyway.