Wikipedia tells me that Windows NT was started in 1988 and shipped in 1993, while Linux's first release was in 1991, giving them approximately the same age. Furthermore, NT shipped with UCS-2 support, not UTF-16, as UTF-16 did not exist until 1996. UTF-16 was a migration that they completed in Windows 2000. UTF-8 was first presented in 1993 and is, therefore, older than UTF-16.
Assuming all of these facts are true (I hope), the situation today is that the Windows world mostly implements UTF-16 correctly (but not completely -- code sometimes assumes two-byte characters), while the Linux world correctly implements UTF-8.
Before 1996 (when Unicode 2.0 was released), Unicode was a 16-bit fixed-width encoding. And the first astral character allocations didn't happen until 3.1 was released in 2001.
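A quick way to see why the two-byte assumption breaks once astral characters exist: anything outside the Basic Multilingual Plane takes a surrogate pair in UTF-16. A minimal TypeScript sketch (the specific characters are just illustrative):

```typescript
// "A" sits in the BMP: one 16-bit code unit.
// U+1F600 is an astral character: two 16-bit code units (a surrogate pair).
const bmp = "A";
const astral = "\u{1F600}";

console.log(bmp.length);    // 1 UTF-16 code unit
console.log(astral.length); // 2 UTF-16 code units, but only one code point

// The surrogate pair that encodes U+1F600 in UTF-16:
console.log(astral.charCodeAt(0).toString(16)); // "d83d" (high surrogate)
console.log(astral.charCodeAt(1).toString(16)); // "de00" (low surrogate)
console.log(astral.codePointAt(0)?.toString(16)); // "1f600"
```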
Yeah, which makes it even more irritating... MS is still off in the double-wide sticks 11 years later and still promoting UTF-16LE in userland. From http://msdn.microsoft.com/en-us/library/dd374081%28VS.85%29....
"Unicode-enabled functions are described in Conventions for Function Prototypes. These functions use UTF-16 (wide character) encoding, which is the most common encoding of Unicode and the one used for native Unicode encoding on Windows operating systems. Each code value is 16 bits wide .. New Windows applications should use UTF-16 as their internal data representation."
"Most common" encoding is best encoding!
ugh
I seriously doubt that UTF-16 still is the most common encoding. The web is mostly UTF-8 and so are most smartphones.
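For what it's worth, size is part of why the web leans UTF-8: ASCII-heavy text is one byte per character in UTF-8 but two in UTF-16. A rough TypeScript sketch (assumes a runtime with TextEncoder, such as a browser or Node):

```typescript
// UTF-8 uses 1 byte per ASCII character; UTF-16 always uses at least 2.
const ascii = "Hello, world!";
const utf8Bytes = new TextEncoder().encode(ascii).length; // 13
const utf16Bytes = ascii.length * 2;                      // 26 (one 16-bit unit per BMP char)
console.log(utf8Bytes, utf16Bytes);

// For non-Latin text the balance shifts: most CJK characters are
// 3 bytes in UTF-8 but only 2 in UTF-16.
const cjk = "日本語";
console.log(new TextEncoder().encode(cjk).length, cjk.length * 2); // 9 vs 6
```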
JavaScript strings are UCS-2 or UTF-16.
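Concretely, JavaScript's string API is defined in terms of UTF-16 code units, so length and indexing count code units rather than code points, while iteration and codePointAt are surrogate-aware. A small TypeScript sketch:

```typescript
const s = "a\u{1F600}b"; // 3 code points, 4 UTF-16 code units

console.log(s.length);                        // 4 -- counts UTF-16 code units
console.log(s[1]);                            // a lone high surrogate, not a full character
console.log(s.codePointAt(1)?.toString(16));  // "1f600" -- surrogate-aware
console.log([...s].length);                   // 3 -- string iteration walks code points
```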