Comment by Arnt

6 months ago

When do you think that first mistake happened?

(Pick a year, then think about why it didn't happen in that year.)

When Unicode was originally being specced out, I guess. There was more interest in unifying characters at that stage (see also the far more controversial Han unification).

  • Uh-huh. At that time, roundtrip compatibility with all widely used 8-bit encodings was a major design criterion. Roundtrip meaning that you could take an input string in e.g. ISO 8859-9, convert it to Unicode, convert it back, and get the same string, still usable for purposes like database lookups. Would you have argued to break database lookups at the time?

    • ISO-8859-9 actually does have what I suggest:

      FD/49 are lower/upper dotless ı/I

      DD/69 are upper/lower dotted İ/i.

      Nothing about the ability to round-trip that through Unicode required 49 in ISO-8859-9 to be assigned the same Unicode code point as 49 in ISO-8859-1 just because the two happen to be visually identical.
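
For illustration, the byte values above can be checked with a minimal Python sketch. It assumes Python's built-in `iso8859_9` and `iso8859_1` codecs; the variable names are made up for this example. It shows that the four bytes survive a round trip through Unicode, and that byte 49 lands on the same code point under both encodings, which is why locale-independent lowercasing yields a dotted i.

```python
# Bytes from the comment: FD/49 are dotless ı/I, DD/69 are dotted İ/i in ISO-8859-9.
turkish_bytes = bytes([0xFD, 0x49, 0xDD, 0x69])

# Round trip: ISO-8859-9 -> Unicode -> ISO-8859-9.
text = turkish_bytes.decode("iso8859_9")
round_tripped = text.encode("iso8859_9")

# Unicode code points assigned to each of the four characters.
code_points = [f"U+{ord(ch):04X}" for ch in text]

# Byte 49 decodes to the same code point under ISO-8859-9 and ISO-8859-1.
shared_upper = b"\x49".decode("iso8859_9") == b"\x49".decode("iso8859_1")

# Locale-independent lowercasing therefore gives dotted "i", not dotless "ı".
default_lower = "I".lower()

print(round_tripped == turkish_bytes, code_points, shared_upper, default_lower)
```

Had byte 49 in ISO-8859-9 been given its own code point, the round trip would still hold, since round-tripping only needs each byte to map to a distinct code point and back.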
