Comment by Macha

1 year ago

When Unicode was being specced out originally I guess. There was more interest in unifying characters at that stage (see also the far more controversial Han unification)

4 comments

Macha

Arnt 1 year ago

Uh-huh. At that time roundtrip compatiblity with all widely used 8-bit encodings was a major design criterion. Roundtrip meaning that you could take an input string in e.g. iso 8859-9, convert it to unicode, convert it back, and get the same string, still usable for purposes like database lookups. Would you have argued to break database lookups at the time?

Macha 1 year ago
ISO-8859-9 actually does have what I suggest:
FD/49 are lower/upper dotless ı/I
DD/69 are upper/lower dotted İ/i.
There's nothing around the capability to round trip that through unicode that required 49 in ISO-8859-9 to be assigned the same unicode codepoint as 49 in ISO-8859-1 because they happen to be visually identical
- josefx 1 year ago
  
  There is a reason: ISO-8859-9 is an extended ASCII character set. The shared characters are not an accident, they are by definition the same characters. Most ISO character sets follow a specific template with fixed ranges for shared and custom characters. Interpreting that i as anything special would violate the spec.
  
  1 reply →