
Comment by silon42

11 hours ago

Do all the letters have separate Unicode codepoints? (No reuse of the Latin ones?)

There are the following codepoints:

    U+0049 I LATIN CAPITAL LETTER I
    U+0069 i LATIN SMALL LETTER I
    U+0130 İ LATIN CAPITAL LETTER I WITH DOT ABOVE
    U+0131 ı LATIN SMALL LETTER DOTLESS I

The names of the first two don't explicitly say that they should be dotless and dotted, respectively. However, the Unicode code chart for the block containing those two [0] does contrast them with the explicitly dotted and dotless versions, which at least implies that U+0049 should be rendered without a dot and U+0069 with one.
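If you want to check this yourself, here is a minimal sketch, assuming Python 3 and its standard `unicodedata` module. The helper names `describe` and `default_case_pairs` are made up for illustration. It prints the official names of the four codepoints listed above and the locale-independent case mappings that Python's `str.upper()` and `str.lower()` apply to them.

```python
import unicodedata

# The four codepoints from the list above.
CODEPOINTS = ["\u0049", "\u0069", "\u0130", "\u0131"]


def describe(ch):
    """Return a (U+XXXX label, Unicode character name) pair for one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


def default_case_pairs():
    """Map each character to its (uppercase, lowercase) form.

    These are Python's built-in, locale-independent mappings; no
    Turkish-specific tailoring is applied here.
    """
    return {ch: (ch.upper(), ch.lower()) for ch in CODEPOINTS}


if __name__ == "__main__":
    pairs = default_case_pairs()
    for ch in CODEPOINTS:
        label, name = describe(ch)
        upper, lower = pairs[ch]
        # Show codepoints of the results too, since the glyphs are easy to confuse.
        upper_cps = " ".join(f"U+{ord(c):04X}" for c in upper)
        lower_cps = " ".join(f"U+{ord(c):04X}" for c in lower)
        print(f"{label} {name}: upper -> {upper_cps}, lower -> {lower_cps}")
```

In this default mapping the plain Latin I and i are paired with each other, ı uppercases to the plain I, and İ lowercases to i followed by U+0307 COMBINING DOT ABOVE. Pairing I with ı and İ with i has to be done by locale-aware code, which Python's built-in string methods are not.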

Unicode has historically resisted adding a separate codepoint for each language's orthography when the glyph is (almost) identical to an existing one ("allographs"). Controversy arose when the consortium proposed treating Han characters, which do have language-specific variants, as allographs; the result is what is known as "Han unification".

[0]: https://www.unicode.org/charts/PDF/U0000.pdf