Comment by mrighele

19 hours ago

The issue is not the invention of the dotless I, it already exists, the issue is that they took a vowel, i/I, and they assigned the lower case to one vowel and the upper case to a different one, and invented what was left missing.

It's like they decided that the uppercase of "a" is "E" and the uppercase of "e" is "A".

This is misleading, because it assumes that i/I naturally represent one vowel, which is just not the case. i/I represents one vowel in _English_, when written with a latin script. ̶I̶n̶ ̶f̶a̶c̶t̶ ̶e̶v̶e̶n̶ ̶t̶h̶i̶s̶ ̶i̶s̶n̶'̶t̶ ̶c̶o̶r̶r̶e̶c̶t̶,̶ ̶i̶/̶I̶ ̶r̶e̶p̶r̶e̶s̶e̶n̶t̶s̶ ̶o̶n̶e̶ ̶p̶h̶o̶n̶e̶m̶e̶,̶ ̶n̶o̶t̶ ̶o̶n̶e̶ ̶v̶o̶w̶e̶l̶.̶ <see troad's comment for correction>

There is no reason to assume that the English representation is in general "correct", "standard", or even "first". The modern script for Turkish was adopted around the 1920's, so you could argue perhaps that most typewriters presented a standard that should have been followed. However, there was variation even between different typewriters, and I strongly suspect that typewriters weren't common in Turkey when the change was made.

  • > In fact even this isn't correct, i/I represents one phoneme, not one vowel.

    Not quite. In English, 'i' and 'I' are two allographs of one grapheme, corresponding to many phonemes, based on context. (Using linguistic definitions here, not compsci ones.) The 'i's in 'kit' and 'kite' stand for different phonemes, for example.

    > There is no reason to assume that the English representation is in general "correct", "standard", or even "first".

    Correct, but the I/i allography is not exclusive to English. Every Latin script functions that way, other than Turkish and Turkish-derived scripts.

    No one is saying Turkish cannot break from that convention - they can feel free to do anything they like - but the resulting issues are fairly predictable, and their adverse effects fall mainly on Turkish speakers in practice, not on the rest of us.

    • > but the resulting issues are fairly predictable, and their adverse effects fall mainly on Turkish speakers in practice, not on the rest of us.

      I don't think it's fair to call it predictable. When this convention was chosen, the problem of "what is the uppercase letter to I" was always bound to the context of language. Now it suddenly isn't. Shikata ga nai. It wasn't even an explicit assumption that can be reflected upon, it was an implicit one, that just happened.

    • > Not quite. In English, 'i' and 'I' are two allographs of one grapheme, corresponding to many phonemes, based on context. (Using linguistic definitions here, not compsci ones.) The 'i's in 'kit' and 'kite' stand for different phonemes, for example.

      You're right, apologies; my linguistics is rusty and I was overconfident.

      > Correct, but the I/i allography is not exclusive to English. Every Latin script functions that way, other than Turkish and Turkish-derived scripts.

      I think my main argument is that the importance of standardizing to i/I was much less obvious in the 1920's. The benefits are obvious to us now, but I think we would be hard pressed to predict this outcome a priori.

  • >This is misleading, because it assumes that i/I naturally represent one vowel, which is just not the case.

    It does in literally any language using a Latin alphabet other than Turkish.

Nope, we decided to do it the correct and logical way for our alphabet. Some glyphs are either dotted or dotless. So, we have Iı, İi, Oo, Öö, Uu, Üü, Cc, Çç, Ss and Şş. You see the Ii pair is actually the odd one in the series.

Also, we don't have serifs in our I. It's just a straight line. So, it's not even related to your Ii pair in English. You can't dictate how we write our straight lines, can you.

The root cause of the problem is in the implementation and standardization of the computer systems. Computers are originally designed only for English alphabet in mind. And patched to support other languages over time, poorly. Computers should obey the language rules, not the other way around.
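A minimal Python sketch of the mismatch described here, assuming Python 3's built-in `str.upper()` and `str.lower()`, which apply Unicode's default, language-independent case mappings. The helpers `turkish_upper` and `turkish_lower` are hypothetical names invented for this illustration, not part of any library; they only remap the four I-letters before falling back to the default rules.

```python
# Python's built-in case methods ignore locale and use Unicode's default
# mappings, so the Turkish dotted/dotless distinction is lost by default.
DOTLESS_LOWER = "\u0131"  # ı  LATIN SMALL LETTER DOTLESS I
DOTTED_UPPER = "\u0130"   # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE

# Hypothetical helpers (not a library API): remap the I-letters first,
# then let the default mapping handle every other character.
_TR_UPPER = str.maketrans({"i": DOTTED_UPPER, DOTLESS_LOWER: "I"})
_TR_LOWER = str.maketrans({"I": DOTLESS_LOWER, DOTTED_UPPER: "i"})


def turkish_upper(text: str) -> str:
    """Uppercase text using the Turkish i/İ and ı/I pairs."""
    return text.translate(_TR_UPPER).upper()


def turkish_lower(text: str) -> str:
    """Lowercase text using the Turkish i/İ and ı/I pairs."""
    return text.translate(_TR_LOWER).lower()


if __name__ == "__main__":
    word = "\u0131s\u0131"  # "ısı"
    # Default round trip: the dotless ı comes back as a dotted i.
    print("default round trip:", word.upper().lower())
    # Language-aware round trip: the original word survives.
    print("turkish round trip:", turkish_lower(turkish_upper(word)))
```

The design choice in this sketch is to make the language an explicit decision of the caller rather than something inferred from the text, since the characters alone cannot say which rule applies.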

  • Yep, but you decided to abuse the Latin alphabet instead of creating your own code page with your own letters and with your own rules.

    • We created our own letters and our own rules. In 1928, long before code pages and computers.

      The assumption that letters come in universal pairs is wrong. That assumption is the bug. You can’t assume that capitalization rules must be the same for every language implementing a specific alphabet. Those rules may change for every language. They do.

      And not just capitalization rules. Auto complete, for instance, should respect the language as well. You can’t “correct” a French word to an English word. Localization is not optional when dealing with text.

  • >Also, we don't have serifs in our I.

    That depends on font.

    >So, it's not even related to your Ii pair in English.

    Modern Turkish uses the Latin script, of course it's related.

    >You can't dictate how we write our straight lines, can you.

    No, I can't, I just want to understand why the Turks decided to change this letter, and this letter only, from the rest of the standard Latin script/diacritics.

    • > I just want to understand why the Turks decided to change this letter, and this letter only

      Because Turkish uses a phonetic alphabet suited for Turkish sounds, based on Latin letters. There are 8 vowels, which come in two subsets:

      AIOU and EİÖÜ.

      When you pair them with zip(), the pairs are phonetically related sounds but completely different letters. Turkish also uses suffixes for everything, and the vowels in these suffixes sometimes switch between these two subsets.

      This design lets me write any word uniquely and almost correctly using the Turkish alphabet.

      Dis dizayn lets mi rayt ani vörd yüniğkli end olmost koreğtkli yuzing dı törkiş alfabet.

      Ö is the dotted version of O. İ is the dotted version of I. Related but different. Their lower case versions are logically (not by historical convention): öoiı. So we didn’t just want to change I, and only I. We just added dots. Since there is no Oö pair in any language, our OoÖö vowels didn’t get the same attention. Same for our Ğğ and Şş.

      I hope this answers the question.

  • > Computers are originally designed only for English alphabet in mind.

    Computers are originally designed for no alphabet at all. They only have two symbols.

    ASCII is a set of operating codes that includes instructions to physically move different parts of a mechanical typewriter. It was already a mistake when it was used for computer displays.

    • Note that ASCII stands for "American Standard Code for Information Interchange". There's no expectation here that this is a suitable code for any language other than English, the de-facto language of the United States of America.

I don’t think that’s the right way to think about it. It’s not like they were Latinizing Turkish with ASCII in mind. They wanted a one-to-one mapping between letters and sounds. The dot versus no dot marks where in your mouth or throat the vowel is formed. They didn’t have this concept that capital I automatically pairs with lowercase i. The dot was always part of the letter itself. The reform wasn’t trying to fit existing Western conventions, it was trying to map the Turkish sounds to symbols.

  • They switched from Arabic script to Latin script. They literally did latinize Turkish, but they ditched the convention of 1 to 1 correspondence between lowercase and uppercase letters that is invariant across all languages that use Latin script except for German script, Turkish script and its offspring Azerbaijani script.

    • > correspondence between lowercase and uppercase [not in] German script

      Where is it broken in German script? Do you mean small ß and capital ẞ?

Not really. Turkish has a feature that is called "vowel harmony". You match suffixes you add to a word based on a category system: low pitch vs high pitch vowels where a,ı,o,u are low pitch and e,i,ö,ü are high pitch.

Ö and ü were already borrowed from the German alphabet. The umlauted variants 'ö' and 'ü' have a similar effect on 'o' and 'u' respectively: they bring a back vowel to the front. See: https://en.wikipedia.org/wiki/Vowel . Similarly, removing the dots brings them back.

Turkish already had the i sound and its back variant, which is a schwa-like sound: https://en.wikipedia.org/wiki/Close_back_unrounded_vowel . It has the same relation in IPA as 'ö' has to 'o' and 'ü' has to 'u'. Since the makers of the Turkish variant of the Latin alphabet had the rare chance of making a regular pronunciation system with the state of the language, and since removing the dots had the effect of making a front vowel a back vowel, they simply copied this feature from ö and ü to i:

Just remove the dots to make it a back vowel! Now we have ı.

When it comes to capitalization, ö becomes Ö, ü becomes Ü. So it is just logical to make the capital of i İ and the lowercase of I ı.
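The front/back pairing and the suffix matching described in this comment can be sketched in Python. This is an illustration only: the plural suffix -lar/-ler is used as one example of a harmonizing suffix, the names `last_vowel` and `plural` are made up for this sketch, and it assumes lowercase native words (loanword exceptions are ignored).

```python
BACK = "aıou"   # back vowels
FRONT = "eiöü"  # front vowels; for ı/o/u the front partner is the dotted letter

# zip() pairs each back vowel with its front counterpart.
PAIRS = list(zip(BACK, FRONT))


def last_vowel(word: str):
    """Return the last vowel in a lowercase word, or None if there is none."""
    for ch in reversed(word):
        if ch in BACK or ch in FRONT:
            return ch
    return None


def plural(word: str) -> str:
    """Attach -lar after a back vowel and -ler after a front vowel."""
    vowel = last_vowel(word)
    if vowel is None:
        raise ValueError(f"no vowel found in {word!r}")
    return word + ("lar" if vowel in BACK else "ler")


if __name__ == "__main__":
    print(PAIRS)
    for w in ["kitap", "ev", "kız", "göz"]:
        print(w, "->", plural(w))
```

Because ı and i sit in different harmony classes, confusing them changes which suffix a word takes, which is why they are treated as two separate letters rather than case variants of one.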

  • Yes it's hard to come up with a different capital than I unless you somehow can see into the future and foresee the advent of computers, which the Turkish alphabet reform predates.

    Of course the Latin capital I is dotless because originally the lowercase Latin "i" was also dotless. The dot was added later to make text more legible.

  • > low pitch vs high pitch vowels where a,ı,o,u are low pitch and e,i,ö,ü are high pitch.

    Does that reflect the Turkish terminology? Ordinarily you would call o and u "high" while a and e are "low". The distinction between o/u and ö/ü is the other dimension: o/u are "back" while ö/ü are "front".

    • > Does that reflect the Turkish terminology?

      Yes. The Turkish terms are "kalın ünlü" and "ince ünlü". They literally translate to "low pitch vowel"/"high pitch vowel" (or "thick vowel"/"thin vowel") in this context.

      There is a second vowel harmony rule [1] (called lesser vowel harmony) that makes the distinction you pointed out. Letters a/e/ı/i are called flat vowels, and o/ö/u/ü are called round vowels.

      [1] https://georgiasomethingyouknowwhatever.wordpress.com/2015/0...

  • So, instead of adding two full letters, with proper upper case and lower case, you added two halves to hack the Latin alphabet. This is the bug.