
Comment by nswango

5 hours ago

So you think that the letters in the Greek and Cyrillic alphabets which are printed identically to the Latin A should not exist?

And, for example, Greek words containing this letter should be encoded with a mix of Latin and Greek characters?
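For reference, the three capital A's in question are separate code points in Unicode today. A minimal Python sketch using the standard `unicodedata` module prints each one's official name and shows one practical consequence of keeping them apart: lowercasing is script-aware, so each capital maps to the small letter of its own script.

```python
import unicodedata

# Latin A, Greek Alpha, Cyrillic A: drawn identically in most fonts
capitals = ["\u0041", "\u0391", "\u0410"]

for ch in capitals:
    low = ch.lower()  # case mapping follows the character's script
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> lower: U+{ord(low):04X} {unicodedata.name(low)}")
```

Under a hypothetical scheme where all three shared the Latin code point, lowercasing a Greek word written in capitals would produce a Latin "a" wherever an alpha appears, unless extra context were stored somewhere else.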

> So you think that the letters in the Greek and Cyrillic alphabets which are printed identically to the Latin A should not exist?

Yes. Unicode should not be about semantic meaning, it should be about the visual. Like text in a book.

> And, for example, Greek words containing this letter should be encoded with a mix of Latin and Greek characters?

Yup. Consider a printed book. How can you tell if a letter is a Greek letter or a Latin letter?

Those Unicode homoglyphs are a solution looking for a problem.

  • > Yes. Unicode should not be about semantic meaning, it should be about the visual. Like text in a book.

    Do you think 1, l and I should be encoded as the same character, or does this logic only extend to characters that pesky foreigners use?

  • Unicode is about semantics not appearance. If you don't need semantics, then use something different.

    • > Unicode is about semantics not appearance.

      And that's where it went off the rails into la-la land. 'a' can have all kinds of distinct meanings. How are you going to make that work? It's hopeless.

  • > Yup. Consider a printed book. How can you tell if a letter is a Greek letter or a Latin letter?

    I can absolutely tell Cyrillic к from the Latin k and Latin u from the Cyrillic и.

    > should not be about semantic meaning,

    It's always better to be able to preserve more information in a text and not less.

    • > I can absolutely tell Cyrillic к from the Latin k and Latin u from the Cyrillic и.

      They look visually distinct to me. I don't get your point.

      > It's always better to be able to preserve more information in a text and not less.

      Text should not lose information by printing it and then OCR'ing it.
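The disagreement above comes down to whether text should survive a round trip through print and OCR. A minimal sketch of the "encode what it looks like" idea: fold each character to a canonical look-alike, loosely in the spirit of the confusables "skeleton" in Unicode Technical Standard #39, but with a tiny hand-written table. `FOLD` and `visual_fold` are hypothetical names for illustration, not a real API. The sketch shows both sides: folding makes a spoofed string compare equal to the Latin one, and it is lossy, because the original script cannot be recovered afterwards.

```python
# Hypothetical, hand-picked look-alike table: Greek/Cyrillic capitals -> Latin.
# Real confusable data (UTS #39) is far larger; this is only an illustration.
FOLD = {
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u0392": "B",  # GREEK CAPITAL LETTER BETA
    "\u0412": "B",  # CYRILLIC CAPITAL LETTER VE
    "\u039f": "O",  # GREEK CAPITAL LETTER OMICRON
    "\u041e": "O",  # CYRILLIC CAPITAL LETTER O
}

def visual_fold(text: str) -> str:
    """Replace each character with its look-alike; unlisted characters pass through."""
    return "".join(FOLD.get(ch, ch) for ch in text)

latin = "BOA"
mixed = "\u0412\u041e\u0391"  # Cyrillic VE, Cyrillic O, Greek ALPHA

print(latin == mixed)                            # False: different code points
print(visual_fold(latin) == visual_fold(mixed))  # True: same appearance
```

Once folded, nothing in "BOA" records which script each letter came from. Pairs that the thread agrees look different, such as Cyrillic к and Latin k, are simply absent from the table and stay separate.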

What about numbers? Would they be assigned to Arabic only? I guess someone will be offended by that.
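For what it's worth, Unicode already handles digits this way: the familiar 0-9 are the ASCII digits and are named without any script (U+0034 is just DIGIT FOUR), while Arabic-Indic digits and those of other scripts have their own code points carrying the same numeric values. A small check with Python's standard `unicodedata` module:

```python
import unicodedata

# ASCII "4" and ARABIC-INDIC DIGIT FOUR (U+0664): different characters, same value
for ch in ["4", "\u0664"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} value={unicodedata.decimal(ch)}")

# Python's int() accepts any Unicode decimal digit (category Nd)
print(int("\u0664\u0662"))  # ARABIC-INDIC FOUR, TWO -> prints 42
```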

While at it we could also unify I, | and l. It's too confusing sometimes.

  • > While at it we could also unify I, | and l. It's too confusing sometimes.

    They render differently, so it's not a problem.
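The characters being joked about here are already distinct in both encoding and character class, regardless of how a particular font draws them. A short Python sketch with `unicodedata` lists each one's name and general category (Lu = uppercase letter, Ll = lowercase letter, Sm = math symbol, Nd = decimal digit):

```python
import unicodedata

# Characters that can look alike in some sans-serif fonts
for ch in ["I", "l", "|", "1"]:
    print(f"{ch!r}: U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})")
```

Because they behave differently (for example, "I".lower() is "i", not "l"), merging them would break case mapping and parsing even in fonts where the glyphs coincide.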