Comment by pvillano

5 hours ago

Unicode is "designed to support the use of text in all of the world's writing systems that can be digitized"

Unicode needs tab, space, form feed, and carriage return.

Unicode needs U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK to switch between left-to-right and right-to-left languages.

Unicode needs U+115F HANGUL CHOSEONG FILLER and U+1160 HANGUL JUNGSEONG FILLER to typeset Korean.

Unicode needs U+200C ZERO WIDTH NON-JOINER to encode that two characters should not be connected by a ligature.

Unicode needs U+200B ZERO WIDTH SPACE to indicate a word break opportunity without actually inserting a visible space.

Unicode needs MONGOLIAN FREE VARIATION SELECTORs to encode the traditional Mongolian alphabet.
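
For anyone who wants to inspect the characters listed above, here is a minimal sketch using Python's standard `unicodedata` module. The `describe` helper and the `CODE_POINTS` list are illustrative names, not part of any API, and U+180B is picked as one representative of the Mongolian free variation selectors.

```python
import unicodedata

# Code points mentioned in the comment above. U+180B is one representative
# of the Mongolian free variation selectors (an illustrative choice).
CODE_POINTS = [
    0x0009, 0x0020, 0x000C, 0x000D,  # tab, space, form feed, carriage return
    0x200E, 0x200F,                  # left-to-right / right-to-left marks
    0x115F, 0x1160,                  # Hangul fillers
    0x200C, 0x200B,                  # zero width non-joiner, zero width space
    0x180B,                          # Mongolian free variation selector one
]

def describe(cp):
    """Return (U+XXXX label, character name, general category) for a code point."""
    ch = chr(cp)
    # ASCII control characters have no formal name, so supply a fallback.
    name = unicodedata.name(ch, "<unnamed control>")
    return f"U+{cp:04X}", name, unicodedata.category(ch)

for label, name, category in map(describe, CODE_POINTS):
    print(f"{label}  {category}  {name}")
```

Most of these fall in the control (`Cc`) or format (`Cf`) general categories, which is one way to see that they carry layout or joining information rather than a visible glyph.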

[flagged]

  • That's a very narrow view of the world. One example: in the past I have handled bilingual English-Arabic files with direction switches within the same line, and Arabic is written from right to left.

    There are also languages that are written from top to bottom.

    Unicode is not exclusively for coding; on the contrary, I'm pretty sure coding is only a small fraction of how Unicode is used.

    > Somehow people didn't need invisible characters when printing books.

    They didn't need computers either so "was seemingly not needed in the past" is not a good argument.

    • > That's a very narrow view of the world.

      Yes, it is. Unicode has undergone major mission creep and now acts as if it were a font language and a formatting language. Naturally, this has led to it becoming a vector for malicious actors. (The direction-reversing feature has been used to insert malicious text that isn't visible to the reader.)

      > Unicode is not exclusively for coding

      I never mentioned coding.

      > They didn't need computers

      Unicode is for characters, not formatting. Formatting is what HTML is for, and many other formatting standards. Neither is it for meaning.

    • > That's a very narrow view of the world.

      But not one that would surprise anyone familiar with WalterBright's antics on this website…

  • The fact is that there were so many character sets in use before Unicode because all these things were needed or at least wanted by a lot of people. Here's a great blog post by Nikita Prokopov about it: https://tonsky.me/blog/unicode/

    • Sometimes you gotta say no. Trying to accommodate every harebrained idea leads to madness.

      Normalized code point sequences are another WTF feature.
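
For readers unfamiliar with the normalization point in the last reply, a small sketch with Python's standard `unicodedata` module shows what is at stake: the same visible character can be encoded as different code point sequences, and normalization maps them to one form. The `nfc_equal` helper is a hypothetical name used only for illustration.

```python
import unicodedata

composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE as a single code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

def nfc_equal(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# The two strings render alike but differ as raw code point sequences.
print(composed == decomposed)
# After NFC normalization they compare equal.
print(nfc_equal(composed, decomposed))
```

Whether this counts as a necessary feature or as a design wart is exactly what the thread is arguing about; the sketch only shows the behavior.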