Comment by WalterBright

4 years ago

It's a good question. The answer is straightforward. Let's say you saw `i` in a book. How would you know if it is Latin or Cyrillic?

By the context!

How would a book distinguish `a` as in `apple` from `a` as in `a+b`? (Unicode encodes a mathematical a separately from the letter a.)

By the context!

This is what I meant when I said Unicode has no business adding semantic content. Semantics come from context, not from the glyph. After all, what if I decided to write:

(a) first bullet point

(b) second bullet point

Now what? Is that letter a or math symbol a? There's no end to semantic content. It's impossible to put this into Unicode in any kind of reasonable manner. Trying to do it leads one into a swamp of hopelessness.

BTW, the attached article is precisely about deliberately misusing identical glyphs in order to confuse the reader, because the C compiler treats them differently. What better case could there be that semantic content for glyphs is a hopelessly wrongheaded idea?
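
To make the point concrete, here is a rough Python sketch (not taken from the article) showing that glyphs a reader cannot tell apart are still distinct code points to a program. The particular characters and the `describe` helper are my own picks for illustration; it uses only the standard `unicodedata` module.

```python
import unicodedata

# Illustrative choices, not from the article: three characters that
# render like plain Latin letters in many fonts.
latin_i = "\u0069"       # Latin small letter i
cyrillic_i = "\u0456"    # Cyrillic look-alike of i
math_a = "\U0001D44E"    # mathematical italic small a


def describe(ch):
    """Hypothetical helper: return the code point and Unicode name of ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in (latin_i, cyrillic_i, math_a):
    print(describe(ch))

# A compiler or string comparison sees different characters,
# even though the reader sees the same glyph.
print(latin_i == cyrillic_i)

# Compatibility normalization folds the math a back to the plain letter,
# but leaves the Cyrillic i alone.
print(unicodedata.normalize("NFKC", math_a) == "a")
print(unicodedata.normalize("NFKC", cyrillic_i) == latin_i)
```

Note that NFKC only helps with the math letter. The Latin/Cyrillic pair stays distinct under every normalization form, which is exactly what makes that kind of lookalike usable for the trick the article describes.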