Comment by necovek
15 hours ago
Unicode is an attempt to encode the world's languages: there is not much to like or dislike about it, it only represents the reality. Sure, it has a number of weird details, butnif anything, it's due to the desire to simplify it (like Han unification or normal forms).
Any language runtime wanting to provide date/time and string parsing functions needs access to the Unicode database (or something of comparable complexity and size).
Saying "I don't like Unicode" is like saying "I don't like the linguistic diversity in the world": I mean sure, OK, but it's still there and it exists.
Though note that date-time, currency, number, street etc. formatting is not "Unicode" even if provided by ICU: this is similarly defined by POSIX as "locales", anf GNU libc probably has the richest collection of locales outside of ICU.
There are also many non-Unicode collation tables (think phonebook ordering that's different for each country and language): so no good sort() without those either.
Does that include emojis?
Emojis are complicated from a font rendering perspective. But from a string processing perspective, they're generally going to be among the simplest characters: they don't have a lot of complex properties with a lot of variation between individual characters. Compare something like the basic Latin characters, where the mappings for precomposed characters are going to vary wildly from 'a' to 'b' to 'c', etc., whereas the list of precomposed characters for the emoji blocks amounts to "none."
Agreed!
FWIW, they are not even "complicated" from a font rendering perspective: they're simple non-combining characters and they are probably never used in ligatures either (though nothing really stops you; just like you can have locale-specific variants with locl tables). It's basically "draw whatever is in a font at this codepoint".
Yes, if you want to call them out based on Unicode names, you need to have them in the database, and there are many of them, so a font needs to have them all, but really, they are the simplest of characters Unicode could have.
5 replies →
Sorry, what? You mean, emoji composition rules are simpler than combining diacritics? https://blog.codepoints.net/emojis-under-the-hood.html
1 reply →
[flagged]