Comment by tkot

6 hours ago

> it really is also a tool to best codify spoken language of the Slavs (in a sense, it is trivially provable that Cyrillic script is better adapted even to languages which do not use it today, but have to resort to digraphs or glyphs with diacritics — some are thus not using it to distance from a particular influence instead

I've heard this claim many times but never the reasoning behind it - by what metric is "ш" superior to "š" and so on?

necovek 1 hour ago

It's less pronounced with diacritics, but enter Unicode normal forms: you can represent š either as a single precomposed code point (U+0161), or as s followed by a combining caron (U+030C). When you want to compare two strings, you have to normalize them first to ensure you are comparing apples to apples. I can guarantee most software is broken in that regard. Cyrillic ш has only one encoding, so it just works.
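To make that concrete, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `same_text` is made up for illustration, and NFC is just one reasonable choice of normal form (NFD would work equally well as long as both sides use the same one).

```python
import unicodedata

precomposed = "\u0161"   # š as one code point (LATIN SMALL LETTER S WITH CARON)
decomposed = "s\u030c"   # s followed by COMBINING CARON

# Hypothetical helper: compare two strings after bringing both to NFC.
def same_text(a: str, b: str) -> bool:
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# The two spellings render identically but are different code point sequences.
print(precomposed == decomposed)           # naive comparison
print(same_text(precomposed, decomposed))  # comparison after normalization

# Cyrillic ш has no canonical decomposition, so NFD leaves it untouched.
print(unicodedata.normalize("NFD", "ш") == "ш")
```

The naive `==` is the bug most software ships with; the normalized comparison is what you actually want.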

With digraphs (lj, nj, dž, and sometimes dj for đ too), it's even worse. Even capitalization is ambiguous: sometimes it's Lj and other times it's LJ. Then you have words like konjugacija, where n and j are two separate letters and not the digraph nj.

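Interestingly, and not many know this, Unicode includes separate code points for all of the digraphs too (U+01C4 through U+01CC, covering the upper, title, and lower case forms of dž, lj, and nj). While well-intentioned, it only makes the problem worse, because the same word can now be spelled with either two letters or one digraph character.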
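A quick Python sketch of what that means in practice, using only the standard `unicodedata` module. The sample word is just an illustration. Note that canonical normalization (NFC) does not help here; you need compatibility normalization (NFKC) to fold the single-character digraph back into two letters.

```python
import unicodedata

two_letters = "ljubav"          # l + j as separate letters
one_digraph = "\u01c9ubav"      # starts with U+01C9, the single-character digraph

# Same word on screen, different strings underneath.
print(two_letters == one_digraph)

# NFC keeps the digraph character as-is; NFKC expands it to "lj".
nfc = unicodedata.normalize("NFC", one_digraph)
nfkc = unicodedata.normalize("NFKC", one_digraph)
print(nfc == two_letters, nfkc == two_letters)

# The three case variants of lj each have their own code point.
for ch in "\u01c7\u01c8\u01c9":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```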

Digraphs are especially sucky when you try sorting strings in phonebook order, as LJ is its own letter that comes after L, so you've got ..., LI, LK, ..., LZ, LJA, ... With exceptions, it is even worse.
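For the curious, a rough Python sketch of such a sort key. Everything here (`ALPHABET`, `collation_key`) is made up for illustration, and it naively assumes every lj, nj, or dž in a word is a digraph, so it gets exceptions like konjugacija wrong. Real software should use a proper locale-aware collation library instead.

```python
import unicodedata

# Latin alphabet order with digraphs as single letters (assumed for this sketch).
ALPHABET = ["a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h", "i",
            "j", "k", "l", "lj", "m", "n", "nj", "o", "p", "r", "s", "š", "t",
            "u", "v", "z", "ž"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
DIGRAPHS = ("dž", "lj", "nj")

def collation_key(word: str) -> tuple:
    # Normalize first so decomposed diacritics match the table, then fold case.
    w = unicodedata.normalize("NFC", word).lower()
    key = []
    i = 0
    while i < len(w):
        # Greedy match: always treat a two-letter digraph as one letter.
        if w[i:i + 2] in DIGRAPHS:
            letter, i = w[i:i + 2], i + 2
        else:
            letter, i = w[i], i + 1
        # Characters outside the alphabet sort after everything else.
        key.append(RANK.get(letter, len(ALPHABET)))
    return tuple(key)

words = ["lja", "lz", "li", "lk"]
print(sorted(words))                     # plain code point order
print(sorted(words, key=collation_key))  # digraph-aware order
```

Plain sorting puts "lja" between "li" and "lk"; the digraph-aware key moves it after "lz", which is the phonebook order described above.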