Comment by zzo38computer

9 months ago

First, you should consider if you even need case folding; for many uses it will be unnecessary, anyways.

Furthermore, the proper way to do case folding will depend on such things as the character set, the language, the specific context of the text being converted (e.g. in some cases specific letters are required, such as abbreviations of the names of SI units), etc. And then, it is not necessarily only "uppercase" and "lowercase", anyways.

There might even be different ways to do by the same language, with possibly disagreements about usage (e.g. the German Eszett did not have an official capital form until 2017, although apparently some type designers did it anyways (and it was in Unicode before then, despite that)).

If the character set is Unicode, then there is not actually the correct way to do it, despite what the Unicode Conspiracy insists otherwise.

Also, for some uses the way that it will need to be done, there will be a specific way that it is required (due to the way that a file format or a protocol or whatever is working), so in such a case if the character set is something other than ASCII then you cannot just assume that it will always work in the same way.

You also cannot necessarily depend on the locale for such a thing, since it might depend on the data, as well.

These things can be as bad as they are, but Unicode just makes these things worse than that. If a program requires a specific case folding and then it will not work because it is the wrong version of Unicode and it is possible to be a security issue and/or other problems.

(Another problem, which applies even if you do not use case folding, is that some people think that all text is or should be Unicode and that one character set is suitable for everything. Actually, one character set cannot be suitable for everything, regardless of what character set it is. Even if it was (which it isn't), it wouldn't be Unicode.)

0 comments

zzo38computer

No comments yet

Contribute on Hacker News ↗