Comment by bbminner
1 day ago
I was really confused about case folding; this page explained the motivation well: https://jean.abou-samra.fr/blog/unicode-misconceptions
""" Continuing with the previous example of “ß”, one has lowercase("ss") != lowercase("ß") but uppercase("ss") == uppercase("ß"). Conversely, for legacy reasons (compatibility with encodings predating Unicode), there exists a Kelvin sign “K”, which is distinct from the Latin uppercase letter “K”, but also lowercases to the normal Latin lowercase letter “k”, so that uppercase("K") != uppercase("K") but lowercase("K") == lowercase("K").
The correct way is to use Unicode case folding, a form of normalization designed specifically for case-insensitive comparisons. Both casefold("ß") == casefold("ss") and casefold("K") == casefold("K") are true. Case folding usually yields the same result as lowercasing, but not always (e.g., “ß” lowercases to itself but case-folds to “ss”). """
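Here's a minimal sketch in Python (whose str.lower()/str.upper()/str.casefold() follow the Unicode case mappings; the specific characters and variable names are my own, not taken from the linked post) of the comparisons the quote describes:

    sharp_s = "\u00DF"   # LATIN SMALL LETTER SHARP S, "ß"
    kelvin = "\u212A"    # KELVIN SIGN, visually identical to Latin "K"

    print("ss".lower() == sharp_s.lower())        # False: "ss" vs "ß"
    print("ss".upper() == sharp_s.upper())        # True:  both become "SS"
    print(kelvin.upper() == "K".upper())          # False: the Kelvin sign has no uppercase mapping
    print(kelvin.lower() == "K".lower())          # True:  both become "k"

    print("ss".casefold() == sharp_s.casefold())  # True: "ß" case-folds to "ss"
    print(kelvin.casefold() == "K".casefold())    # True: both fold to "k"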
One question I have is why have Kelvin sign that is distinct from Latin K and other indistinguishable symbols? To make quantities machine readable (oh, this is not a 100K license plate or money amount, but a temperature)? Or to make it easier for specialized software to display it in the correct places/units?
They seem to have (if I understand correctly) degree-Celsius and degree-Fahrenheit symbols. So maybe Kelvin is included for consistency, and it just happens to look identical to Latin K?
IMO the confusing bit is giving it a lowercase. It is a symbol that happens to look like an uppercase letter, not an actual letter…
And why can't the symbol be a regular old uppercase "K"? Who is this helping?
Unicode wants to be able to preserve round-trip re-encoding from this other standard, which has separate letter-K and degree-K characters. Making these small sacrifices for compatibility is how Unicode became the de facto world standard.
A symbol may look different from the original letter, for example N - №, € - E (Є), S - $, the integral sign, с - ©, TM - ™, a - @, and so on.
However, those symbols don't have lowercase variants. Moreover, lowercase k means kilo-, not a «smaller Kelvin».
I think just using uppercase Latin K is the recommendation.
But, I dunno. Why would anybody apply upper or lower case operators to a temperature measurement? It just seems like a nonsense thing to do.
Probably useful in a non-Latin codeset?
Having a dedicated Kelvin symbol preserves the semantics.
> One question I have is why have Kelvin sign that is distinct from Latin K and other indistinguishable symbols?
To allow round-tripping.
Unicode did not win by being better than all previously existing encodings, even though it clearly was.
It won by being able to coexist with all those other encodings for years (decades) while the world gradually transitioned. That required the ability to take text in any of those older encodings and transcode it to Unicode and back again without loss (or "gain"!).
> One question I have is why have Kelvin sign that is distinct from Latin K and other indistinguishable symbols?
Unicode has the goal of providing a 1:1 mapping for all other character encodings. Weird things like this usually exist so that there can be a 1:1 reversible mapping to some ancient character encoding.
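To make "distinct code point" concrete, here's a small Python sketch (my own illustration, using the standard unicodedata module): the Kelvin sign stays a separate character until something like a normalization pass folds it into the ordinary Latin K, which is exactly why normalizing text can break an exact round trip back to a legacy encoding.

    import unicodedata

    kelvin = "\u212A"   # KELVIN SIGN
    latin_k = "K"       # U+004B LATIN CAPITAL LETTER K

    print(unicodedata.name(kelvin))               # KELVIN SIGN
    print(kelvin == latin_k)                      # False: distinct code points
    print(unicodedata.normalize("NFKC", kelvin))  # "K": normalization folds them together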