
Comment by zahlman

10 hours ago

This is not quite right, at least for Python. .upper() and .lower() (and .casefold() as well) implement the default casing algorithms from the Unicode specification, which are one-to-many (but still locale-naive). Other languages, meanwhile, might well implement locale-aware mapping that defaults to the system locale rather than requiring a locale to be passed.
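To illustrate the one-to-many point, here is a quick sketch using only built-in str methods (the variable names are mine). The result can be longer than the input, and no locale is consulted:

```python
# Python's str case methods apply the full (one-to-many) default mappings.
sharp_s = "\u00df"      # ß, LATIN SMALL LETTER SHARP S
fi_ligature = "\ufb01"  # ﬁ, LATIN SMALL LIGATURE FI

upper_sharp_s = sharp_s.upper()        # "SS": one code point becomes two
folded_sharp_s = sharp_s.casefold()    # "ss": casefold expands as well
upper_ligature = fi_ligature.upper()   # "FI"

print(len(sharp_s), len(upper_sharp_s))  # 1 2
print(upper_ligature)                    # FI

# Locale-naive: "i" always uppercases to plain "I", whatever the system locale.
print("i".upper())                       # I
```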

For one-to-one I was thinking of the Unicode Character Database single-character upper, lower, and title case entries in the UnicodeData.txt file [1] -- i.e. the simple case mappings [2].
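As a rough sketch of what those entries look like: in the UnicodeData.txt layout described in [2], the last three semicolon-separated fields hold the simple uppercase, lowercase, and titlecase mappings. The helper name and the two sample lines below are my own illustration (typed in the file's format, so check them against the real file):

```python
# Illustrative sample lines in UnicodeData.txt format (verify against the file).
SAMPLE_A = "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041"
SAMPLE_SHARP_S = "00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;;;;"


def simple_case_mappings(line):
    """Parse one UnicodeData.txt line into its simple (one-to-one) mappings."""
    fields = line.rstrip("\n").split(";")

    def to_char(hex_code_point):
        # An empty field means there is no simple mapping for that case.
        return chr(int(hex_code_point, 16)) if hex_code_point else None

    return {
        "char": chr(int(fields[0], 16)),
        "upper": to_char(fields[12]),
        "lower": to_char(fields[13]),
        "title": to_char(fields[14]),
    }


print(simple_case_mappings(SAMPLE_A))
print(simple_case_mappings(SAMPLE_SHARP_S))
```

Under these sample lines, "a" gets a single-character uppercase while ß gets none at all, which is the gap the one-to-many mappings fill.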

The one-to-many mappings are specified in SpecialCasing.txt [3]. Some are locale-independent (like my German sharp/double S example) and others are locale-aware (like the Turkish example). For the locale-aware mappings, you need to make sure the language/locale is set correctly in your programming language of choice, or that the language does the right thing on its own.
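For the Turkish case in Python specifically, the built-in methods never look at a locale, so you have to handle it yourself or use a library that does. A minimal sketch, assuming you only care about the dotted/dotless i pairs; upper_tr and lower_tr are hypothetical helpers of mine, not a standard API, and they skip the context-sensitive rules in SpecialCasing.txt:

```python
DOTTED_CAPITAL_I = "\u0130"  # İ
DOTLESS_SMALL_I = "\u0131"   # ı


def upper_tr(text):
    """Hypothetical helper: uppercase with Turkish/Azerbaijani i handling."""
    # Map i -> İ and ı -> I first, then let the default algorithm do the rest.
    return text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I").upper()


def lower_tr(text):
    """Hypothetical helper: lowercase with Turkish/Azerbaijani i handling."""
    # Map I -> ı and İ -> i first; simplified, ignores combining-dot contexts.
    return text.replace("I", DOTLESS_SMALL_I).replace(DOTTED_CAPITAL_I, "i").lower()


print("istanbul".upper())      # ISTANBUL (default, locale-naive)
print(upper_tr("istanbul"))    # İSTANBUL
print(lower_tr("DIYARBAKIR"))  # dıyarbakır

# The locale-independent mapping for İ is itself one-to-many:
# "i" followed by U+0307 COMBINING DOT ABOVE.
print(len(DOTTED_CAPITAL_I.lower()))  # 2
```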

[1] https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.tx...

[2] https://www.unicode.org/reports/tr44/#Casemapping

[3] https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing....