Comment by jp1016
10 days ago
One practical thing I appreciated about MessageFormat is how it eliminates a bunch of conditional UI logic.
I used to write switch/if blocks for:
• 0 rows → “No results”
• 1 row → “1 result”
• n rows → “{n} results”
Which seems trivial in English, but gets messy once you support languages with multiple plural categories.
I wasn’t really aware of how nuanced plural rules are until I dug into ICU. The syntax looked intimidating at first, but it actually removes a lot of branching from application code.
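A minimal sketch of that idea, in plain Python rather than real ICU syntax: the three cases above become data, and one generic lookup replaces the branching at each call site. The table layout, `english_category` and `format_count` are hypothetical names for illustration, not any library's API.

```python
# Hypothetical message table: an exact-match key ("=0") wins over
# the plural categories ("one", "other").
RESULTS_EN = {
    "=0": "No results",
    "one": "{n} result",
    "other": "{n} results",
}


def english_category(n: int) -> str:
    """English plural category for a whole number (assumed helper)."""
    return "one" if n == 1 else "other"


def format_count(variants: dict, n: int, category_for=english_category) -> str:
    """Pick a variant by exact match first, then by plural category."""
    exact = f"={n}"
    if exact in variants:
        template = variants[exact]
    else:
        template = variants.get(category_for(n), variants["other"])
    return template.format(n=n)


print(format_count(RESULTS_EN, 0))
print(format_count(RESULTS_EN, 1))
print(format_count(RESULTS_EN, 5))
```

Supporting another language would then mean swapping the table and the category function, while the call site stays the same.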
I’ve been using an online ICU message editor (https://intlpull.com/tools/icu-message-editor) to experiment with plural/select cases and different locales. It helped me understand edge cases much faster than reading the spec alone.
This post shows many of the localisation challenges that seemingly simple tools don't have an answer to: https://hacks.mozilla.org/2019/04/fluent-1-0-a-localization-...
(Fluent informed much of the design of MessageFormat 2.)
Indeed, if only it were as simple as “{n} rows”.
I18n / l10n is full of things like this, important details that couldn’t be more boring or fiddly to implement.
Which is why Windows UI is littered with language like "number of rows: {n}".
> Indeed, if only it were as simple as “{n} rows”.
How long till we just have an LLM do it on the fly?
Hasn't gettext had this for decades? https://www.gnu.org/software/gettext/manual/html_node/Plural...
No, gettext scales very badly, both vertically (larger systems) and horizontally (locales with rich grammatical forms like declensions etc.)
We (authors of Fluent and collaborators on MessageFormat 2.0) wrote this explainer which you may find informative - https://github.com/projectfluent/fluent/wiki/Fluent-vs-gette...
Thanks, I'm a decades-long user of gettext from both the developer's and the translator's point of view, and have encountered several of the drawbacks to some extent.
It's very good, and has certainly been good enough for most practical purposes, but innovation needs to happen, and things can certainly get better. Thanks for your work in this direction!
Gettext has everything, it just takes knowing five languages to understand what to use for
Yeah, some sort of pluralization support is pretty much the second most important feature in any message localization tool, right after the ability to substitute externally-defined strings in the first place. Even in a monolingual application, spamming plural formatting logic in application code isn't exactly the best practice.
gettext has everything, plus a huge ecosystem, such as tools to coordinate collaboration among thousands of contributors.
If alternatives don't start with a very strong case for why gettext wasn't a good option, that's already a good indicator of not-invented-here syndrome.
It's not hard to make a case against gettext, despite its maturity and large ecosystem.
IMHO pluralization is a prime example: the API only cleanly handles the English case, it requires the developer to be aware of translation gotchas, and the documentation and format are honestly confusing. Compare that to MessageFormat's pluralization example (https://github.com/unicode-org/message-format-wg/blob/main/s...) which is very easy to understand and fully in the translator's hands.
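For comparison, a small sketch of the gettext-style call using Python's standard `gettext` module. With no catalog loaded, `NullTranslations.ngettext` falls back to the English-style rule (singular only when n == 1). `files_label` is a made-up helper for illustration.

```python
import gettext

# NullTranslations is the stdlib fallback used when no catalog is loaded.
translations = gettext.NullTranslations()


def files_label(n: int) -> str:
    """The developer supplies an English singular, an English plural and the count."""
    return translations.ngettext("%d file", "%d files", n) % n


print(files_label(1))
print(files_label(21))
```

The call site is shaped around two English forms. Any further forms depend on the catalog's Plural-Forms header rather than on anything visible in the code.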
This reminds me of https://perldoc.perl.org/Locale::Maketext::TPJ13
It seems that to get it right for every use case and language, you would need functions to translate phrases, so switch statements may be a valid solution. The number of text elements needed for pagination, CRUD operations and similar UI elements should be finite :)
I checked the spec and don't really get it. Something should specify the formula for choosing the correct form (i.e. 1 for 21 in Slavic languages), and the format isn't any better than the gettext of 30 years ago.
This confused me too but the formula and rules for variants are specified by the configured language out-of-band, so there is support for this.
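Let's take your example. In English, counting files takes two variants: one for a count of exactly 1 and one for everything else.
In Polish, there are several possible variants depending on the count.
Your Polish translators would write one string per variant, each keyed by its plural category (`one`, `few`, `many`, and so on).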
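A rough sketch of what those variants might look like, written as Python data rather than actual MessageFormat 2 syntax. The Polish strings follow the GNU gettext manual's file-counting example (spelled with ó); the `other` entry for fractional counts, the `FILES` table and the `render` helper are assumptions for illustration.

```python
# Hypothetical per-locale variant tables, keyed by CLDR plural category.
FILES = {
    "en": {"one": "{n} file", "other": "{n} files"},
    "pl": {
        "one": "{n} plik",
        "few": "{n} pliki",
        "many": "{n} plików",
        "other": "{n} pliku",  # assumed form for fractional counts
    },
}


def render(locale: str, category: str, n) -> str:
    """Fill in the variant for a plural category computed elsewhere."""
    variants = FILES[locale]
    # Fall back to "other" when the locale has no string for the category.
    return variants.get(category, variants["other"]).format(n=n)


print(render("en", "one", 1))
print(render("pl", "few", 3))
print(render("pl", "many", 5))
```

The translator only writes strings. Which category a given count falls into is decided by the library, not by this table.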
The library (and your translators) know that in Polish, the `few` variant kicks in when `i%10 = 2..4 && i%100 != 12..14`, etc. I think the library just knows these rules for each language as part of the standard. Mozilla says that it was an explicit design goal to put "variant selection logic in the hands of localizers rather than developers".
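To make the rule concrete, here is a hand-written sketch of plural category selection for non-negative whole numbers only. Real libraries take these rules from CLDR data and also handle fractions; the function names and the `PLURAL_RULES` table are hypothetical. Russian is included only to show the "21" case raised earlier in the thread.

```python
def _few(n: int) -> bool:
    # n % 10 is 2..4, and n % 100 is not 12..14
    return 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14


def plural_en(n: int) -> str:
    return "one" if n == 1 else "other"


def plural_pl(n: int) -> str:
    if n == 1:
        return "one"
    if _few(n):
        return "few"
    return "many"  # 0, 5..21, 25..31, ...


def plural_ru(n: int) -> str:
    if n % 10 == 1 and n % 100 != 11:
        return "one"  # 1, 21, 31, ... but not 11
    if _few(n):
        return "few"
    return "many"


# The rules live with the locale, out-of-band from the message strings.
PLURAL_RULES = {"en": plural_en, "pl": plural_pl, "ru": plural_ru}


def plural_category(locale: str, n: int) -> str:
    return PLURAL_RULES[locale](n)


for count in (1, 2, 5, 12, 21, 22):
    print(count, plural_category("pl", count), plural_category("ru", count))
```

Note that 21 is `many` in Polish but `one` in Russian, which is why a single "Slavic" formula does not work.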
The point is that it's supported, it simplifies developer logic, and your translators know how to work with it.
See https://www.unicode.org/cldr/charts/48/supplemental/language...
(Apologies if I got the above translation strings wrong, I don't speak Polish. Just working from the GNU gettext example.)
"the library just knows these rules for each language as part of the standard" sounds great until you try to support a small minority language that the library just doesn't know about and then you're left trying to hack around it by pretending that it's actually a regional variety of another language with similar plural rules.
AFAIK, unlike gettext, MessageFormat doesn't allow you to specify a formula for the plural forms as part of the localization data, so the variant selection logic ended up in the hands of library developers rather than localizers or application developers.
And the standard does get updated occasionally, which can also lead to bugs with localization data written against another version of the standard: https://github.com/cakephp/cakephp/issues/18740
>This confused me too but the formula and rules for variants are specified by the configured language out-of-band, so there is support for this.
Well, making it out-of-band sure is one way to prevent lazy people from doing eval on plural forms from the po file. I hope the library is actually good then.
usually it is ó instead of o' but otherwise very good :)
that's a lazy feature. dealing with this on the front end is the right thing so you can have rich empty states anyway.