Comment by jp1016

10 days ago

One practical thing I appreciated about MessageFormat is how it eliminates a bunch of conditional UI logic.

I used to write switch/if blocks for:

• 0 rows → “No results”
• 1 row → “1 result”
• n rows → “{n} results”

Which seems trivial in English, but gets messy once you support languages with multiple plural categories.

I wasn’t really aware of how nuanced plural rules are until I dug into ICU. The syntax looked intimidating at first, but it actually removes a lot of branching from application code.
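Here is a minimal Python sketch of that contrast. It assumes English only and integer counts. `results_label_branching` is the kind of hand-written branching described above. `format_plural`, `english_category` and `RESULTS_EN` are hypothetical names that mimic how an ICU message such as `{n, plural, =0 {No results} one {# result} other {# results}}` gets resolved. An exact `=0` match is tried first, then the locale's plural category, with `other` as the fallback. This is not the ICU API. It only illustrates how the branching moves out of application code and into message data.

```python
from typing import Callable


# Hand-written branching: correct for English, but the rules live in app code.
def results_label_branching(n: int) -> str:
    if n == 0:
        return "No results"
    if n == 1:
        return "1 result"
    return f"{n} results"


# Hypothetical English rule for integer counts: CLDR "one" applies only to 1.
def english_category(n: int) -> str:
    return "one" if n == 1 else "other"


# Message data in the spirit of the ICU message
#   {n, plural, =0 {No results} one {# result} other {# results}}
RESULTS_EN = {"=0": "No results", "one": "# result", "other": "# results"}


def format_plural(n: int, variants: dict[str, str],
                  category: Callable[[int], str]) -> str:
    """Try an exact '=N' match first, then the plural category, then 'other'."""
    template = variants.get(f"={n}")
    if template is None:
        template = variants.get(category(n), variants["other"])
    # '#' stands in for the number, as in ICU MessageFormat (no locale formatting here).
    return template.replace("#", str(n))


for count in (0, 1, 5):
    print(format_plural(count, RESULTS_EN, english_category))
```

Supporting another language then means swapping the category function and the message data. The calling code stays the same.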

I’ve been using an online ICU message editor (https://intlpull.com/tools/icu-message-editor) to experiment with plural/select cases and different locales. It helped me understand edge cases much faster than reading the spec alone.

Vinnl 10 days ago

This post shows many of the localisation challenges that seemingly simple tools have no answer to: https://hacks.mozilla.org/2019/04/fluent-1-0-a-localization-...

(Fluent informed much of the design of MessageFormat 2.)

draw_down 10 days ago
Indeed, if only it were as simple as “{n} rows”.
I18n / l10n is full of things like this, important details that couldn’t be more boring or fiddly to implement.
- Joker_vD 10 days ago
  
  Which is why Windows UI is littered with language like "number of rows: {n}".
  
- magicalhippo 10 days ago
  
  > Indeed, if only it were as simple as “{n} rows”.
  How long till we just have an LLM do it on the fly?

pferde 10 days ago

Hasn't gettext had this for decades? https://www.gnu.org/software/gettext/manual/html_node/Plural...
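For comparison, gettext puts the selection formula in the catalog itself. The `Plural-Forms` header holds a C-like expression that maps `n` to an index into the `msgstr[...]` list. The sketch below hand-translates the Polish formula, as given in the GNU gettext manual's plural-forms examples, into Python. `polish_plural_index`, `MSGSTR` and `ngettext_pl` are hypothetical names. The real gettext runtime evaluates the header expression itself, so no code like this is needed in practice.

```python
# Polish entry from the GNU gettext manual's Plural-Forms examples:
#   Plural-Forms: nplurals=3;
#     plural=n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
def polish_plural_index(n: int) -> int:
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


# Hypothetical catalog entry: msgstr[0], msgstr[1], msgstr[2] for "You have %d file(s)".
MSGSTR = ["Masz %d plik", "Masz %d pliki", "Masz %d plików"]


def ngettext_pl(n: int) -> str:
    """Pick msgstr[index] using the formula, then substitute the count."""
    return MSGSTR[polish_plural_index(n)] % n


for count in (1, 2, 5, 12, 21, 22):
    print(ngettext_pl(count))
```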

zbraniecki 10 days ago
No, gettext scales very badly, both vertically (larger systems) and horizontally (locales with rich grammatical forms like declensions etc.)
We (authors of Fluent and collaborators on MessageFormat 2.0) wrote this explainer which you may find informative - https://github.com/projectfluent/fluent/wiki/Fluent-vs-gette...
- pferde 9 days ago
  
  Thanks. I've been using gettext for decades, both as a developer and as a translator, and have run into several of those drawbacks to some extent.
  It's very good, and has certainly been good enough for most practical purposes, but innovation needs to happen, and things can certainly get better. Thanks for your work in this direction!
Muromec 10 days ago

Gettext has everything; it just takes knowing five languages to understand what to use it for.
Sharlin 10 days ago

Yeah, some sort of pluralization support is pretty much the second most important feature in any message localization tool, right after the ability to substitute externally-defined strings in the first place. Even in a monolingual application, spamming plural formatting logic in application code isn't exactly the best practice.
iririririr 10 days ago
gettext has everything, plus a huge ecosystem, e.g. tools to coordinate collaboration among thousands of contributors.
If alternatives don't start with a very strong case for why gettext wasn't a good option, that's already a good indicator of not-invented-here syndrome.
- moltonel 10 days ago
  
  It's not hard to make a case against gettext, despite its maturity and large ecosystem.
  IMHO pluralization is a prime example. The API only cleanly handles the English case, requires the developer to be aware of translation gotchas, and has honestly confusing documentation and format. Compare that to MessageFormat's pluralization example (https://github.com/unicode-org/message-format-wg/blob/main/s...), which is very easy to understand and fully in the translator's hands.
  

chokma 10 days ago

This reminds me of https://perldoc.perl.org/Locale::Maketext::TPJ13

Seems like to get it right for every use case / language, you would need functions to translate phrases, so switch statements may be a valid solution. The number of text elements needed for pagination, CRUD operations and similar UI elements should be finite :)

Muromec 10 days ago

I checked the spec and don't really get it. Something has to specify the formula for choosing the correct form (i.e. that 21 takes the same form as 1 in some Slavic languages), and the format isn't any better than the gettext of 30 years ago.

gcr 10 days ago
This confused me too but the formula and rules for variants are specified by the configured language out-of-band, so there is support for this.
Let's take your example. In English, counting files looks like this:
You have {file_count, plural, =0 {no files} one {1 file} other {# files} }
In Polish, there are several possible variants depending on the count:
Masz 1 plik
Masz 2, 3, 4 pliki
Masz 5-21 pliko'w
Masz 22-24 pliki
Masz 25-31 pliko'w
Your Polish translators would write:
Masz {file_count, plural, one {# plik} few {# pliki} other {# pliko'w} }
The library (and your translators) know that in Polish, the `few` variant kicks in when `i%10 = 2..4 && i%100 != 12..14`, etc. I think the library just knows these rules for each language as part of the standard. Mozilla says that it was an explicit design goal to put "variant selection logic in the hands of localizers rather than developers"
The point is that it's supported, it simplifies developer logic, and your translators know how to work with it.
See https://www.unicode.org/cldr/charts/48/supplemental/language...
(Apologies if I got the above translation strings wrong, I don't speak Polish. Just working from the GNU gettext example.)
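Here is a minimal sketch of the out-of-band selection described above. It handles non-negative integers only. `polish_category` hard-codes the CLDR Polish rules (`one`, `few`, `many`). CLDR reserves `other` for fractional numbers, which this sketch ignores. `select_variant` is a hypothetical name. It falls back to `other` when a message lacks the selected category, which is how ICU treats a missing plural keyword. That fallback is why a three-variant Polish message like the one above still yields "plików" for 5 or 21, whose CLDR category is `many`. A real library loads these rules from CLDR data instead of hard-coding them.

```python
def polish_category(n: int) -> str:
    """CLDR plural category for a non-negative Polish integer (fractions omitted)."""
    last_digit, last_two = n % 10, n % 100
    if n == 1:
        return "one"
    if 2 <= last_digit <= 4 and not 12 <= last_two <= 14:
        return "few"
    # Every other integer: last digit 0-1 (n != 1), 5-9, or last two digits 12-14.
    # CLDR reserves "other" for non-integers, which this sketch does not handle.
    return "many"


def select_variant(n: int, variants: dict[str, str]) -> str:
    """Use the CLDR category's variant, falling back to 'other' if it is missing."""
    template = variants.get(polish_category(n), variants["other"])
    return template.replace("#", str(n))


# The Polish message above: one {# plik} few {# pliki} other {# plików}
FILES_PL = {"one": "Masz # plik", "few": "Masz # pliki", "other": "Masz # plików"}

for count in (1, 3, 5, 12, 21, 22, 25):
    print(count, polish_category(count), select_variant(count, FILES_PL))
```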
- yorwba 10 days ago
  
  "the library just knows these rules for each language as part of the standard" sounds great until you try to support a small minority language that the library just doesn't know about and then you're left trying to hack around it by pretending that it's actually a regional variety of another language with similar plural rules.
  AFAIK, unlike gettext, MessageFormat doesn't allow you to specify a formula for the plural forms as part of the localization data, so the variant selection logic ended up in the hands of library developers rather than localizers or application developers.
  And the standard does get updated occasionally, which can also lead to bugs with localization data written against another version of the standard: https://github.com/cakephp/cakephp/issues/18740
- Muromec 10 days ago
  
  >This confused me too but the formula and rules for variants are specified by the configured language out-of-band, so there is support for this.
  Well, making it out-of-band is certainly one way to stop lazy people from running eval on the plural forms from the .po file. I hope the library is actually good, then.
- npodbielski 10 days ago
  
  Usually it's ó instead of o', but otherwise very good :)
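This relates to the point that gettext lets localization data carry its own formula, and to the concern about `eval`. The header expression can be handled without `eval` by parsing the small C subset it uses. Below is a minimal Python sketch that assumes the formula only uses `n`, integer literals, `%`, comparisons, `!`, `&&`, `||`, `?:` and parentheses (no `+ - * /`). `compile_plural` is a hypothetical helper, not the API of any gettext implementation. Any input outside that subset raises `ValueError` instead of being executed.

```python
import re
from typing import Callable, Optional

# Tokens allowed in a Plural-Forms expression; longer operators are tried first.
_TOKEN = re.compile(r"\s*(\d+|n|==|!=|>=|<=|&&|\|\||[<>%?:()!])")

# Binary operators from lowest to highest precedence, following C.
_LEVELS = [("||",), ("&&",), ("==", "!="), ("<", "<=", ">", ">="), ("%",)]
_OPS = {
    "||": lambda a, b: int(bool(a) or bool(b)),
    "&&": lambda a, b: int(bool(a) and bool(b)),
    "==": lambda a, b: int(a == b),
    "!=": lambda a, b: int(a != b),
    "<": lambda a, b: int(a < b),
    "<=": lambda a, b: int(a <= b),
    ">": lambda a, b: int(a > b),
    ">=": lambda a, b: int(a >= b),
    "%": lambda a, b: a % b,
}

Expr = Callable[[int], int]


def _tokenize(text: str) -> list[str]:
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unsupported input: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def _combine(op, left: Expr, right: Expr) -> Expr:
    return lambda n: op(left(n), right(n))


class _Parser:
    def __init__(self, tokens: list[str]) -> None:
        self.tokens, self.pos = tokens, 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {token!r}")
        self.pos += 1
        return token

    def ternary(self) -> Expr:
        # cond ? a : b, right-associative as in C.
        cond = self.binary(0)
        if self.peek() != "?":
            return cond
        self.take("?")
        yes = self.ternary()
        self.take(":")
        no = self.ternary()
        return lambda n: yes(n) if cond(n) else no(n)

    def binary(self, level: int) -> Expr:
        if level == len(_LEVELS):
            return self.unary()
        left = self.binary(level + 1)
        while self.peek() in _LEVELS[level]:
            op = _OPS[self.take()]
            right = self.binary(level + 1)
            left = _combine(op, left, right)
        return left

    def unary(self) -> Expr:
        token = self.take()
        if token == "!":
            inner = self.unary()
            return lambda n: int(not inner(n))
        if token == "n":
            return lambda n: n
        if token.isdigit():
            value = int(token)
            return lambda n: value
        if token == "(":
            inner = self.ternary()
            self.take(")")
            return inner
        raise ValueError(f"unexpected token {token!r}")


def compile_plural(header: str) -> Expr:
    """Turn a Plural-Forms header into a function n -> msgstr index, without eval."""
    match = re.search(r"plural\s*=\s*([^;]+)", header)
    if match is None:
        raise ValueError("no plural= expression found")
    parser = _Parser(_tokenize(match.group(1)))
    expr = parser.ternary()
    if parser.peek() is not None:
        raise ValueError(f"trailing token {parser.peek()!r}")
    return lambda n: int(expr(n))


POLISH = ("nplurals=3; plural=n==1 ? 0 : "
          "n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;")
index = compile_plural(POLISH)
print([index(n) for n in (0, 1, 2, 5, 12, 22, 112)])
```

The point of the design is that the formula still ships with the localization data, but only a fixed, side-effect-free grammar is ever evaluated.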

iririririr 10 days ago

That's a lazy feature. Dealing with this on the front end is the right thing, since you can have rich empty states anyway.