Comment by moffkalast
18 days ago
Ok I see we're very far from being on the same page.
Multilingualism in the context of language models means something more than English, because that's what every model trained on the internet already knows. There aren't any I'm aware of that don't, since it would be exceedingly hard to exclude English from the dataset even if you wanted to for some reason. This is like bringing up "what about men's rights" when talking about women's rights... yes, we know, they're already entirely ubiquitous.
But more properly, I would consider LLM multilingualism to mean outright knowing all languages. We benchmark models on MMLU and similar collections that cover all fields of knowledge known to man, so I would say it's reasonable to expect fluency in all languages as well.