Comment by Etheryte
5 days ago
Why would they want more languages from outside of the EU when they've clearly stated they only target the 24 official languages of the European Union?
5 days ago
Take Slovene, for example. There simply isn't enough data in it. But if you add all the data available in related languages, you get higher quality. Without that, LLMs do poorly on low-resource languages.
I'm not sure I'm convinced. I speak a small European language and the general experience is that LLMs are often wrong exactly because they think they can just borrow from a related language. The result is even worse and often makes no sense whatsoever. In other words, as far as translations go, confidently incorrect is not useful.
They train on 14 billion tokens in Slovene. Are you sure that's not enough?
Unfortunately, yes.
We need more tokens, a wider variety of topics, and more complex texts.