Comment by miros_love
5 days ago
>European versions of ARC
But that is an image-like benchmark. Has anyone looked at the EU-ARC paper to see what the difference is? Why can't it be measured on the regular one?
I skimmed it and didn't find an answer right away, but judging by their tokenizer, they are training from scratch. In general, I don't like this approach for the task at hand. For high-resource languages there are already good models that they don't compare against. And for low-resource languages it is very important to include more languages from the same language family, not necessarily ones in the EU.
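(If you want to check the from-scratch claim yourself, here is a minimal sketch with Hugging Face tokenizers. The model IDs are placeholders, not the paper's actual checkpoints.)

    from transformers import AutoTokenizer

    # Placeholder IDs -- swap in the real checkpoints.
    new_tok = AutoTokenizer.from_pretrained("eu-model/base")           # hypothetical
    ref_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

    # A tokenizer trained from scratch usually shares little vocabulary
    # with existing ones and has lower fertility (tokens per word) on
    # its target languages.
    overlap = set(new_tok.get_vocab()) & set(ref_tok.get_vocab())
    print(f"vocab overlap: {len(overlap)} / {len(new_tok.get_vocab())}")

    sample = "Slovenija je država v Srednji Evropi."  # Slovene test sentence
    for name, tok in [("new", new_tok), ("ref", ref_tok)]:
        n = len(tok.tokenize(sample))
        print(f"{name}: {n} tokens, fertility {n / len(sample.split()):.2f}")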
You might be confusing ARC-AGI with EU-ARC, which is a language benchmark [1]
[1] https://arxiv.org/pdf/2410.08928
Why would they want more languages from outside of the EU when they've clearly stated they only target the 24 official languages of the European Union?
Take Slovene, for example: you simply don't have enough data in it alone. But if you add all the available data from related languages, you get higher quality. Current LLMs fail to exploit this for low-resource languages.
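(The standard way to do this, e.g. in XLM-R, is temperature-scaled sampling over the corpus so that smaller related languages are upsampled. A minimal sketch; the token counts are invented for illustration:)

    # Temperature-scaled corpus sampling across a language family.
    # Token counts below are made-up numbers for illustration.
    corpus_tokens = {
        "sl": 14e9,  # Slovene
        "hr": 30e9,  # Croatian (related, EU)
        "sr": 35e9,  # Serbian (related, outside the EU)
        "cs": 60e9,  # Czech (Slavic, different branch)
    }

    alpha = 0.3  # XLM-R's value; lower = flatter = more upsampling
    weights = {lang: n ** alpha for lang, n in corpus_tokens.items()}
    total = sum(weights.values())

    for lang, w in weights.items():
        natural = corpus_tokens[lang] / sum(corpus_tokens.values())
        print(f"{lang}: natural {natural:.1%} -> sampled {w / total:.1%}")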
I'm not sure I'm convinced. I speak a small European language and the general experience is that LLMs are often wrong exactly because they think they can just borrow from a related language. The result is even worse and often makes no sense whatsoever. In other words, as far as translations go, confidently incorrect is not useful.
They train on 14 billion tokens in Slovene. Are you sure that's not enough?
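(For scale, a rough back-of-the-envelope; the ~20 tokens-per-parameter Chinchilla heuristic is my assumption, not a figure from the paper:)

    # Chinchilla-style heuristic: ~20 training tokens per parameter
    # (an assumption for illustration, not a figure from the paper).
    slovene_tokens = 14e9
    tokens_per_param = 20
    print(f"~{slovene_tokens / tokens_per_param / 1e9:.1f}B params")  # ~0.7B
    # 14B Slovene tokens alone would be "compute-optimal" for a ~0.7B-param
    # model; the rest of the multilingual mix adds far more on top.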