Comment by cristoperb

14 hours ago

I haven't tried it for anything myself yet. The paper provides several benchmarks. The emphasis during training was on multi-language support (over 1800 languages are represented in its pre-training data, which is 40% non-English) and non-copyrighted training data... and the benchmarks seem to suffer for it.

https://arxiv.org/abs/2509.14233