Comment by YetAnotherNick
3 months ago
They compared with Llama 3.1 and found that to be better on average for their tasks like European MMLU. And Llama 3.1 is the worst in the batch, with Qwen 2.5 and Gemma 3 being significantly better.