Comment by ainch

5 hours ago

It's a good paper by Hooker, but that specific comparison is shoddy. Llama and Aya were both trained by significantly more competent labs, on different datasets from those used for Falcon and BLOOM. The takeaway there is "it doesn't matter if you have loads of parameters if you don't know what you're doing."

If we compare apples to apples, e.g. across Claude models, the larger Opus still happily outperforms its smaller counterparts.