Comment by halJordan
8 hours ago
Qwen isn't directing the forward progress of LLMs. SOTA LLMs have been MoE since GPT-4. The OG 4.
Out of context, but I honestly hate how HN let itself get so far behind the times that this is the sort of inane commentary we get on AI.
I would venture that reading it as "Qwen invented MoEs, or did them first, or does them better than anyone else" is reductive - the point is that the number of experts and the ratios here are quite novel (70B total params, inferencing with only 3B active!?). I sometimes kick around the same take, but I thought I'd stand up for this one. And I know what I'm talking about: I maintain a client that wraps llama.cpp and ~20 models across inference APIs.
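To make the sparsity point concrete, here's a minimal, hypothetical sketch of top-k expert routing (Python, made-up sizes, not Qwen's actual config) showing why a model with a huge total parameter count only touches a small slice of it per token:

    import numpy as np

    def moe_layer(x, gate_w, experts, top_k=8):
        # x: (d_model,) token vector; gate_w: (n_experts, d_model) router weights
        scores = gate_w @ x
        chosen = np.argsort(scores)[-top_k:]                    # only the top-k experts fire
        w = np.exp(scores[chosen] - scores[chosen].max())
        w /= w.sum()                                            # softmax over the chosen experts
        # only these experts' weights are read -> they are the "active" params for this token
        return sum(wi * experts[i](x) for i, wi in zip(chosen, w))

    d_model, d_ff, n_experts = 64, 256, 128
    experts = [(lambda A, B: (lambda v: B @ np.maximum(A @ v, 0)))(
                   np.random.randn(d_ff, d_model), np.random.randn(d_model, d_ff))
               for _ in range(n_experts)]
    gate_w = np.random.randn(n_experts, d_model)
    y = moe_layer(np.random.randn(d_model), gate_w, experts)
    print(8 / 128)  # ~6% of expert params touched per token, hence "huge total, tiny active" ratios

The numbers above are placeholders; the real active/total ratio depends on how many experts exist, how many are routed per token, and how much of the model lives outside the expert blocks (attention, embeddings, any shared FFN).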