Comment by jug

1 day ago

> The Qwen3-Next-80B-A3B-Instruct performs comparably to our flagship model Qwen3-235B-A22B-Instruct-2507

I'm skeptical about these claims. How can this be? Wouldn't there be massive loss of world knowledge? I'm particularly wary given the benchmaxxing trend that took hold in Q2 2025.

> I'm skeptical about these claims. How can this be?

More efficient architecture. Qwen3-Next pairs a much sparser MoE (roughly 3B of its 80B parameters activated per token, per the "A3B" naming) with a hybrid attention design, so it isn't a straightforward dense-versus-dense comparison with the flagship.

> Wouldn't there be massive loss of world knowledge?

If you assume equally efficient architecture and no other salient differences, yes, that’s what you’d expect from a smaller model.
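
For scale, here's a back-of-the-envelope sketch of the sparsity gap, reading the figures straight off the model names. The per-token active counts are assumptions based on Qwen's "AxB" naming convention, not measured values:

```python
# Back-of-the-envelope sparsity comparison. Numbers are read
# off the model names (assumed, not measured: the "AxB" suffix
# is taken to mean ~xB activated parameters per token).
models = {
    "Qwen3-Next-80B-A3B": {"total_b": 80, "active_b": 3},
    "Qwen3-235B-A22B": {"total_b": 235, "active_b": 22},
}

for name, p in models.items():
    ratio = p["active_b"] / p["total_b"]
    print(f"{name}: {p['active_b']}B of {p['total_b']}B "
          f"active per token ({ratio:.1%})")
```

That works out to ~3.8% of parameters active per token versus ~9.4% for the flagship. The "world knowledge" worry tracks the total count (80B vs 235B, about a 3x gap), while per-token compute tracks the active count, and in MoE models the two move independently, which is why the "smaller model" intuition doesn't transfer cleanly.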

  • Hmm. Let's just say that if this is true, and the model is actually better at a much lower total parameter count, it would be the greatest accomplishment in over a year of LLM development. Against the backdrop of benchmaxxing in 2025, I'll believe it when I see results on closed benchmarks and SimpleBench. My concern is that this might be a hallucination machine.