Comment by dragonwriter
10 hours ago
> I'm skeptical about these claims. How can this be?
More efficient architecture.
> Wouldn't there be massive loss of world knowledge?
If you assume equally efficient architecture and no other salient differences, yes, that’s what you’d expect from a smaller model.
Hmm. Let's just say that if this is true, and it really is better at such a much lower total parameter count, it's the greatest accomplishment in over a year of LLM development. Against the backdrop of benchmaxxing in 2025, I'll believe it when I see the results on closed benchmarks and SimpleBench. My concern is that this might be a hallucination machine.
In my testing, this model is quite bad and far behind 235B A22B. https://fiction.live/stories/Fiction-liveBench-Sept-12-2025/...
Might be. FWIW, my experience with the Qwen3 30B model basically took ChatGPT out of rotation for me. It's not hard for me to imagine an 80B model pushing that further, especially with thinking enabled.
I recommend playing with the free hosted models to draw your own conclusions: https://chat.qwen.ai/