The Chinese open weight models have been ahead of Sonnet (at least for coding) for a couple of months now. I tend to take benchmarks with a huge grain of salt, but in my own experience, the latest versions of Kimi, MiMo, and GLM (pre-5.2) had already surpassed Sonnet in output quality at a fraction of the price.
With that said, I'm excited to try GLM 5.2. I still end up reaching for Opus and GPT 5.5 for many tasks, since the open models tend to get stuck more often on complex problems.
I found Sonnet preferable to K2.6, but 2.7 code for Kimi seems better, anecdotally.
Definitely Opus level for coding.
Do you have benchmarks or at least anecdotes to back that up? I'm not arguing with you; I would just love to see some proof that open models are getting as good as Anthropic's models.
I've been running some test prompts comparing frontier models for webdev, particularly pretty visualizations, physics / orbital simulations, etc.
Do note that GLM is not multimodal, which can be a deal breaker. And these open models are not good outside coding.
Look at benchmarks, use the model yourself. I'm usually the first to call BS on every Chinese model that claims to be as good as Opus. This is finally the first one that actually is. It is a massive jump from every previous Chinese model.
Oh I see, I misremembered the OAI scores. I thought Sonnet had 51.