Comment by Reubend

1 day ago

Seems like there's no official blog post with benchmark results yet. But I'm once again thankful to the Chinese AI labs for being open with their work and contributing it to the world under permissive licenses like this. The Fable 5 fiasco is just another reminder of how valuable these things are to have.

LaurensBER 1 day ago

Based on my first impressions it's about 6 months behind the frontier labs. So very similar to Opus in January.

That is, pretty damn impressive and very usable. When it comes to architecture or complex problems it does noticeably worse, but I don't think anyone expected anything else.

One particularly interesting strong point seems to be design and user interfaces. It does seem to punch above its weight there, but that might just be personal preference.

pastel8739 15 hours ago

Opus in January was right about when AI became actually useful for coding for me. So if that’s the case, that is absolutely great.
jstummbillig 7 hours ago

> When it comes to architecture or complex problems it does noticeably worse but I don't think anyone expected anything else.
So it's not really similar to Opus in January?
byw 17 hours ago
> Opus in January
So pre-nerf Opus?
- ifwinterco 10 hours ago
  
  Was going to say, I don't think Opus has really gotten much better in the last 6 months.
  It just goes in cycles of being better and then being worse again, presumably based on how much Anthropic is having to optimise inference.
becomevocal 18 hours ago
Appreciate the quick take! Sounds like a keeper to me. I think the Opus and Fable design (that I saw for a short while) have gotten stale
- GCUMstlyHarmls 16 hours ago
  
  > I think the Opus and Fable design (that I saw for a short while) have gotten stale
  Can you expand on what you mean by stale? I don't get how an artefact-producer can get "stale" besides literally out-of-date information, which I don't think you mean because you mention Fable.
  
Lord-Jobo 18 hours ago

It’s insanely impressive and I’m so glad that the space has actual competition
ignoramous 13 hours ago
> Based on my first impressions it's about 6 months behind the frontier labs. So very similar to Opus in January.
According to this one benchmark, I find it amusing that Qwen3.6 27B beats ALL "frontier lab" models on coding Kotlin: https://archive.vn/RYBCL / https://gertlabs.com/rankings?mode=agentic_coding&language=k...
- ThouYS 11 hours ago
  
  3.6 is an absolute beast! makes you wonder why the big heavy models are even needed?!

vidarh 11 hours ago

I just ran a report from a project I'm working on that uses a mix of models, and GLM 5.1 trumped Sonnet over the last week, so I'm excited to now turn on 5.2. This is based on completion only, not quality, but that includes passing a huge test suite, and Sonnet's failure rate was surprisingly bad...

What I've seen from 5.1 for things like planning has certainly not read as impressive as Opus, and often even as Sonnet, but it's been a strong and steady work-horse that's just kept on actually delivering progress.

khalic 11 hours ago

It's also a reminder that as soon as Chinese models take the lead, they will switch to closed source too... so let's not be complacent, we need stronger, completely open data models, open source code, etc. to mitigate this risk

victorbjorklund 8 hours ago
Based on what? Do you have real proof of it, or is it just a guess that Chinese companies aren’t better than American ones?
- WarmWash 6 hours ago
  
  Chinese companies are literally the state of China.
  So the question is "How much do I trust Xi Jinping (or whoever is the chosen successor)?"
  American companies will compromise and work with the government diplomatically. Chinese companies are the government.
  It's a key distinction many fail to grasp, and it's hard to grasp when you are lost in the sauce of constant American political infighting.
- khalic 8 hours ago
  
  It's neither the American nor the Chinese LABS I'm wary of, it's their governments, both very prone to interference "in the name of national security"
cududa 10 hours ago
How do you figure that? “also a reminder that as soon as Chinese models take the lead, they will switch to closed source too”
What specifically about their release strategy “reminded” you of that conjecture?
The premise that they only open source the models … because it somehow helps them leapfrog American labs, and once they actually can leapfrog them, they’d close source them, doesn’t really track for me. Am I missing something?
I mean I think we need our own domestic open weight labs. I just don’t particularly understand the point you’re making
- khalic 10 hours ago
  
  The point I’m making is that this has become a strategic resource. The Chinese government allows wide sharing of their models because it weakens the US position.
  If Chinese models become better than American ones, do you believe the CCP will allow the free distribution of their flagship models?
  If you do, think again.
  
Eridrus 17 hours ago

Releasing a model without benchmarks seems to say the model is probably bad...