Comment by Reubend

1 day ago

Seems like there's no official blog post with benchmark results yet. But I'm once again thankful for the Chinese AI labs for being open with their work and contributing it to the world under permissive licenses like this. The Fable 5 fiasco is just another reminder of how valuable these things are to have.

Based on my first impressions, it's about 6 months behind the frontier labs, so very similar to Opus in January.

That is, pretty damn impressive and very usable. When it comes to architecture or complex problems it does noticeably worse, but I don't think anyone expected anything else.

One particularly interesting strong point seems to be design and user interfaces. It does seem to punch above its weight there, but that might just be personal preference.

  • Opus in January was right about when AI became actually useful for coding for me. So if that’s the case, that is absolutely great.

  • > When it comes to architecture or complex problems it does noticeably worse but I don't think anyone expected anything else.

    So it's not really similar to Opus in January?

  • > Opus in January

    So pre-nerf Opus?

    • Was going to say, I don't think Opus has really got much better in the last 6mo.

      It just goes in cycles of being better and then worse again, presumably depending on how much Anthropic are having to optimise inference.

  • Appreciate the quick take! Sounds like a keeper to me. I think the Opus and Fable designs (that I saw for a short while) have gotten stale.

    • > I think the Opus and Fable design (that I saw for a short while) have gotten stale

      Can you expand on what you mean by stale? I don't get how an artefact-producer can get "stale" besides literally out-of-date information, which I don't think you mean because you mention Fable.

I just ran a report from a project I'm working on that uses a mix of models, and GLM 5.1 trumped Sonnet over the last week, so I'm excited to now turn on 5.2. This is based on completion only, not quality, but completion includes passing a huge test suite, and Sonnet's failure rate was surprisingly bad...

What I've seen from 5.1 for things like planning certainly hasn't been as impressive as Opus, and often not even as impressive as Sonnet, but it's been a strong and steady workhorse that has just kept on actually delivering progress.

It's also a reminder that as soon as Chinese models take the lead, they will switch to closed source too... so let's not be complacent. We need stronger, completely open models (open data, open source code, etc.) to mitigate this risk.

  • Based on what? Do you have real proof on it or is it just a guess that Chinese companies aren’t better than American ones?

    • Chinese companies are literally the state of China.

      So the question is "How much do I trust Xi Jinping (or whoever is the chosen successor)?"

      American companies will compromise and work with the government diplomatically. Chinese companies are the government.

      It's a key distinction many fail to grasp, and one that's hard to grasp when you are lost in the sauce of constant American political infighting.

    • It's neither the American nor the Chinese LABS I'm wary of, it's their governments, both very prone to interference "in the name of national security".

  • How do you figure that? “also a reminder that as soon as Chinese models take the lead, they will switch to closed source too”

    What specifically about their release strategy “reminded” you of that conjecture?

    The premise that they only open source the models … because it somehow helps them leapfrog American labs, and once they actually can leapfrog them, they’d close source them, doesn’t really track for me. Am I missing something?

    I mean, I think we need our own domestic open-weight labs. I just don't particularly understand the point you're making.

    • The point I’m making is that this has become a strategic resource. The Chinese government allows wide sharing of their models because it weakens the US position.

      If Chinese models become better than American ones, do you believe the CCP will allow the free distribution of their flagship models?

      If you do, think again.