
Comment by SwellJoe

7 hours ago

The good news (for you and most everyone other than the current leading AI companies) is that the gap between the SOTA and the near-frontier models is getting smaller every week or two. The leading Chinese models are only a few months behind now (GLM 5.2 tickles the tail of GPT 5.3 or 5.4 and Opus 4.6, according to benchmarks and the vibes among heavy users who've spent some time with it), where they were a couple of years behind a year ago.

Opus 4.6 was released at the beginning of February, so if the Chinese models only "tickle its tail," that means they're more than 5 months behind.

  • That comparison is also misleading because Opus 4.6 was probably not Anthropic's frontier model.

    We got the first news about Mythos in March, so it is likely that it was already close to ready by the time Opus 4.6 was released.

    So the actual gap is the time elapsed between March (or April for the official announcement) and whenever Chinese models can match Mythos.

    • The post-training process for a model that size takes months, though it "works" before that. It is a big chunky model before it's released to the world and probably does some amazing things, sometimes... but it wasn't done (else why wouldn't they release it and soundly trounce their competitors?). I would assume that Chinese AI companies have a pipeline too, and that what we see is a couple/few months behind their newest model as well. Like, the new base model is cooked, but they're still plating it for service.

      Why would Anthropic get the benefit of pre-release models counting toward their lead, if nobody else gets to count their pre-release models?

  • > The leading Chinese models are only a few months behind now

    • I hear that often, but what does that even mean? I am a great proponent of open-weights models. I do believe they are the only reason we have not stagnated into collusion to halt (public) model releases.

      But at exactly which point in time is z.ai being compared to claude.ai? Consistently being "6 months behind" in an exponentially accelerating evolution means the gap is growing exponentially wider, not staying constant.
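
      To make that claim concrete, here is a rough sketch. It assumes capability can be put on a single scale and grows as a pure exponential, with the follower trailing by a fixed lag. The symbols C_0, k and \tau are illustrative assumptions, not anything measured in this thread.

      ```latex
      % Assumed model: leader capability C(t), follower F(t) lagging by a fixed \tau.
      \begin{align}
        C(t) &= C_0 e^{kt}, \qquad F(t) = C(t - \tau) = C_0 e^{k(t - \tau)} \\
        C(t) - F(t) &= C_0 e^{kt}\left(1 - e^{-k\tau}\right) \\
        \frac{C(t)}{F(t)} &= e^{k\tau}
      \end{align}
      ```

      Under those assumptions the absolute gap grows exponentially with t, while the ratio between the two stays constant. So whether a fixed "6 months behind" counts as a widening gap depends on which of the two measures is meant.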

This is nonsense.

The gap between Chinese models and American frontier models is estimated at 10 months by Anthropic themselves, and it's growing.

China has no flywheel for long-form agentic traces like Claude Code and its telemetry over its userbase (no one uses the Chinese harnesses yet). Most Chinese models are forced to price themselves significantly below cost to compete with the huge demand for bootleg Claude tokens, because they're that much worse.

  • > is estimated at 10 months by Anthropic themselves, and it's growing.

    How is this different than any business with something to lose saying a competitor isn't as good? Not saying it's false, but it would seem to me that it's more important how customers feel about the issue.

  • > The gap between Chinese models and American frontier models is estimated at 10 months by Anthropic themselves, and it's growing.

    There's a lot of subjectivity in determining this, but I'm 100% sure that 10 months is wrong.

    I don't know whether the gap is currently growing, but I'm not sure it matters. There are thresholds where models reach certain levels of usefulness. Opus 4.8, for example, is at a level where I can give it relatively vague input, and it can go for half an hour on its own and produce a high-quality PR.

    If GLM reaches that level of capability and can do that task more cheaply than Anthropic's model, I will use GLM for that task, because that's a specific type of task I use models for. It doesn't really matter whether Anthropic also has a better model, because what does "better" mean in this context? It's a clearly defined task, and Opus 4.8 already does it at a very high level of quality.

  • Ah, well, if Anthropic says their competitors are ten months behind...

    I don't know what I was thinking.

  • Here in Australia the sudden withdrawal of Fable made all of us think hard about models and harnesses.

    In the last few weeks I've heard half a dozen people talk about how a less advanced model coupled with a better harness outperforms a smarter model.

    If the USA wanted to shoot its AI industry in the foot, it achieved its goal.

  • If Anthropic themselves say competition is 10 months behind, it's probably 5 or less.

    And you seem to think "no one uses" DeepSeek's v4, z.AI's GLM 5.2 or Xiaomi's MiMo 2.5 from their official APIs, when they probably dwarf Anthropic's usage and are widening the gap by conquering a chunk of the Western market too.

    I know it's hard for some to comprehend that there's an entire Eastern hemisphere on the globe with billions of people, so it's worth reminding. Some even seem to think the world is basically Silicon Valley.

    • Because Claude subscription tokens are cheaper than DeepSeek and friends. You have a whole industry of people reselling Claude subscriptions in China.

      Can you comprehend that Anthropic is winning because it is both cheap (subscriptions) and the better SOTA? People are cheering Chinese providers when in reality they would rugpull open weights the moment they are competitive.

      Chinese models are trash; that's why they are giving them away for free.

      For individuals and small companies, subscriptions are the best deal; for big companies, Chinese models are a big no unless they can host them themselves.