Comment by throwaw12

12 hours ago

Aghhh, I wished they release a model which outperforms Opus 4.5 in agentic coding in my earlier comments, seems I should wait more. But I am hopeful

By the time they release something that outperforms Opus 4.5, Opus 5.2 will have been released which will probably be the new state-of-the-art.

But these open weight models are tremendously valuable contributions regardless.

One of the ways the chinese companies are keeping up is by training the models on the outputs of the American fronteir models. I'm not saying they don't innovate in other ways, but this is part of how they caught up quickly. However, it pretty much means they are always going to lag.

  • Not true, for one very simple reason. AI model capabilities are spiky. Chinese models can SFT off American frontier outputs and use them for LLM-as-judge RL as you note, but if they choose to RL on top of that with a different capability than western labs, they'll be better at that thing (while being worse at the things they don't RL on).

  • They are. There is no way to lead unless China has access to as much compute power.

    • They likely will lead in compute power in the medium term future, since they’re definitely the country with the highest energy generation capacity at this point. Now they just need to catch up on the hardware front, which I believe they’ve also made significant progress on over the last few years.

      1 reply →

The Chinese just distill western SOTA models to level up their models, because they are badly compute constrained.

If you were pulling someone much weaker than you behind yourself in a race, they would be right on your heels, but also not really a threat. Unless they can figure out a more efficient way to run before you do.

  • But it is a threat when the performance difference is not worth the cost in the customers' eyes.

There have been a couple "studies" and comparing various frontier-tier AIs that have led to the conclusion that Chinese models are somewhere around 7-9 months behind US models. Other comment says that Opus will be at 5.2 by the time Qwen matches Opus 4.5. It's accurate, and there is some data to show by how much.