Comment by sholain

3 months ago

Nobody has access to 'frontier quality models' except OpenAI, Anthropic, Google, maybe Grok, maybe Meta etc. aka nobody in China quite yet. And - there are 'layers' of engineering beyond just the model that make quite a big difference. For certain tasks, GPT5 might be beyond all others, same for Claude + Claude.

That said, the fact that they're doing this while knowing that Anthropic could be monitoring implies a degree of either real or arbitrary irreverence: either they were lazy or dumb (unlikely), or it was some ad hoc situation wherein they really just did not care. Some sub-sub-sub team at some entity just 'started doing stuff' without a whole lot of thought.

'State Backed Entities' are very numerous; it's not unreasonable that some of them, somewhere, are prompting a few things that are sketchy.

I'm sure there's a lot of this going on everywhere - and this is the one Anthropic chose to highlight for whatever reasons, which could be complicated.


tw1984 3 months ago

> Nobody has access to 'frontier quality models' except OpenAI, Anthropic, Google, maybe Grok, maybe Meta etc. aka nobody in China quite yet.

Welcome to 2025. Meta doesn't have anything on par with what the Chinese labs have got; that is common knowledge. Kimi, GLM, Qwen and MiniMax are all frontier models no matter how you judge it. DeepSeek is obviously cooking something big; you'd need to be totally blind to ignore that.

America's lead in LLMs is just weeks, not quarters or years. Arguing that Chinese spy agencies have to rely on American coding agents to do their job is more like a joke.

sholain 3 months ago
Kimi is plausibly near the frontier but definitely not up to GPT5 spec; the rest are definitely not 'frontier models'.
There are objective ways of 'judging' them.
- tw1984 3 months ago
  
  Really love your double standard, mate!
  According to the SWE-bench results I am looking at, Kimi K2 has a higher agentic coding score than Gemini, and its gap with Claude Haiku 4.5 is just 71.3% vs 73.3%. That 2-point difference is actually less than the 3-point gap between GPT 5.1 (76.3%) and Claude Haiku 4.5. Interestingly, Gemini and Claude Haiku 4.5 are "frontier" according to you, but Kimi K2, which actually has the highest HLE and LiveCodeBench results, is just "near" the frontier.
  