Comment by doctoboggan

10 hours ago

Is it honestly better than Opus 4.6 or just benchmaxxed? Have you done any coding with an agent harness using it?

If its coding abilities are better than Claude Code with Opus 4.6 then I will definitely be switching to this model.

doctoboggan

Apparently glm5.1 and the latest qwen coder are as good as Opus 4.6 on benchmarks. So I tried both seriously for a week (GLM Pro using CC, and qwen using qwen companion). Thought I could save $80 a month. Unfortunately, after 2 days I had switched back to Max. The problems were the speed (slower on both, although qwen is much faster), the errors (stupid layout mistakes, inserting 2 footers then refusing to remove one, not seeing obvious problems in screenshots, and major f-ups of functionality), not being able to view URLs properly, etc. I'll give deepseek a go but I suspect it will be similar. The model is only half the story. I've also been testing gpt5.4 with codex and it is very nearly as good as CC... better on long-running tasks running in the background. Not keen on the ChatGPT codex 'personality', so I will stick to CC for the most part.

madagang 10 hours ago

Their Chinese announcement says that, based on internal employee testing, it is not as good as Opus 4.6 Thinking, but is slightly better than Opus 4.6 without Thinking enabled.

mchusma 10 hours ago

I appreciate this, makes me trust it more than benchmarks.

ibic 8 hours ago

In case people wonder where the announcement is (you can easily translate it via browser if you don't read Chinese): https://mp.weixin.qq.com/s/8bxXqS2R8Fx5-1TLDBiEDg
It's still a "preview" version atm.

anentropic 3 hours ago

Who uses Opus without thinking though...?

deaux 9 hours ago

That's super interesting, isn't Deepseek in China banned from using Anthropic models? Yet here they're comparing it in terms of internal employee testing.
- computably 5 hours ago
  
  > That's super interesting, isn't Deepseek in China banned from using Anthropic models? Yet here they're comparing it in terms of internal employee testing.

  I don't see why Deepseek would care to respect Anthropic's ToS, even if just to pretend. It's not like Anthropic could file and win a lawsuit in China, nor would the US likely ban Deepseek. And even if the US gov would've considered it, Anthropic is on their shitlist.
- renticulous 8 hours ago
  
  They use a VPN to access it. Even Google DeepMind uses Anthropic. There was a fight within Google as to why only DeepMind is allowed to use Claude while the rest of Google isn't.