Comment by _345

16 hours ago

If you're okay with Sonnet-level performance, this sounds like a straight upgrade. But I find that Sonnet messes up so much that it ends up not being worth cost-optimizing down to it or another Sonnet-level model. Glad to have this as an option, though.

30 comments


2ndorderthought 16 hours ago

A lot of people are having good experiences doing things like using opus for designing and using locally hosted qwen3.6 for implementation.

I could see a serious cost reduction story by using opus for design and deepseek for implementation.

Personally I would avoid anthropic entirely. But I get why people don't.

girvo 16 hours ago
Like me: that’s what I do. Either Opus 4.7 or GLM 5.1 for planning, write it out to a markdown file, then farm it out to Qwen 3.6 27B on my DGX Spark-alike using Pi. Works amusingly well all things considered.
- brianjking 13 hours ago
  
  How are you interacting with GLM 5.1? Via the Claude Code harness? I really wish they'd release a fully multimodal model already.
  
- 2ndorderthought 16 hours ago
  
  How is GLM 5.1? I haven't tried it yet but have been meaning to.
  
- aftbit 15 hours ago
  
  What hardware are you using to power this?
  

chrsw 15 hours ago

I keep re-learning this lesson: I chug along with a lesser model then throw a problem at it that's too complex. Then I try different models until I give up and bring in Opus 4.6 to clean up.

brianwawok 15 hours ago
And I keep using Opus to, like, make git commits. Really just need a smart router that is actually smart, vs. having to micromanage the model.
- sterlind 12 hours ago
  
  the problem is managing the contexts. your session might fit in Opus, but will that smaller model you dispatch the git commit to fit? even so, will it eat too much on prefill? do you keep compactions around for this, or RAG before dispatch or something? how do you button back up the response?
  all doable but all vaguely squishy and nuanced problems operationally. kinda like harness design in general.
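To make the "does the session even fit in the smaller model" question concrete, here is a minimal routing sketch. Everything in it is an assumption for illustration: the model names, context windows, relative costs, tier numbers, and the four-characters-per-token estimate are hypothetical, not taken from any real harness or provider. It picks the cheapest model that is both capable enough for the task and large enough for the context, and refuses to dispatch when nothing fits, which is where compaction or retrieval would have to come in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str             # hypothetical label, not a real API identifier
    context_tokens: int   # assumed context window
    relative_cost: float  # assumed relative price per task
    tier: int             # 1 = small, 2 = mid, 3 = frontier


# Illustrative numbers only.
MODELS = [
    Model("small-local", 32_000, 1.0, 1),
    Model("mid-hosted", 128_000, 5.0, 2),
    Model("frontier", 200_000, 25.0, 3),
]

# Assumed minimum tier for each kind of task.
REQUIRED_TIER = {"git_commit": 1, "implementation": 2, "design": 3}


def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly four characters per token."""
    return len(text) // 4 + 1


def route(task_kind: str, context: str, reserve: int = 2_000) -> Model:
    """Return the cheapest model that meets the tier and fits the context.

    `reserve` leaves headroom for the model's reply.
    """
    needed = estimate_tokens(context) + reserve
    min_tier = REQUIRED_TIER.get(task_kind, 3)  # unknown tasks go to the top tier
    candidates = [
        m for m in MODELS
        if m.tier >= min_tier and m.context_tokens >= needed
    ]
    if not candidates:
        raise ValueError("context fits no eligible model; compact it first")
    return min(candidates, key=lambda m: m.relative_cost)
```

With these made-up numbers, a git commit over a short session goes to the small model, while the same commit over a very long session gets bumped to a larger model purely because of context size. The squishy parts raised above (prefill cost, compaction, stitching the response back in) are deliberately left out.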
energy123 12 hours ago
It's not even that much cheaper: GPT 5.5 is only about 2x more expensive per task than Deepseek v4 Pro once you adjust for lower token usage, according to Artificial Analysis. Doesn't seem worth it to me.
- cpursley 5 hours ago
  
  Are we talking pay-as-you-go API or plans?
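For anyone who wants to see how a "per task" comparison can differ from a per-token one, here is a small arithmetic sketch. The prices and token counts are invented for illustration and are not Artificial Analysis figures or real provider pricing; the point is only that a model charging more per token can land much closer per task if it uses fewer tokens to finish.

```python
def cost_per_task(price_per_million_tokens: float, tokens_per_task: int) -> float:
    """Cost of one task: per-token price times tokens consumed."""
    return price_per_million_tokens * tokens_per_task / 1_000_000


# Hypothetical numbers: the pricier model costs 8x more per token
# but needs 4x fewer tokens to finish the same task.
cheap = cost_per_task(price_per_million_tokens=1.0, tokens_per_task=4_000_000)
pricey = cost_per_task(price_per_million_tokens=8.0, tokens_per_task=1_000_000)

ratio = pricey / cheap  # per-task gap, much smaller than the 8x per-token gap
print(f"cheap: ${cheap:.2f}, pricey: ${pricey:.2f}, ratio: {ratio:.1f}x")
```

Under these assumed inputs an 8x per-token gap shrinks to 2x per task. Subscription plans would change the math again, since they do not bill per token.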
  

maxdo 12 hours ago

This is the problem: you need the best model, not just a good one, for:

- Good architecture, which requires reading specs, code, etc. (read: lots of tokens in/out)
- Bug fixing: same, plus logs, e.g. Datadog

Once you've found the path, patches are trivial and the savings are tiny unless you're doing refactoring/cleanup.

Testing gets more and more complicated. Take a look at opencode go, and you see this:

> Includes GLM-5.1, GLM-5, Kimi K2.5, Kimi K2.6, MiMo-V2-Pro, MiMo-V2-Omni, MiMo-V2.5-Pro, MiMo-V2.5, Qwen3.5 Plus, Qwen3.6 Plus, MiniMax M2.5, MiniMax M2.7, DeepSeek V4 Pro, and DeepSeek V4 Flash

And now you're on your own with the bugs that all of these models can produce at scale. Am I missing anything in this picture? What is the real use of cheaper models?

JSR_FDED 9 hours ago

I'd argue that you need the model that's good enough, not the best.

Culonavirus 10 hours ago

We're not yet at a point of saturation where all the frontier models are of somewhat comparable "intelligence" and we could decide which to use based on other factors (speed, effective context window, etc.), so I honestly don't see why you (as a company or an employee) would not use the best available model with the highest (or at least second-highest) thinking effort. The fees are not exactly cheap, but not that expensive either.

nyssos 9 hours ago

Agreed that we're not at saturation, but we don't have a canonical "best" either. For example ChatGPT 5.5 + Codex is, in my experience, vastly superior to Opus 4.7 + Claude Code at sufficiently well-specified Haskell, but equally vastly inferior at correctly inferring my intent. Deepseek may well have its own niche, though I haven't used it enough to guess what it might be.

mohsen1 8 hours ago

This has been my experience working on tsz.dev. Only Opus 4.7 and GPT 5.5 can really be productive for the remaining test cases.

willio58 15 hours ago

I don’t find this with Sonnet at all. As long as I have a solid Claude.md, periodically review the output, and enforce good code practices via basic CI gates, I’ve rarely found myself having to switch to Opus.

2ndorderthought 14 hours ago

You might be surprised, then, at how well cheaper models solve your problems.