Comment by mesmertech

1 day ago

For coding you always want to go with the best model in the category, not something that would have been the best model a year ago, which is what GLM 5.1 is. And I'm saying that as a big fan of GLM, because I run a translation site where GLM is good enough for the price.

Most of the money right now is in coding. OpenAI and Anthropic only need to stay 6 months ahead of the SOTA open-source models, and they'll capture most of the enterprise and dev market.

r0b05 7 minutes ago

Why do you need to go with the best model for coding?

binary0010 1 day ago

Yes, I'm an engineer (20 years, mostly in the games/graphics industry) and I only use it for code. I've been using GLM 5.1 a lot this week. I went in expecting another open-source model that's "decent" but not really "up to standard".

I highly doubt I'll ever use Claude again.

I think you're wrong about Claude being significantly better.

cassianoleal 1 day ago
I've been mostly coding with GLM-5.1 as well and I agree with you. DeepSeek V4 Flash is another very good surprise. Incredibly cheap, fast and effective.
- MaKey 7 hours ago
  
  I've been using DeepSeek v4 Flash with OpenCode for the whole week to refactor a Terraform code base I inherited and it worked surprisingly well.
aspenmartin 18 hours ago
Well, I think a multitude of harder measurements would disagree with you. But ultimately there is absolutely a use case for cheaper open models (or even cheaper tiers of proprietary models); in fact, the unsolved optimization everyone is trying to crack is how much to spend on a given task. But there will always be a market, especially in enterprise, for the best performance on offer.
- ggttk 15 hours ago
  
  Why are you boosting so hard? Lmao, either you're a paid poster or you own stock in a frontier firm. Which one?

RevEng 15 hours ago

I strongly disagree. I'm an engineer; I'm all about the fastest, cheapest thing that meets the requirements. I don't need Opus 4.7, even for my complex programming tasks. It costs over 10x as much as other available models that still give good enough answers. Those smaller models also output tokens a lot faster, which saves me time.

Once a model gets good enough, the returns on bigger models diminish quickly. I don't want to spend 10x the money and wait 5x the time to get equivalent answers.

yokoprime 12 hours ago

Same here. I can't say I've seen any difference between 4.6 and 4.7 other than price.

odie5533 1 day ago

If I generate code with Claude, ChatGPT, and GLM 5.1, I can't reliably tell which model is which. I use Claude exclusively, more out of superstition than reason.

lunar_mycroft 20 hours ago

> For coding you always want to go with the best model in the category

This is transparently false, because the best "model" is still competent human developers. They're just more expensive. If you're willing to use current LLMs at all, it means you're willing to sacrifice quality for a better price, and your disagreement with the comment you were replying to is entirely about what the optimum tradeoff is.

aspenmartin 18 hours ago
Well, it may be false that you always want the best model, but the point is that you plus an agent is far more cost effective than you plus someone else.
- lunar_mycroft 6 hours ago
  
  Maybe, but that's a different claim from the one I was responding to. It also raises the question: if the lower-quality but cheaper output of frontier models is more cost effective than humans, is the even lower-quality but even cheaper output of OSS models more cost effective still? With an absolute rule like the one GP suggested ("no, you always want the best code generator"), the answer is clear, but it gets much murkier if you reject such rules (as you have to in order to be an LLM coding proponent).
noname120 11 hours ago
It was true 6 months ago, but not anymore. Frontier models now outperform developers on many tasks, whether on quality, readability, or maintainability, and let’s not even talk about speed…
- lunar_mycroft 6 hours ago
  
  I've seen the code they produce without extensive help from human developers, this is clearly false.
  Good to see the classic "yeah the models weren't good enough six months ago, but this time they actually are, promise! Please forget you were hearing the exact same thing six months ago!" is alive and well though.
  
- suddenlybananas 10 hours ago
  
  Why is Anthropic hiring software developers, then?

eikenberry 1 day ago

> For coding you always want to go with the best model in the category [..]

And this is why many companies go out of business. You always want the best bang for your buck, sometimes this is the "best model" and sometimes it is not.

kgwgk 1 day ago

For coding, like for everything else in life, cost is a factor.

mesmertech 1 day ago
Cost for the value delivered. If you offered the current SOTA open-source models at $0.1/M, I think I'd still be using Opus or 5.5 at $30/M. Or take GPT-5, released in August 2025: I don't think I'd use it for coding even at $0.1/M. I'd definitely find other uses for it (translations, agentic workflows, prompt guards, etc.), but for coding I don't think I'd ever completely switch to a SOTA open model.
Unless, of course, there were a real speed difference. The only reason I'd be willing to go with a model a couple of percent worse than the current best is if it were at least 5x faster. Looking forward to Kimi K2.6 being offered publicly by Cerebras.
- kgwgk 1 day ago
  
  > I still think I'd be using
  That's fine. Other people may not want to pay 300x more and would rather make do with last year's SOTA.
  > For coding you always want to go with the best model
  Maybe you meant "For coding I always want to go with the best model"?

vidarh 6 hours ago

I have stats from a harness showing that GLM 5.1 is far more cost effective for us than Opus once the rate of defects and rework is taken into account. In fact, with a decent harness I'm now increasingly favouring Haiku over Opus for execution too. Opus is still worth it for planning, though, and it's far better at one-shotting things.
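vidarh's metric is cost per accepted change rather than cost per token. A minimal sketch, assuming each attempt independently fails review with probability `defect_rate` (so the expected number of attempts is 1 / (1 - defect_rate)) and every attempt pays the model's token cost plus a fixed human review/rework cost. The function name and all numbers are hypothetical; the comment doesn't say how the harness actually computes its stats.

```python
def cost_per_accepted_change(token_cost: float, rework_cost: float, defect_rate: float) -> float:
    """Expected spend to get one change accepted (hypothetical model).

    Assumes each attempt independently fails review with probability
    `defect_rate`, so attempts are geometric with mean 1 / (1 - defect_rate).
    Every attempt pays the model's token cost plus a fixed review/rework cost.
    """
    if not 0 <= defect_rate < 1:
        raise ValueError("defect_rate must be in [0, 1)")
    expected_attempts = 1 / (1 - defect_rate)
    return expected_attempts * (token_cost + rework_cost)


# Made-up figures: a pricier model that fails less vs. a cheaper one that fails more.
expensive = cost_per_accepted_change(token_cost=2.00, rework_cost=5.00, defect_rate=0.10)
cheap = cost_per_accepted_change(token_cost=0.20, rework_cost=5.00, defect_rate=0.20)
print(round(expensive, 2), round(cheap, 2))
```

With these made-up figures the cheaper model wins (about 6.50 vs 7.78 per accepted change), but raising `rework_cost` to 50 flips the result in favour of the model that fails less, so the answer depends on measured defect and rework rates rather than list prices.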

danny_codes 13 hours ago

Why? If it's good enough, it's good enough. Then again, I read the code that gets vibed, so maybe my use case is different.

yokoprime 12 hours ago

It's driven a lot by the harness too. If you're using Claude Code, you're actively being pushed towards newer models, even though older ones work perfectly fine for your use cases.

solomatov 19 hours ago

>For coding you always want to go with the best model in the category, not something that would be the best model if we went 1 year back which GLM 5.1 is, and I'm saying that as a big fan of GLM cause I run a translation site where GLM is good enough for the price.

Currently, the difference is substantial, but what happens if capabilities saturate?

aspenmartin 18 hours ago
Then the house of cards comes crumbling down, but there is so much evidence to point to this not happening that you'd need a fairly specific theory for how it could.
- solomatov 18 hours ago
  
  > but there is so much evidence to point to this not happening
  Could you explain this?

Perz1val 10 hours ago

And you're suggesting that the same companies that have been cost cutting and refusing to buy you a chair forever won't start objecting to a $200/dev/month subscription? The finance department won't have a say?

Andrex 1 day ago

> For coding you always want to go with the best model in the category

Will this always be true? There will never be an event horizon/point of diminishing returns where something not-bleeding-edge is "good enough" for 51%+ of users?

mesmertech 21 hours ago

As long as closed source stays about 6 months ahead, as it is now. That's hard to see in simple percentage-based coding benchmarks, but you definitely notice it when you're actually doing a long task. Even something as simple as UI "taste" is enough for me to use Opus instead of 5.5, even though 5.5 is strictly better for anything without a UI, i.e. backend, scripts, agent workflows, etc.

blackjack_ 1 day ago

This is a silly take. There is a line of "good enough" for most coding (most CRUD apps and APIs are nothing special), and once we're past it, nobody except extreme outliers will care about having the "newest, best" model. And this baseline "good enough" model will become an ultra-cheap commodity, as we already see with GLM, DeepSeek, etc.

mesmertech 21 hours ago

As long as closed models are 6 months ahead, I won't be switching from them to open-source models that were SOTA 6 months earlier. Maybe it's a different calculation if you're in a job, but as an indie hacker I'll take any edge I can get.
Then again, I could be convinced to switch if there were a clear speed difference, like 5x or more, for an open-source model, even one that was SOTA 6 months ago.

dogleash 1 day ago

> For XXX you always want to go with XXX, not XXX

Oh, hey, I recognize you. Thank you for the very forward and thorough orbital sander recommendation at Home Depot. That's exactly what I wanted to deal with on my holiday weekend. You just know so much about this, and the rest of us are simple passersby.

mesmertech 20 hours ago
Yep, sorry, I was just pulling it out of my rear. It's not like there's a market trend of nearly every enterprise using Anthropic or OpenAI models for coding, or like Anthropic has had such ridiculous growth that it's 10x-ing year over year.
- dogleash 3 hours ago
  
  I'm ribbing you for writing like a condescending guru that invalidates the evaluatory capability of your peers. Not the meat of your evaluation (not to say that it's any good either, just that it's irrelevant).

EGreg 1 day ago

Most work is not coding.

And also, people have it wrong… their models are not the main problem anymore. It’s the RAG.

tomrod 1 day ago
Would love to hear more about your thought about the RAG.
- simonw 1 day ago
  
  I think RAG is a mostly outdated concept now; it's been subsumed by the idea of an "agent harness", which is exactly what Claude Code, Claude Cowork, OpenAI Codex, Claude.ai and ChatGPT themselves have now become.
  An agent harness with access to a good search tool is a much more interesting thing than 2024-era RAG systems.
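To make the contrast concrete, a toy sketch (not how any of the named products actually work): classic RAG retrieves once, up front, using the user's question as the query, while an agent harness lets the model decide what to search for and call the search tool repeatedly before answering. `search`, `step` and the `(action, payload)` protocol are hypothetical stand-ins for a real retrieval tool and a real model.

```python
from typing import Callable

# Hypothetical stand-ins for a real search tool and a real model.
Search = Callable[[str], list[str]]
Answer = Callable[[str, list[str]], str]
# The model either asks for a search ("search", query) or finishes ("answer", text).
Step = Callable[[str, list[str]], tuple[str, str]]


def classic_rag(question: str, search: Search, answer: Answer) -> str:
    # 2024-era RAG: a single retrieval, keyed on the raw question.
    return answer(question, search(question))


def agent_loop(question: str, search: Search, step: Step, max_turns: int = 5) -> str:
    # Agent harness: the model chooses its own queries and can search repeatedly.
    context: list[str] = []
    for _ in range(max_turns):
        action, payload = step(question, context)
        if action == "answer":
            return payload
        context.extend(search(payload))
    return "no answer within max_turns"


# Toy usage: the "model" needs two targeted lookups the raw question wouldn't match.
docs = {
    "rag": ["RAG retrieves documents before generating"],
    "harness": ["an agent harness exposes search as a tool"],
}


def toy_search(query: str) -> list[str]:
    return docs.get(query, [])


def toy_step(question: str, context: list[str]) -> tuple[str, str]:
    queries = ["rag", "harness"]
    if len(context) < len(queries):
        return ("search", queries[len(context)])
    return ("answer", " / ".join(context))


print(classic_rag("what replaced RAG?", toy_search, lambda q, ctx: f"{len(ctx)} docs found"))
print(agent_loop("what replaced RAG?", toy_search, toy_step))
```

The toy only shows the shape of the control flow: inside the loop, retrieval is one tool the model can call as often as it needs, rather than a fixed retrieve-then-generate step.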
obsidianbases1 1 day ago

Depending on RAG is a workflow problem, not an AI problem.