Comment by codinhood

6 hours ago

I don't think you're going to get many "true" answers to this. The opportunity cost of not using the latest and best models is just too much right now.

Every month I research this and come to the same conclusion: the time, effort, and cost required to get local models (and the coding tools around them) to perform even close to Claude Code with Sonnet/Opus is just not worth it right now. If it were, it would be disruptive enough to be in the news.

Not that I'm discounting the possibility that someone has already solved this, just trying to Occam's-razor my way out of diving too deep down rabbit holes.

At some point, that "Opportunity cost FOMO train ride" hits a saturation point, and I think we are already past it. Mythos-class models are a whole different beast and cutting edge on reasoning, but not much use for the problem domains most developers are trying to solve.

The present Sonnet/Opus versions (~4.8) will likely be what most enterprises end up using eventually. And even though local models aren't there yet, there are budget alternatives from the DeepSeek, Kimi, GPT, MiniMax, etc. families, available through the APIs of NVIDIA, OpenRouter, Groq, etc., which are very much Sonnet grade.

  • Yeah this is exactly what I'm waiting for.

    Personally, I don't think we're at that point yet. While I do think model improvement is starting to plateau (reaching a local ceiling), I'm not convinced local models are as good as Sonnet/Opus yet. The gap is still too big. But I'm excited for those models to reach those levels.

Rather than Occam, consider Pareto?

If you truly believe that it WILL get there within the next couple of years, then you might as well start playing with it now (and, yes, you will be very surprised, especially for shorter/smaller projects or nicely modularized larger projects).

Sounds like a correct conclusion to me also. I am trying to transition to a layered system: local, then OpenCode with commercial vendor APIs for models like DeepSeek v4 flash, then DeepSeek v4 Pro.

With a layered approach we can slowly shift to running more locally and still get required work done. Really, my local setup is so much better than it was 2 months ago, and far better than 6 months ago, on the same hardware.
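To make the layered idea concrete, here is a rough sketch of how such a fallback chain could be wired up. Everything in it is hypothetical: the handler functions are stand-ins that decide by prompt length, not real calls to a local runtime, OpenCode, or any DeepSeek API, and the tier names are just labels. The only point is the shape: try the cheapest tier first and escalate when it declines.

```python
from typing import Callable, Optional, Sequence, Tuple

# A handler returns an answer, or None when it cannot handle the task.
Handler = Callable[[str], Optional[str]]
Tier = Tuple[str, Handler]


def run_layered(task: str, tiers: Sequence[Tier]) -> Tuple[str, str]:
    """Try each tier in order and return (tier_name, answer) from the first that succeeds."""
    for name, handler in tiers:
        result = handler(task)
        if result is not None:
            return name, result
    raise RuntimeError("no tier could handle the task")


# Hypothetical stand-ins: a real setup would call an actual model here
# and use a real acceptance check (tests passing, lint clean, etc.).
def local_model(task: str) -> Optional[str]:
    # Assumption for illustration: the local model only copes with short prompts.
    return f"local:{task}" if len(task) <= 40 else None


def cheap_api(task: str) -> Optional[str]:
    # Assumption for illustration: the budget API handles medium prompts.
    return f"flash:{task}" if len(task) <= 200 else None


def strong_api(task: str) -> Optional[str]:
    # Last resort: always answers.
    return f"pro:{task}"


TIERS = [
    ("local", local_model),
    ("deepseek-v4-flash", cheap_api),
    ("deepseek-v4-pro", strong_api),
]

if __name__ == "__main__":
    tier, _ = run_layered("rename a variable", TIERS)
    print(tier)
```

Shifting more work to local over time then just means loosening the first handler's acceptance check as the local setup improves, without touching the rest of the chain.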

This seems to be the answer. Building a rig with a decent graphics card will cost $2k+ and will produce sub-par results. Might as well milk the $100/m Claude sub until open-source alternatives reach parity with today's frontier models.
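As a back-of-the-envelope check on those numbers, here is a small sketch of the break-even arithmetic. It only uses the $2k rig and $100/m figures from above; the running-cost parameter is my own assumption, and it deliberately ignores the quality gap, which is the real argument here.

```python
def break_even_months(hardware_cost: float,
                      monthly_subscription: float,
                      monthly_running_cost: float = 0.0) -> float:
    """Months until a one-off hardware purchase costs less than a subscription.

    monthly_running_cost (electricity etc.) is an assumed extra, not a figure
    from the thread. Output quality is not modeled at all.
    """
    monthly_saving = monthly_subscription - monthly_running_cost
    if monthly_saving <= 0:
        raise ValueError("the rig never pays for itself at these costs")
    return hardware_cost / monthly_saving


if __name__ == "__main__":
    # $2k rig vs. $100/m subscription, no running costs
    print(break_even_months(2000, 100))  # 20.0
```

So even on price alone the rig takes at least 20 months to pay off, and since $2k is a floor ("$2k+"), the real figure is longer.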

But you're pretty much measuring opportunity cost in tokens per second, no?

I think it very much remains to be seen whether e.g. tokens per second (multiplied or whatever by perceived quality of private model) actually means "better or more useful output."

I strongly suspect it does not. (Though I also strongly suspect this will be very difficult to measure, because the incentive to lie about metrics here will be so strong.)

  • If you’re arguing that model metrics don’t necessarily translate into useful output, I agree. That’s not how I measure the success of a model, and it's not really the point I'm trying to make. I try to set things up and test them on my actual projects.

    What I’m saying is that if local models were actually comparable to Claude Code in practice, we wouldn’t be having threads like this. It would be obvious to the people using them, and it would be massively disruptive. Why would individuals and companies pay hundreds or thousands for Claude Code if they could run something locally and consistently get similar results?

    Every month I revisit the local ecosystem hoping the answer has changed. So far, my experience has been that it hasn’t.

  • I think they are referring to the opportunity cost of the time spent doing things a local model cannot do, or fixing its mistakes, weighed against the cost of a subscription.