Comment by doug_durham

1 day ago

Open source models that you can run locally are much more than 3 to 6 months behind. Six months ago was the November inflection for Claude, and no open source model is as good as Claude Opus 4.6.

It depends what you mean by locally. I don't foresee running a model on my laptop anytime soon to power a coding agent. Far more likely is an infra team at my company operating an open source model on cloud infrastructure. When they're already paying $1,000/month/dev, it starts to pencil out pretty quickly.

  • Is there any open model as good as Opus 4.6 at any price?

    • How many problems require Opus-4.6-level performance? The "I'll accept nothing but the very best model for any task" thinking is perplexing to me.

      People got a lot done before Opus 4.6. In 6 months, would you be dissatisfied by Opus-4.6-level open-weight models, just because Opus 4.8 will be out?

    • Kimi 2.6, probably. It needs over 300GB of GPU memory to run (1TB for full capabilities), so either 4x A100 or 8x A6000 would do it.

      A $50k-100k rig could do it, and an entire company would be able to use it at full speed.

    • No, but the big open models are on the level of Sonnet 4.6, which is very good for most problems.

      People who claim Opus-level capability for open models don't have sufficiently complex problems to see the difference.

    • For coding, I don't think so, but they are very close. I code with Sonnet mostly, because I think Opus is only useful if you fail to dissect problems adequately, but anyway.

      Kimi, for example, is close on SWE-bench for code. For reasoning, there are open models that already surpass Opus by quite a margin.
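The GPU-count estimate in the Kimi reply is back-of-the-envelope arithmetic: weight memory is roughly parameter count × bits per parameter ÷ 8. A minimal sketch, assuming a hypothetical 1-trillion-parameter model; the parameter count, the ~2.5-bit quantization, and the headroom factor are illustrative assumptions, not published Kimi figures. The per-card sizes are the standard A100 80GB and A6000 48GB configurations.

```python
import math

# Memory per card, in GB (the A100 also ships in a 40GB variant).
GPU_MEMORY_GB = {"A100-80GB": 80, "A100-40GB": 40, "A6000": 48}


def weight_footprint_gb(params_billions: float, bits_per_param: float) -> float:
    """Memory for the weights alone: 1e9 params * (bits / 8) bytes = GB."""
    return params_billions * bits_per_param / 8


def gpus_needed(params_billions: float, bits_per_param: float, gpu: str,
                overhead_fraction: float) -> int:
    """Cards needed for the weights plus a flat allowance for KV cache and activations.

    overhead_fraction is a hypothetical headroom factor, not a measured value.
    """
    total_gb = weight_footprint_gb(params_billions, bits_per_param) * (1 + overhead_fraction)
    return math.ceil(total_gb / GPU_MEMORY_GB[gpu])


if __name__ == "__main__":
    # Hypothetical 1T-parameter model.
    print(weight_footprint_gb(1000, 8))    # 8-bit weights: 1000.0 GB
    print(weight_footprint_gb(1000, 2.5))  # ~2.5-bit quantization: 312.5 GB
    for card in ("A100-80GB", "A6000"):
        # No headroom vs. 20% headroom for KV cache and activations.
        print(card, gpus_needed(1000, 2.5, card, 0.0), gpus_needed(1000, 2.5, card, 0.2))
```

With no headroom, four A100s just fit a 312.5GB model; reserving 20% for KV cache tips it to five cards (and seven A6000s to eight), which is why estimates like these are rough.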

> that you can run locally

That's doing a lot of work here.

The future I see isn't most companies buying hundreds of thousands in hardware to run models, it's them adding a line item to their AWS bill. Inference costs on the larger hosted open source models are dramatically lower than the frontier labs' API pricing.

  • The future I'm seeing is AI coprocessors running inference locally in most devices that today have a CPU. Just look at how powerful your mobile phone has become compared to your desktop computer of 15 years ago, or to a mainframe of 30 years ago.

    The days of requiring a data center to run anything resembling Opus 4.6 are already numbered. (But the industry will fight hard to get people to keep paying the Claude tax.)

    • I'm already running a Google TPU over USB on an otherwise very cheap board to do local computer vision on a front-door camera, since I wanted to get away from Ring and other cloud services for that use case.

      And yeah, that may be the world in a decade or so, but we're in the mainframe era of the frontier models. It's going to be more economical for basically any consumer, and most businesses, to pay someone else to host models for quite a while.

    • Even when run in datacenters, it would be like current-day web hosting. It is hyper-competitive, and it will be a race to the bottom. There is money to be made, but not as much as investors hope. There will be datacenters in random countries like Kazakhstan because some oligarchs have found a free energy glitch (like with bitcoin mining).

    • > But the industry will fight hard to get people to keep paying the Claude tax.

      I bet this will ironically be couched in "safety" reasons or regulation to get anti-AI folks on board, even if it favors the large incumbents.

    • Magical thinking. I guess if your phone is going to have 128GB of DDR5, then sure. You people fundamentally don't understand the memory requirements for running inference. Your cute local models seem good enough because you have no standards and anything an LLM produces seems like magic to you.

  • > it's them adding a line item to their AWS bill

    That's the future Amazon sees too. We just had a week-long session with the AWS team, and they pushed that on us multiple times.

  • Buying "hundreds of thousands in hardware" sounds like a lot but many companies - especially software companies - already do that if they have 100+ employees.

    Running software in the cloud gives you certain reliability and scaling advantages that would be very hard to replicate locally. For running coding agents, though, the calculation breaks the other way if local hardware gets "good enough": offline usage alone would be hugely valuable to many people and companies.

    It'd be very interesting to see where various players decide to make the call that "local is good enough," though. Buying the hardware isn't a small bet if it doesn't end up as part of your standard computer.
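Whether buying hardware pencils out can be framed as a simple break-even calculation: the hardware cost divided by the monthly cloud spend it replaces, net of running costs. A minimal sketch; the team size, power/ops cost, and hardware price in the example are hypothetical, and it ignores depreciation, financing, utilization, and whether a local model is actually good enough.

```python
import math


def breakeven_months(hardware_cost: float, monthly_running_cost: float,
                     monthly_cloud_spend: float) -> float:
    """Months until owned hardware is cheaper than the cloud spend it replaces.

    Ignores depreciation, financing, and resale value. Returns math.inf if
    running the hardware costs as much as (or more than) the cloud bill.
    """
    monthly_savings = monthly_cloud_spend - monthly_running_cost
    if monthly_savings <= 0:
        return math.inf  # owning never pays off
    return hardware_cost / monthly_savings


if __name__ == "__main__":
    # Hypothetical: 20 developers at $1,000/month each, a $100k rig,
    # and $2k/month for power and operations.
    months = breakeven_months(100_000, 2_000, 20 * 1_000)
    print(f"break-even after {months:.1f} months")
```

With these made-up inputs the rig pays for itself in about 5.6 months; the result is very sensitive to how much of the cloud spend the local model can really replace.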

Many business tasks do not need the latest frontier models. I have a production system that has been running since the early GPT-4o days. It now runs on GPT-5.2, not for the improvements, but because it is cheaper. I could invest in switching to a local model (I tried, and it works well enough), but API costs for this task are so low, barely scratching $30/month, that I use the local machine for other things and leave the inference on OpenAI, for now.

I've been doing my work with OpenCode Go, with Kimi 2.6. It is not as good as Claude Opus, but it's good enough to get the job done, and I never run out of tokens.

I keep hearing about this "inflection", but it feels extremely exaggerated to me. And yes, I was using it at the time. It got incrementally better; it wasn't that amazing.

  • I think the bigger shift was harnesses and the two ended up somewhat commingled in people's minds.

    Claude Code was a lot of people's introduction to using coding agents that could do a lot more than copy-pasting from a chatbot or autocomplete.

  • The tool usage and skills got markedly better, and so did the thinking cohesion. Add 1M-token context windows and it was a very noticeable shift.

    Opus 4.6 quality for local inference would be revolutionary.

Opus 4.6 is a February model. Every time this subject comes up it seems like people post intentionally misleading things and move the goalposts.

The goalpost we've been bludgeoned with over and over again is that, in particular, Everything Changed in November 2025. That GPT 5.2 and Claude 4.5 were the inflection point. That was actually 6 months ago. And DeepSeek 4 is already there.

> run locally

You can't run DeepSeek locally on consumer hardware[1], but you can on enterprise hardware, and enterprise spend is the subject of this conversation. And even if you aren't self-hosting, it doesn't matter: you can just get your inference from one of the many companies serving DeepSeek. They trivially undercut OpenAI/Anthropic pricing because they didn't have to spend hundreds of billions training a frontier model from scratch; they only invest in supporting inference, which is already profitable.

[1] Since this misconception comes up all the time, I'll go ahead and pre-empt it: no, training a 32b parameter model on outputs from DeepSeek and running that locally is not "running DeepSeek", despite the hundreds of stupid articles and YouTube videos making the idiotic claim that they're running it on a 5090.

  • > You can't run DeepSeek locally on consumer hardware

    Maybe not DeepSeek v4 Pro, but I've run DeepSeek v4 Flash on my 128GB MacBook Pro using antirez's carefully quantized https://github.com/antirez/ds4 and it's impressive.

    • Oh sure, yeah, that's nothing to sneeze at either. I think unqualified "DeepSeek" should generally refer to the main model, though, especially in the context of GPT5.2-grade quality.

  • > You can't run DeepSeek locally on consumer hardware

    I'd qualify that by writing that you can't run it with ordinary, real-time speed and throughput. If all you care about is slow and high-latency inference, there's no reason why that shouldn't be feasible even on the cheapest miniPC around, as long as it can literally store the model weights and keep around the (rather small) context.
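The "slow but feasible" point follows from a common rule of thumb: single-stream decoding is usually memory-bandwidth-bound, because each generated token has to read all active weights once, so tokens per second is at most bandwidth divided by the bytes of active weights. A rough sketch under that assumption; the 40B active-parameter count, 4-bit weights, and the three bandwidth figures are hypothetical ballpark values, and the model ignores compute, KV-cache reads, and batching.

```python
def max_decode_tokens_per_s(active_params_billions: float, bits_per_param: float,
                            bandwidth_gb_per_s: float) -> float:
    """Upper bound on single-stream decode speed when each token streams the active weights once."""
    gb_read_per_token = active_params_billions * bits_per_param / 8
    return bandwidth_gb_per_s / gb_read_per_token


if __name__ == "__main__":
    # Hypothetical: 40B active parameters at 4 bits = 20 GB read per token.
    # Bandwidths are illustrative, not specs of any particular machine.
    tiers = [("fast unified memory", 400.0), ("dual-channel DRAM", 80.0), ("NVMe SSD", 3.0)]
    for label, bandwidth in tiers:
        print(f"{label}: <= {max_decode_tokens_per_s(40, 4, bandwidth):.2f} tokens/s")
```

Even when weights are streamed from disk the model still runs, just at a fraction of a token per second, which is the slow, high-latency regime the comment describes.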

But one will be in a few months. And then you have a choice: pay, say, $100k for hardware and then just the power cost (or pay someone to do that for you), or pay way, way more for your team to have access to a marginal improvement.

And a 5% worse model at 10% of the price of the bleeding edge will be worth it for the majority of people.
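The "5% worse for 10% of the price" trade-off holds up even if the cheaper model fails more often, provided failures are caught and retried: with independent attempts, the expected number of tries until success is 1 / success rate, so the expected cost per solved task is price ÷ success rate. A minimal sketch; the 80% baseline success rate is a hypothetical figure, and "5% worse" is read here as a 5% relative drop in success rate.

```python
def expected_cost_per_success(price_per_attempt: float, success_rate: float) -> float:
    """Expected spend per solved task when failed attempts are detected and retried.

    Attempts are assumed independent, so the number of tries until the first
    success is geometric with mean 1 / success_rate.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return price_per_attempt / success_rate


if __name__ == "__main__":
    frontier = expected_cost_per_success(1.00, 0.80)        # normalized price 1.0
    cheaper = expected_cost_per_success(0.10, 0.80 * 0.95)  # 10% of the price, 5% relatively worse
    print(f"frontier: {frontier:.3f}, cheaper: {cheaper:.3f}")
```

Under these assumptions the cheaper model costs roughly a tenth as much per solved task; the picture changes if failures are expensive or hard to detect.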

To be relevant to this discussion, models running on reasonably-priced local hardware do not have to be as good as the best.

They just have to be useful enough that companies don't need the best.

They are.