Comment by simplyluke
21 hours ago
Yeah, that's the part that just seems to be wildly under-discussed to me.
If open source models are ~3-6 months behind SOTA, and ~opus4.6 capabilities are good-enough for product market fit, do the frontier labs have half a decade to catch up on their prior burn?
AI cost ballooning faster than companies can afford is becoming a very common topic in my circles right now. The era of "I'll pay infinitely more for marginal gains" is over from what I can tell.
> If open source models are ~3-6 months behind SOTA, and ~opus4.6 capabilities are good-enough for product market fit, do the frontier labs have half a decade to catch up on their prior burn?
They know they do not and that’s why they’re all trying to IPO right now, so they can pass the bag to consumer investors
More correlation, if more correlation was needed:
1- SpaceX + Tesla + xAI merger / IPO while Musk was vocal against IPO for about a decade
2- Warren Buffett cash at record highs
Someone's got to be exit liquidity.
Open source models that you can run locally are much more than 3 to 6 months behind. Six months ago was the November inflection for Claude. No open source model is as good as Claude Opus 4.6.
It depends what you mean by locally. I don't foresee running a model on my laptop anytime soon to power a coding agent. Far more likely is an infra team at my company operating an open source model on cloud infrastructure. When they're already paying $1000 / month / dev, it starts to pencil out pretty quickly.
Is there any open model as good as opus 4.6 at any price?
> that you can run locally
That's doing a lot of work here.
The future I see isn't most companies buying hundreds of thousands in hardware to run models, it's them adding a line item to their AWS bill. Inference costs on the larger hosted open source models are dramatically lower than the frontier labs API pricing.
The future I'm seeing is AI coprocessors running inference locally in most devices that today have a CPU. Just look at how powerful your mobile phone has become compared to your desktop computer 15 years ago and compared to a mainframe 30 years ago.
The days of requiring a data center to run anything resembling opus 4.6 are already numbered. (But the industry will fight hard to get people to keep paying the Claude tax.)
> it's them adding a line item to their AWS bill
That's the future Amazon sees too. We just had a week long session with the AWS team and they pushed that to us multiple times.
Buying "hundreds of thousands in hardware" sounds like a lot but many companies - especially software companies - already do that if they have 100+ employees.
Running software in the cloud gives you certain reliability and scaling advantages that would be very hard to replicate locally. Running some code agents in the cloud vs local hardware, if the local hardware gets "good enough," breaks the other way - offline usage, alone, would be hugely valuable to many people and companies.
It'd be very interesting to see where various players would decide to make a call "local is good enough" though. Buying the hardware isn't a small bet, if it's not something that ends up as part of your standard computer.
Many business tasks do not need the latest frontier models. I have a production system that has been running since early GPT-4o. It now runs on GPT-5.2, not for improvements, but because it is cheaper. I could invest in switching to a local model (I tried, and it works well enough), but API costs for this task are so low that they barely reach $30/month. So I am using the local machine for other things and leaving the inference on OpenAI, for now.
I've been doing my work with OpenCode Go, with Kimi2.6. It is not as good as Claude Opus, but it's good enough to get the job done, and I never run out of tokens.
This project argues that with appropriate harness, the performance gap between frontier and much smaller open weight models shrinks dramatically: https://github.com/antoinezambelli/forge. I haven't kicked the tires yet.
Opus 4.6 is a February model. Every time this subject comes up it seems like people post intentionally misleading things and move the goalposts.
The goalpost we've been bludgeoned with over and over again is that, in particular, Everything Changed in November 2025. That GPT 5.2 and Claude 4.5 were the inflection point. That is actually 6 months ago. And DeepSeek 4 is already there.
> run locally
You can't run DeepSeek locally on consumer hardware[1], but you can on enterprise hardware, and enterprise spend is the subject of this conversation -- and even if you aren't self-hosting, it doesn't matter, because you can just get your inference from one of the many companies serving DeepSeek, who trivially undercut the pricing of OpenAI/Anthropic because they didn't have to spend hundreds of billions on training frontier from scratch but instead only invest in supporting inference, which is already profitable.
[1] Since this misconception comes up all the time, I'll go ahead and pre-empt it: no, training a 32b parameter model on outputs from DeepSeek and running that locally is not "running DeepSeek", despite the hundreds of stupid articles and YouTube videos making the idiotic claim that they're running it on a 5090.
> You can't run DeepSeek locally on consumer hardware
Maybe not DeepSeek v4 Pro, but I've run DeepSeek v4 Flash on my 128GB MacBook Pro using antirez's carefully quantized https://github.com/antirez/ds4 and it's impressive.
> You can't run DeepSeek locally on consumer hardware
I'd qualify that by writing that you can't run it with ordinary, real-time speed and throughput. If all you care about is slow and high-latency inference, there's no reason why that shouldn't be feasible even on the cheapest miniPC around, as long as it can literally store the model weights and keep around the (rather small) context.
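For anyone who wants to sanity-check the "can it literally store the model weights" condition, here is a minimal sketch. The only figure taken from the thread is the 128GB MacBook Pro; the 200B parameter count, the bit widths, and the 16 GB overhead for context and the OS are made-up placeholders (not the real size of any DeepSeek model), and both function names are hypothetical.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate memory needed just to hold the weights, in GB (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_memory(n_params, bits_per_weight, ram_gb, overhead_gb):
    """True if weights plus a fixed overhead (context, OS, runtime) fit in RAM."""
    return weight_memory_gb(n_params, bits_per_weight) + overhead_gb <= ram_gb


# Assumption: a hypothetical 200B-parameter model, not any specific release.
N_PARAMS = 200e9
RAM_GB = 128        # the 128GB machine mentioned in the thread
OVERHEAD_GB = 16    # assumption: room for context and everything else

for bits in (16, 8, 4):
    size = weight_memory_gb(N_PARAMS, bits)
    ok = fits_in_memory(N_PARAMS, bits, RAM_GB, OVERHEAD_GB)
    print(f"{bits}-bit weights: {size:.0f} GB, fits: {ok}")
```

This only answers whether the model fits at all. It says nothing about tokens per second, which is the part the comment above concedes will be slow.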
I keep hearing about this "inflection", but it feels extremely exaggerated to me. And yes, I was using it at the time. It got incrementally better, it wasn't that amazing.
I think the bigger shift was harnesses and the two ended up somewhat commingled in people's minds.
Claude Code was a lot of people's introduction to using coding agents that could do a lot more than copy-pasting from a chatbot or autocomplete.
The tool usage + skills got markedly better and so did the thinking cohesion. Add 1m context windows and it was a very noticeable shift.
Opus 4.6 quality for local inference would be revolutionary.
But one will be in a few months. And then you have the choice of paying, say, $100k for hardware and then just the power cost (or paying someone to do that for you), or paying way, way more for your team to have access to a marginal improvement.
And a 5% worse model for 10% of the price of the bleeding edge will be worth it for the majority of people.
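A rough way to check the hardware-versus-API arithmetic is a break-even calculation. In the sketch below, the $100k hardware price is from this comment and the $1000 / month / dev API spend is quoted elsewhere in the thread; the team size and the monthly power-and-ops cost are made-up placeholders, and the function name is hypothetical.

```python
import math


def breakeven_months(hardware_cost, monthly_api_spend, monthly_running_cost):
    """Months until buying hardware is cheaper than continuing to pay for an API.

    Returns None if self-hosting never pays off, i.e. the monthly running
    cost is at least as large as the API spend it replaces.
    """
    monthly_saving = monthly_api_spend - monthly_running_cost
    if monthly_saving <= 0:
        return None
    return math.ceil(hardware_cost / monthly_saving)


# Figures quoted in the thread.
HARDWARE_COST = 100_000
API_SPEND_PER_DEV = 1_000

# Assumptions for illustration only.
TEAM_SIZE = 20
MONTHLY_RUNNING_COST = 2_000  # power plus someone to operate the box

months = breakeven_months(
    HARDWARE_COST, API_SPEND_PER_DEV * TEAM_SIZE, MONTHLY_RUNNING_COST
)
print(f"Break-even after about {months} months")
```

The answer swings a lot with team size: the same hardware shared by 5 devs instead of 20 takes several times longer to pay for itself, and this ignores depreciation and the quality gap entirely.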
To be relevant to this discussion, models running on reasonably-priced local hardware do not have to be as good as the best.
They just have to be useful enough that companies don't need the best.
They are.
Kimi is better.
There's still a lot of room for the best models to get better at coding.
Your argument rests on the "for marginal gains" part but it's really not clear that the gains are marginal in the foreseeable future.
This is totally valid and I don't agree with the downvotes you're getting. Someone coming out with a 10x improvement is possible and would change the game immediately. The thing is, we really have been seeing marginal gains with shifting leaders in who's got the "best" since GPT3, and at least as a user of these tools that pace has been slowing, not accelerating. Subjectively it feels like we're in the back half of an S-curve.
We're 3.5 years into this current AI wave, and a lot of the valuations have been predicated on what you're arguing here -- that essentially should one of the labs make an order-of-magnitude improvement or hit escape velocity on recursive self-improvement they'd become the most powerful economic chokepoint in history.
The reality has been that given access to compute + capital all of the labs can stay pretty competitive with each other. Someone does a bit better on coding, someone else does a bit better on tool calling, and then they swap after each spending another $100bn.
The market looks like a commodity market where the commodity is intelligence, not a winner-take-all market with massive margins. Plenty of people get rich in oil and airlines, but they notably don't tend to be the innovators long term, they tend to be the operators. Obviously if the machines become sentient tomorrow, turn on their masters, and hit world-dominating intelligence, that assessment changes, but after several years of that narrative while objective reality looks quite different I think the more sober voices are starting to gain a foothold.
I agree with most of what you're saying, but I think the point I was trying to make wasn't as high-flying as you and others understood it.
I'd pay a premium for even just a model that's 20% better, no ASI required, and I think a lot of people would. I wouldn't call that marginal, if it means I'm getting frustrated on 20% fewer tasks.
A recurring pattern that I've seen in myself and others is to at first be very impressed by a new model's coding capabilities, and then desensitize quickly and start being frustrated by the shortcomings.
What? The gains between gpt4->5 seem to be marginal. No PhD-level discoveries here.
The leap from GPT-4 to GPT-5.5 has been astounding in my opinion. There is no way GPT-4 could run a coding agent harness like Codex at even a fraction of the quality that GPT-5.5 does.
Open source models, especially Qwen, are pretty dang good. But it's not Opus 4.6; the evals don't tell the full story. I question the assumption that open source models are 3-6 months out.
It's not just about the quality of output: you can also fine-tune them to proprietary needs, if the skill sets are there internally, to make them better without governance risks. So being SOTA doesn't matter as much, since generalized tasks are not what matter most to companies; it's the specialization relative to business need or internal datasets.
To make an extreme comparison, desktop Linux was originally supposed to happen in 1999.
Maybe I misspoke by saying open source.
The larger point I'm making is I think models are rapidly becoming commoditized. There is probably a small market long term that's willing to pay 10x for 10% marginal gains, but the majority of the buyers in the market will be economic and we're likely to have a lot of folks willing to spend 1/10 the cost for 90% of the performance, and plenty of companies that haven't raised hundreds of billions-trillions who can provide that.
A lot of the frontier labs' valuations have been based on an assumption that 1-2 companies would get break-away intelligence that basically made them economic chokepoints indefinitely into the future. The reality that's becoming increasingly clear is that model quality is a pretty linear function of (cash burned - ability to copy others' homework) and the economics are starting to look a lot more like airlines than online advertising.
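To put a number on the "1/10 the cost for 90% of the performance" trade-off, here is a back-of-the-envelope sketch. The 0.9 and 0.1 ratios come from the comment above; treating "performance" as a single linear score is a simplifying assumption, and the function name is hypothetical.

```python
def value_per_dollar(relative_performance, relative_price):
    """Performance delivered per unit of spend, both relative to the frontier model."""
    return relative_performance / relative_price


# Frontier model is the baseline: full performance at full price.
frontier = value_per_dollar(1.0, 1.0)

# Cheaper model from the comment: 90% of the performance at 1/10 the cost.
budget = value_per_dollar(0.9, 0.1)

advantage = budget / frontier
print(f"Cheaper model delivers about {advantage:.1f}x the performance per dollar")
```

Under that linear assumption the cheaper model comes out roughly nine times ahead per dollar, which is the whole commoditization argument in one line. The counterargument made elsewhere in the thread is that the last 10% is not linear in value for some buyers.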
You have to think about why open models are behind. Exfiltration is a big part of it. So you could change the Nash equilibrium by increasing your security, or other multilateral approaches.
If only the AI era had been born in ZIRP.
Better now than ZIRP for me - at least people are asking timid questions about the unit economics and how long the runway is _early_ while also spending absolutely insane amounts of money on this bet. During ZIRP, these companies would have turned down any investor asking questions. Less contagion when rates aren't zero hopefully? :grimace:
The size of the AI bubble and the IOUs being passed around like a hot potato already dwarfs the real estate bubble preceding the 2007 crash.
If we still were in the ZIRP era, busting the bubble would certainly kill off the world's economy for good simply due to its size.