Comment by onlyrealcuzzo

5 days ago

The actual cost is going to drop 99% in ~4 years.

How much of that makes it into enterprise pricing is TBD, since none of the hyperscalers are making money yet off selling AI inference.

Almost all businesses are ahead of the gun. For most of their use cases, AI is either not yet good enough on its own, or good enough but too expensive.

No one wants to get left behind, so everyone's trying to get onto it now, even though it's not ready for what most enterprises want to do with it.

It's easy for them to look at a small startup without billions of lines of legacy business logic debt, see it having success, and wonder why they can't have just as much, or more. They're bigger, so they should have better and more success, right???

Wrong...

But when it gets ~99% cheaper for local inference over the next 4 years, at the same time the price per watt improves 4x -> a lot of those cases will start to pencil out.

40 comments


BearOso 5 days ago

Going from Opus 4.5 to 4.7 secretly required 6x more compute to run. 4.8 is apparently 30% more on top. I haven't seen any optimizations lately aside from distillation. Nobody's optimizing, they're just scaling up.

rescbr 5 days ago
> Nobody's optimizing
The Chinese, since they lack computing hardware due to US export controls, are.
- trollbridge 5 days ago
  
  And our export controls are going to turn China into a winner in the AI arms race if we're not careful.
  
trollbridge 5 days ago
DeepSeek and Alibaba would like to have a word.
- whatthesmack 4 days ago
  
  Hasn't everything DeepSeek and Alibaba created thus far been distilled from the results of many, many accounts logging into Claude and ChatGPT? And that's why there's so much bot detection now at US frontier labs? Doesn't that make the Chinese labs dependent until some unknown point in the future on advancements of US frontier labs? While what they currently provide is cheap, it seems like it's artificially cheap and somewhat static because they took others' intellectual property (no comment needed about US frontier labs stealing the world's knowledge... that's a separate topic).
  

krona 5 days ago

> The actual cost is going to drop 99%

Do you mean the marginal cost by the producer, or the cost on the consumer? I can't see the price of electricity falling much, and the demand curve is apparently exponential if the hype is to be believed.

trollbridge 5 days ago
DeepSeek V4 Pro is 99% cheaper than similarly performing models were 2 years ago (if such a model even existed).
Computing has always been about how to wring out more efficiency. The ENIAC was 150,000 watts, with 3 phase 240 volt power, and cost about $500,000.
My day to day laptop (a year old) is 35 watts, with 1 phase 20 volt power, and cost $1,000, so that's 99.98% less power consumption, 99.8% cheaper, and it has about 10 orders of magnitude more computing power, all on a time span of 80 years.
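
For anyone who wants to check the percentages, here is a minimal sketch using only the figures quoted above. It assumes nominal dollars (no inflation adjustment), and the helper name `percent_reduction` is made up for illustration.

```python
def percent_reduction(old: float, new: float) -> float:
    """Return how much smaller `new` is than `old`, as a percentage."""
    return (1 - new / old) * 100


# Figures as quoted in the comment (nominal, not inflation-adjusted).
eniac_watts, laptop_watts = 150_000, 35
eniac_cost, laptop_cost = 500_000, 1_000

power_cut = percent_reduction(eniac_watts, laptop_watts)
cost_cut = percent_reduction(eniac_cost, laptop_cost)

print(f"power: {power_cut:.2f}% lower")  # power: 99.98% lower
print(f"cost:  {cost_cut:.1f}% lower")   # cost:  99.8% lower
```

Adjusting the ENIAC price for inflation would make the cost reduction larger still.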
- cratermoon 5 days ago
  
  Moore’s law is dead.
  

packetlost 5 days ago

I don't see how this is even remotely true. Unless there's some super breakthrough into a fundamentally different architecture, there's not really a path to a 50% reduction in price, much less a 99% reduction.

kilroy123 5 days ago

In fairness, I think _current_ capabilities will be cheaper. So the models of today will be run drastically cheaper in 4 years.
onlyrealcuzzo 5 days ago

And yet 90% drops for the same level of quality every 18 months have happened like clockwork...
And the technology already exists on the algorithmic front TODAY to lock in another 10x gain -> when, typically, algorithmic gains only account for ~30% of that drop and the other ~70% comes from better data (often synthetic) and knowledge distillation from frontier models.
Just look at DeepSeek's pricing...
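
To see how the "90% every 18 months" figure relates to "99% in ~4 years", here is a rough sketch. It assumes the drop compounds smoothly at a constant rate, which is a simplification, and `remaining_fraction` is a hypothetical helper.

```python
def remaining_fraction(drop_per_period: float,
                       period_months: float,
                       horizon_months: float) -> float:
    """Fraction of the original price left after compounding a fixed drop."""
    periods = horizon_months / period_months
    return (1 - drop_per_period) ** periods


# 90% drop every 18 months, extrapolated over 4 years (48 months).
left = remaining_fraction(0.90, 18, 48)
print(f"{(1 - left) * 100:.1f}% cheaper after 4 years")  # 99.8% cheaper after 4 years
```

Under that assumption the trend slightly overshoots 99%. Whether the trend actually continues is the part being argued about in this thread.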

datakan 5 days ago

What makes you think prices will drop? Everyone I’ve spoken to believes they will only skyrocket. Genuinely curious

onlyrealcuzzo 5 days ago
The technology already exists now on the algorithmic front for the next 10x drop between everyone adopting DeepSeek's MLA, MoE (mostly already done), Medusa (a better version of Google's speculative decoding), Kimi's Attn Residuals, and Mimo's Sliding Window Attn, and (possibly) Microsoft's 1.58b (this may be a nothing burger).
Historical trend: every 18 months, the price for the same level of quality has gone down 90%.
See: https://www.reddit.com/r/LocalLLaMA/comments/1gpr2p4/llms_co...
And Chart 13 here: https://www.rdworldonline.com/ais-great-compression-20-chart...
And here: https://epoch.ai/data-insights/llm-inference-price-trends
Historically, algorithmic gains are only ~30% of the pie, but there's enough out there to get to 10x, with just what's available already. The other ~70% of the pie is better training data (often synthetic) and distilling frontier knowledge. There's no sign we are tapped out on that front.
Additionally, GRAM (from ~10 days ago) is likely to be a 5-10x on its own (if not substantially more for smaller models). It's unlikely within 4 years LeCun's JEPA ideas and similar ideas like GRAM applied to LLMs have ZERO impact. The preliminary results are absolutely astounding (5000x better reasoning - this is not peanuts).
Further, that's not even counting that cost per watt is still dropping ~2x every 2 years on its own on the hardware front.
Look at the "cost" of inference: people think it's electricity, but it's currently ~80% hardware amortization. The memory shortage is not going to last, nor are Nvidia's ~80-90% margins.
The human brain is still 8-10 orders of magnitude more efficient than the best LLMs of today. With ~1/10th of global capex riding on AI, if you don't think they're going to knock off 2 orders of magnitude more, when it's this obvious and easy... I don't know what to tell you...
Sure, it might take 6 years instead of 4. My crystal ball isn't perfect.
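
As a back-of-the-envelope sketch of how the claimed factors stack up: this assumes the algorithmic and hardware gains are independent and simply multiply, which the comment does not state and which may not hold in practice. The variable names are illustrative.

```python
# Factors as claimed in the comment above; treating them as independent
# multipliers is an assumption made for this sketch.
algorithmic_gain = 10.0         # "the next 10x drop" from existing techniques
hardware_gain = 2.0 ** (4 / 2)  # ~2x every 2 years, compounded over 4 years
target_gain = 100.0             # a 99% drop means 1/100th of today's cost

combined = algorithmic_gain * hardware_gain
shortfall = target_gain / combined

print(f"{combined:.0f}x combined, {(1 - 1 / combined) * 100:.1f}% cheaper")
# 40x combined, 97.5% cheaper
print(f"{shortfall:.1f}x still needed from data, distillation, or margins")
# 2.5x still needed from data, distillation, or margins
```

So on these numbers the last 2.5x would have to come from the other sources listed (better data, distillation, shrinking hardware margins).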
- HarHarVeryFunny 5 days ago
  
  Sure, the price will come down a lot, even if we can argue about the timeline.
  I think what will also happen, once we get past this current CEO AI FOMO mania, is that companies will start to look at AI spending more rationally like any other company expense, and will revert to more rational decision making.
  Even if the cost comes down considerably over the next few years, that's plenty of time for companies to look at their financial results and question why AI expenditure isn't resulting in an increase in revenue and/or profitability.
- datakan 5 days ago
  
  This is great food for thought, thank you
  
- Nimitz14 5 days ago
  
  This is mostly slop. But you may be directionally correct
- rednb 5 days ago
  
  I didn't take you seriously initially, but after reading this, I think you are the real deal.
  Thank you for sharing this and for having the intellectual courage to hold to a sound reasoning that may be unpopular initially.

bakugo 5 days ago

Prices have been very obviously trending up, not down. Even open weights models are becoming more expensive with every release. Computer hardware is ballooning in price.

onlyrealcuzzo 5 days ago
Prices are going up for BETTER quality -> not for the SAME level of quality.
People are willing to pay more for BETTER quality.
You obviously haven't seen DeepSeek v4 Pro's pricing if you think pricing only goes up...
- bakugo 5 days ago
  
  Maybe so, but that becomes irrelevant when you consider that the new, better quality instantly becomes the expected baseline. So the price of the "baseline" quality is going up regardless.
  Let's look at GPU prices as an example. Around 12 years ago, I bought a GTX 970 for around $350. That was considered a very good GPU at the time. Today, the "equivalent" GPU model (RTX 5070) now costs almost double. Of course, the newer GPU is much more powerful (more than double, in fact), but all the things you'd use a GPU for have also advanced and now expect an entirely new level of performance as a baseline, such that the older GPU is fairly worthless today. So most people agree that GPUs in general have become more expensive.
  Regarding DeepSeek's price: it's obviously subsidized, and unlikely to match the actual inference cost right now.
abalashov 5 days ago
Just wait for the next model and the next model architecture. Just wait for it, bro.
- onlyrealcuzzo 5 days ago
  
  Gemini 3.5 flash is 25% cheaper than 3.1 pro, and outperforms it on almost every benchmark, most by a pretty wide margin...
  
trollbridge 5 days ago
Grab a 5090 and run Qwen 3.6 35b on it (6 parameter seems to work best for me).
Then buy $10 (or $2, if you're cheap, and they take PayPal) of DeepSeek credits.
Whilst you're at it spring for a Claude subscription too and GPT.
Switch models between Qwen, DeepSeek Flash, DeepSeek Pro, and you can meet 99% of your code generation needs.
Hop over to Opus 4.7 (or 4.8, but I haven't really used it yet) and GPT-5.5 when doing very complex architecture/design or troubleshooting something where DeepSeek Pro is getting stuck.
It is ridiculous how cheap this stuff is now. It's affordable at third world prices.
- Supermancho 5 days ago
  
  None of that is cheap.
  > spring for a Claude subscription too and GPT.
  You started with some random pricing then veered off into impractical hand waving. Far above third world prices...unless you count the USA as third world, I guess.
  

AllegedAlec 5 days ago

> The actual cost is going to drop 99% in ~4 years.

And fusion power is just 2 decades into the future!

jjav 5 days ago

Full self driving guaranteed here before the end of the year (every year).

mrandish 5 days ago

> The actual cost is going to drop 99% in ~4 years.

We have little visibility into current frontier model costs at mass scale. As a broad historical trend, tech costs tend to fall over longer time periods but your claim far exceeds Moore's Law rates in its heyday - and that heyday is long gone.

In 2021 TSMC announced it was increasing its price per gate for new nodes for the first time in its history. In the past five years cutting edge nodes have delivered ~8-15% real-world performance gains on average at costs at least 10-20% more than the last node. If you're positing a string of unprecedented efficiency breakthroughs in LLM algorithms - such extraordinary claims require extraordinary evidence.
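
To put numbers on the comparison with Moore's Law, here is a rough sketch. It assumes the common "2x every 2 years" formulation and treats it as a cost halving, which is a simplification. The function name is illustrative.

```python
import math


def halving_time_months(total_drop: float, horizon_months: float) -> float:
    """Months per 2x cost reduction implied by `total_drop` over the horizon."""
    halvings = math.log2(1 / (1 - total_drop))
    return horizon_months / halvings


# A 99% drop in 4 years versus a Moore's-Law-style halving every 24 months.
needed = halving_time_months(0.99, 48)
moore_drop = 1 - 0.5 ** (48 / 24)

print(f"99% in 4 years needs a halving every {needed:.1f} months")
# 99% in 4 years needs a halving every 7.2 months
print(f"a 24-month halving gives {moore_drop * 100:.0f}% in 4 years")
# a 24-month halving gives 75% in 4 years
```

In other words, the claim needs costs to halve more than three times as fast as the assumed Moore's Law pace.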