Comment by materielle

1 day ago

I'm about to leave a shallow comment, but I am a bit skeptical of the supposed drop in inference costs. If AI labs saw a lot of potential there, they'd surely be bragging about it non-stop? So the fact that publicly available information is conflicting is probably a sign that, at the very least, the numbers aren't amazing.

Yes I know there's no evidence and this is lazy reasoning. But there's probably a bit of truth to this line of thought.

Tuna-Fish 1 day ago

Why on earth would AI labs be bragging about how little the product they sell actually costs them to make? You don't want to do anything that reduces its perceived value to the user, as that might make them less willing to pay for it.

Also, inference costs are bound to go way down with more optimized architectures. GPUs are fundamentally not great at inference. No platform where the weights are streamed from a large pool of memory is. If the models ever settle down, there will be massive step changes in cost/token, energy/token and tokens/second, as models are etched into silicon à la https://chatjimmy.ai/

overgard 21 hours ago
A couple of years ago Altman was saying the price of AI compute is going to drop 90% year over year or something like that, so I don't think they're nervous about talking about lowering their costs. They probably just haven't been able to lower their costs.
You have to keep in mind that about 99% of their announcements are targeted towards investors (their most important revenue source...), so they're not going to be afraid to mention metrics that make the business look better.
- bwhiting2356 13 hours ago
  
  Jevons paradox. Cheaper tokens do not mean we will spend less.
  
- mcmcmc 20 hours ago
  
  Ah yes, Sam “Not Consistently Candid” Altman
  
- whateveracct 14 hours ago
  
  he lied. he's good at that.
Yoric 11 hours ago

> Why on earth would AI labs be bragging about how little the product they sell actually costs them to make? You don't want to do anything that reduces its perceived value to the user, that might make them less willing to pay for it.
Wouldn't they be bragging about it to investors? It feels like something that would matter a lot to them, and at least OpenAI kinda feels desperate to find them.
There's also the small question about whether a drop in inference cost would actually change anything about profitability, when training seems to get exponentially more expensive.
golem14 1 day ago
Why would any company brag about their margins? Yet they do, to attract investors.
- Tuna-Fish 1 day ago
  
  The key AI labs are not public companies, they are at liberty to brag about their margins to potential investors in private.
  
- lmm 17 hours ago
  
  Growing companies don't brag about their margins, they brag about their growth and revenue. Margin talk is for when you're a mature company squeezing out every bit of profitability you can - if anything it would be a negative sign to be worrying about your margins when you're supposed to still be growing and innovating.
  
- amarant 12 hours ago
  
  I mean, did anyone expect them to not have margins? Why keep it secret?
neltnerb 21 hours ago

Because companies that want to go public need to look profitable or potentially profitable. And before they go public they have to release real, actual, legally demonstrable numbers for their costs and revenue anyway.
etempleton 18 hours ago

Because the most important thing for any pure play AI company right now is to prove they are a viable company. And sure they have proved they can make billions, but also that they can lose billions more. They are going to need even more money and to prove to the next round of investors at an even higher valuation that they are a viable business they need to show not that they can generate revenue, but that they can one day turn a healthy profit. And that is the trillion dollar question.
jimbokun 21 hours ago

I doubt having to replace every single chip in your data center every time you release a new model will bring down costs.
kopirgan 15 hours ago
Went to that URL and asked one question - "how is this different from other AI" - and it took 598/6144 tokens, not sure what that means.
- philipswood 14 hours ago
  
  Not super clear from the site itself, but this LLM is running on specialized silicon that implements just that one model. So it has super low energy use and blazing speed.
  See https://taalas.com/products/
  Edit: updated link
  
DrewADesign 18 hours ago
Because they can think more than one quarter into the future? Why on earth would someone adopt something into their core workflow that was fantastically unprofitable? Uncertainty and business don’t mix. Most people aren’t hype-eating bacteria that only care about maximizing their next paycheck.
- nradov 2 hours ago
  
  Regardless of profitability there will always be multiple good LLM vendors as well as open-source alternatives (slightly worse but still pretty good). If one vendor fails then it's easy to switch your core workflow to a competitor.
- wheresmylogin 6 hours ago
  
  One reason is that all the code you write with this goes in your private git. If using AI no longer is possible because of cost, you can still profit a lot from what you did with it before.
  
kopirgan 15 hours ago

If inference costs drop 90% or whatever, that would be a massive write-off of hardware even before it gave any returns?! Especially given that Chinese competitors and others are snapping at their heels and would also benefit from such a reduction in cost.
solarkraft 14 hours ago

> Why on earth would AI labs be bragging about how little the product they sell actually costs them to make?
Investor confidence. They have a bit of a need for cash (also an interesting part of the profitability discussion of course).
> Also, inference costs are bound to go way down with more optimized architectures
I agree. Jimmy is incredible, I wonder what non-toy use cases they have. Surely they’ll come out with updated chips soon.
That said, I was apparently a bit over-excited for Groq and Cerebras. I thought they’d quickly dethrone Nvidia for inference, but not so far. Even the GPT spark trial isn’t seeming to go far.

whatshisface 1 day ago

Inference has traditionally been far less expensive than training. One public example is the fact that hobbyists can run StableDiffusion ($600k training costs[1]) on their personal computers.

Speaking to your point, inference being dramatically less costly than training would not be seen as a delta from the norm. The model of providing inference for anything near the operational costs (like a utility would) would be the delta from the norm, if it were true.

[1] https://x.com/emostaque/status/1563870674111832066

thesz 1 day ago
The difference between training and inference is that 1) one has to keep intermediate results for the backward pass in training, and 2) the computation for training doubles because of the backward pass.
Training is also done over batches, which increases memory requirements by several orders of magnitude. This is why training needs costly compute.
One of the ways out of this unfortunate situation is to use something like Stochastic Average Gradient Descent [1]. The examples there are mostly concerned with regularized logistic regression, which makes the problem more or less convex. Neural networks are inherently non-convex. Still, maybe some ideas from there can be utilized in the context of neural networks, like the use of an estimated Lipschitz constant to derive curvature and an appropriate learning step.
[1] https://www.cs.ubc.ca/~schmidtm/Courses/540-W19/L12.pdf
- janalsncm 21 hours ago
  
  So one way to think about it is roughly,
  Training is inference + backwards pass (~2x inference cost) + activations (vram overhead) + optimizer (vram overhead) + gradients (vram overhead).
  
- mike_hearn 7 hours ago
  
  It's all got much more complex than that in recent years. Training now involves large amounts of inference for RL rollouts and similar. You can't disentangle them computationally like that. "Inference" is just the word used to mean serving customer traffic now, and "training" means creating the model you serve.
- whatshisface 19 hours ago
  
  That is an estimate of the relative cost of one training step, but you have to multiply it by the number of training steps, an unknown quantity.
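For anyone who wants to plug numbers into the rough decomposition from the replies above, here is a minimal Python sketch. It takes janalsncm's "backward pass is about 2x inference cost" at face value and follows whatshisface's point that the per-step figure has to be multiplied by the amount of training. The 2-FLOPs-per-parameter forward pass, the fp16 weights and the two fp32 Adam moments are assumptions for illustration, not numbers from this thread, and activations are left out because they depend on batch size and sequence length.

```python
def training_vs_inference(n_params, n_train_tokens, bytes_per_param=2):
    """Back-of-the-envelope comparison; every constant here is an assumption."""
    # Compute per token: forward pass ~2 FLOPs per parameter (assumed rule of thumb).
    forward_flops = 2 * n_params
    # Backward pass taken as ~2x the forward pass, per the comment above.
    backward_flops = 2 * forward_flops
    train_flops_per_token = forward_flops + backward_flops

    # Memory: inference only needs the weights.
    weights = n_params * bytes_per_param
    # Training additionally holds one gradient per weight...
    gradients = n_params * bytes_per_param
    # ...plus optimizer state (assumed: Adam with two fp32 moments per weight).
    optimizer = 2 * n_params * 4

    return {
        "flops_ratio_per_token": train_flops_per_token / forward_flops,
        "total_train_flops": train_flops_per_token * n_train_tokens,
        "inference_bytes": weights,
        "training_bytes_excl_activations": weights + gradients + optimizer,
    }


if __name__ == "__main__":
    # Hypothetical 8B-parameter model trained on 1T tokens.
    result = training_vs_inference(8_000_000_000, 1_000_000_000_000)
    for name, value in result.items():
        print(name, value)
```

The per-token ratio stays small either way. What dominates is the token count on each side: how many tokens you train on versus how many you end up serving.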
vanviegen 7 hours ago

I think in your StableDiffusion example, a lot more than $600k will have been spent on electricity alone for inference (on those personal computers you mention). So inference is more expensive than training.
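Whether that holds depends on numbers nobody in the thread has, but the break-even point is easy to sketch. The Python below computes how many generated images it takes for cumulative inference electricity to match the $600k training figure. The GPU wattage, seconds per image and electricity price in the example are made-up placeholders, not measurements.

```python
def breakeven_images(training_cost_usd, watts, seconds_per_image, usd_per_kwh):
    """Number of images at which total inference electricity equals the training cost."""
    # Energy per image: watt-seconds (joules) converted to kilowatt-hours.
    kwh_per_image = watts * seconds_per_image / 3_600_000
    cost_per_image = kwh_per_image * usd_per_kwh
    return training_cost_usd / cost_per_image


if __name__ == "__main__":
    # Hypothetical inputs: 300 W GPU, 10 s per image, $0.15 per kWh.
    n = breakeven_images(600_000, watts=300, seconds_per_image=10, usd_per_kwh=0.15)
    print(f"break-even at roughly {n:.3g} images")
```

With these placeholder inputs the break-even lands in the billions of images, so the claim comes down to how many images have actually been generated in aggregate.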

lumost 21 hours ago

For equal capability tokens, there has been about a 10x drop in cost every 6 months.

We are still chasing the best because the best is moving rapidly, but it’s a simple thought experiment to work out what the cost to serve an 8B model from 2 years ago is in a world of 2T models.

Note: parameter counts are illustrative. Concretely, qwen3.6 27B delivers opus 4.5 capability at 1/27th the cost on openrouter. Single chip llama3 8b performance can exceed 17k tokens/sec.
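To make the thought experiment concrete, here is a small Python sketch of what "10x cheaper every 6 months" compounds to. It just applies the rate claimed above as a smooth exponential, which is an assumption, and the function names are made up for illustration.

```python
import math


def cost_after(initial_cost, months, factor=10.0, period_months=6.0):
    """Cost of equal-capability tokens after `months`, if cost falls by `factor` every period."""
    return initial_cost / factor ** (months / period_months)


def months_until_cheaper_by(target_ratio, factor=10.0, period_months=6.0):
    """Months needed for the cost to fall by `target_ratio` at the same rate."""
    return period_months * math.log(target_ratio) / math.log(factor)


if __name__ == "__main__":
    # Two years at this rate is four periods, i.e. a 10,000x drop.
    print(cost_after(1.0, 24))
    # Time for a 27x drop, the ratio quoted in the note above.
    print(months_until_cheaper_by(27))
```

At that rate a model that was frontier-priced two years ago would cost about one ten-thousandth as much to match today, which is the scale the 8B versus 2T comparison is pointing at.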

byzantinegene 12 hours ago
8B models would be considered obsolete in the world of 2T models, at least if we're talking about the competitiveness of OpenAI/Anthropic. The only reason why they are valued so highly is their supposed dominance at the top end.
- lumost 5 hours ago
  
  The main story of agent use cases is in enterprise so far. An enterprise will only pay for a model capable of handling the task and no more. Most enterprises see no need to hire PhDs as factory line workers.
  Coding is an interesting case as [1] the pace of progress has been absurd and [2] it's hard to put an upper bound on required capability. However, being hard to bound is not the same as being unbounded: it's quite possible that the average engineer will cease to see the benefit of rapid progress, or that their employer will be satisfied with lower tier models.
  How smart of a model do you need to build a high quality CRUD app for internal users? Or build a scalable web service?
joshuahedlund 7 hours ago
> For equal capability tokens, there has been about a 10x drop in cost every 6 months
Is this still happening? Opus 4.5 was six months ago, can you get its capabilities for 1/10 cost now? Are we on track to get the same for 4.6 in a couple months?
- lumost 5 hours ago
  
  Pretty much, Kimi K2.6 is Opus 4.6 quality for coding. If you include discounts due to more efficient input caching, it is around 1/10th the cost of Opus 4.6.
  https://openrouter.ai/moonshotai/kimi-k2.6
  The march of cost efficiency moves on.

no-name-here 15 hours ago

> I am a bit skeptical of the supposed drop in inference costs. If AI labs saw a lot of potential there, they'd surely be bragging about it non-stop?

Unless, to the grandparent commenter’s point, they’re using it to obscure their large prisoner’s dilemma (training) cost?

neuronexmachina 17 hours ago

> If AI labs saw a lot of potential there, they'd surely be bragging about it non-stop?

Google seems to pretty regularly post about how their TPU and algorithm advancements have been decreasing energy costs for both inference and training.