Comment by dijit

5 days ago

Frontier AI companies are selling at a loss.

Excusing everything else that u/bastawhiz said[0], the obvious fact here is that Claude, OpenAI, Gemini et al. are quite literally burning through hundreds of billions of dollars and selling it back to you for pennies on the dollar in the hopes that they get to be the only one left.

If I spend $10 growing oranges and sell them to you for $1, then of course it's more expensive for you to do the growing.

I feel like I'm taking crazy pills. These models will become more expensive over time, it's functionally impossible for them not to, they just want to capture the market before they have to stop selling at a huge loss.

[0]: https://news.ycombinator.com/item?id=48168433


vanviegen 5 days ago

That seems unlikely. There are many providers for open models on openrouter. It seems unlikely that they are throwing money away for each token they sell.

Also, there are good technical reasons for inference being much more efficient at scale.

dijit 5 days ago
The providers on OpenRouter serving open models aren't "throwing money away", agreed.
But that's not the point I'm making. (or, it kind of is, but it's more high level than that).
They're running spot and preemptible GPU instances (60-80% cheaper than on-demand), paying wholesale industrial electricity rates, and running at multi-tenant utilisation densities that make your MacBook look like a bonfire. Of course they're not individually loss-making on inference, they're aggregating cheap commodity compute and skimming a margin, and on paper that's what makes it seem like a good idea, certainly not a loss leader right?
But zoom out a bit; the entire stack is swimming in VC money. OpenRouter itself just raised at a $1.3B valuation backed by a16z. The Chinese models that now account for 36% of all tokens routed through the platform (DeepSeek, Qwen) are priced the way they are because Beijing-adjacent capital has decided market share matters more than margin right now.
So yes, technically no single party is "throwing money away" on each token; they're just all simultaneously subsidising different parts of the stack for strategic reasons. The floor price you're seeing isn't a stable equilibrium, it's a pile of investor money that hasn't entirely finished burning yet.
- vlovich123 5 days ago
  
  > The floor price you're seeing isn't a stable equilibrium, it's a pile of investor money that hasn't entirely finished burning yet.
  All that says is that it gets more expensive in the future as competitors exit the market and sustainability becomes important. That’s why Uber and Lyft were so cheap until they killed taxis. One major difference of course is that some models will remain largely good enough and the incremental cost of running will keep dropping to 0 over time since the hardware needed doesn’t get more expensive and is already purchased.
  
- reissbaker 5 days ago
  
  Most of these folks aren't running on spot/preemptible instances, they're doing 1-2 year reserved rentals. There isn't that much cheap compute floating around and you can be quite profitable on reservations (if you can get them — the compute shortage is real).
  I think the question in terms of throwing money away isn't the inference layer: it's whether the companies training open models will be able to financially keep doing so. How long will Moonshot keep releasing future Kimi models? I think there's an interesting wedge they're exploring with being basically a base-model-trainer-as-a-service, i.e. selling rights to Fireworks to sell finetuning services to the Cursors of the world, but it's entirely possible it doesn't pan out.
  That being said, Nvidia seems willing to step up to being the base model trainer of last resort via the Nemotron family of open models, since it helps sell more of their hardware — similar to their investments in the CUDA stack to sell hardware (unsurprisingly, Nemotron is designed to run most efficiently on Nvidia hardware, e.g. native NVFP4). So I suspect there will continue to be a pretty good market here.
lowbloodsugar 5 days ago
Sure. And there’s Lyft and Uber and plenty of others. And Grubhub and DoorDash and Uber Eats and how many others. And I don’t even fucking remember how many electric fucking scooter companies, I’m practically falling over scooters! I’m sure they aren’t earning market share by selling at a loss either.
- jolux 5 days ago
  
  Uber has been profitable since 2023.
  
ares623 5 days ago

It is hard for the engineer to grasp the lengths VCs will go to to grab market and mind share.

NicuCalcea 5 days ago

The blog compares the cost of running Gemma4 31b, which on OpenRouter is offered by small no-name inference providers, not by frontier AI companies. It seems like a fair comparison to me.

pornel 5 days ago

LLM generation is bottlenecked by RAM bandwidth and latency. You can get almost linear scaling by evaluating more prompts in parallel, because the GPU has nothing to do for the relative eternity it takes to read all of the weights from DRAM for every layer for every token.
On Apple Silicon you can get 4x-8x more tokens per second if you run more queries in parallel (as long as your inference server supports it, and has enough spare RAM for more KV caches).
When inference is done at datacenter scales, when you distribute generation across multiple GPUs and have kernels carefully tuned to specific hardware, the compute vs DRAM bandwidth speed ratio gets absurd like 200:1. That's why everyone gives you batch inference at a steep discount.
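A rough way to see the batching argument above is a roofline-style estimate. This is only a sketch under made-up assumptions: the hardware figures and the 31B-parameter, 16-bit model are hypothetical, the "2 FLOPs per parameter per token" figure is a rule of thumb, and KV-cache traffic is ignored.

```python
# Illustrative, made-up hardware numbers (not measurements of any real GPU).
WEIGHT_BYTES = 31e9 * 2        # hypothetical 31B-parameter model, 16-bit weights
MEM_BANDWIDTH = 1e12           # bytes/second read from DRAM
PEAK_FLOPS = 200e12            # floating-point operations/second
FLOPS_PER_TOKEN = 2 * 31e9     # rule of thumb: ~2 FLOPs per parameter per token


def decode_tokens_per_second(batch_size):
    """Rough roofline estimate of aggregate decode throughput.

    One decode step streams every weight from DRAM once, shared by the whole
    batch, and does FLOPS_PER_TOKEN of math per sequence. KV-cache traffic
    is ignored, so real gains at large batch sizes will be smaller.
    """
    memory_time = WEIGHT_BYTES / MEM_BANDWIDTH               # same for any batch size
    compute_time = batch_size * FLOPS_PER_TOKEN / PEAK_FLOPS
    step_time = max(memory_time, compute_time)               # the slower side sets the pace
    return batch_size / step_time


for batch in (1, 8, 200, 400):
    print(batch, round(decode_tokens_per_second(batch)))
```

With these assumed numbers, throughput grows linearly with batch size (about 16 tokens/s at batch 1, about 129 at batch 8) until compute time catches up with memory time at a batch of roughly 200, after which it plateaus.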
jeffybefffy519 5 days ago

Interestingly, if you look at the cost of Gemma3 (this is 12 months old, but demonstrates the author's point) on Vertex AI versus Gemini 2.5 Pro, the cost per million tokens WAS very similar.

brianwawok 5 days ago

So many more efficiencies possible at scale though. I cannot keep a local model 98% utilized 24/7, at least not with my current workload. A big cloud can. I can’t power my servers with DC, I have this AC to DC conversion nonsense. The list goes on.
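The utilization point can be put into numbers with a small sketch. The hourly cost and peak throughput below are hypothetical, and the sketch assumes the same hardware cost per hour at home and in the cloud, which is generous to the home setup.

```python
def cost_per_million_tokens(hourly_cost, peak_tokens_per_second, utilization):
    """Effective cost of one million tokens when the hardware is busy only
    a fraction of the time (utilization between 0 and 1)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = peak_tokens_per_second * 3600 * utilization
    return hourly_cost / tokens_per_hour * 1_000_000


# Hypothetical figures: $2/hour all-in hardware cost, 1000 tokens/s at full load.
cloud = cost_per_million_tokens(2.0, 1000, 0.98)   # kept busy almost all the time
home = cost_per_million_tokens(2.0, 1000, 0.05)    # idle most of the day
print(f"cloud ${cloud:.2f}/M, home ${home:.2f}/M, ratio {home / cloud:.1f}x")
```

Under these assumptions the idle home machine pays 0.98 / 0.05 = 19.6 times more per token before any other efficiency is counted.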

visarga 5 days ago

Besides fill factor being hard to match, there is also scaling - you can't scale local inference 10x for a spike, but you can with cloud inference.

rprend 5 days ago

This is not true. API tokens are not sold at a loss, and hardware gets more efficient over time, so serving inference on the same model gets cheaper. LLAMA 3.1 405B parameters was $6/$12/M tokens in 2024, but in 2026 that same model is $3/$3/M tokens.

The most intelligent model at a given time is much larger than the previous, which is why token costs for GPT5.5 are higher than 5.4. But you should expect that 2 years from now, serving a GPT5.5 sized model will be cheaper than GPT5.5 today. You should expect it to be even cheaper to get an equally intelligent model 2 years from now, because distillation techniques are effective at reducing the necessary parameter count for the same benchmark scores.
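For scale, the Llama 3.1 405B prices quoted above can be turned into an annualized rate. A minimal sketch, taking the quoted $6/$12 (2024) and $3/$3 (2026) per million tokens at face value and assuming a smooth two-year decline:

```python
def annual_price_change(old_price, new_price, years):
    """Compound annual change in price; negative means it got cheaper."""
    return (new_price / old_price) ** (1 / years) - 1


input_change = annual_price_change(6.0, 3.0, 2)     # input tokens, $6 -> $3
output_change = annual_price_change(12.0, 3.0, 2)   # output tokens, $12 -> $3
print(f"input {input_change:.1%} per year, output {output_change:.1%} per year")
```

That works out to roughly a 29% yearly drop for input tokens and a 50% yearly drop for output tokens on the same model.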

eikenberry 5 days ago

So are they going to stop at GPT 5.5? This analysis only seems to be counting inference cost when the majority of the cost, and why they are burning through money, is the training.

poly2it 5 days ago

Well, I'd be surprised if non-R&D inference providers were selling at a loss. There is a plethora to choose from, and competition is quite healthy. Will they keep providing cheap tokens while the labs raise their prices? Probably, but then I don't see how prices could be raised in the first place. And what timescale are you talking about? A couple of years? It is appropriate to assume inference will become more efficient over time. If you raise your prices, you are going to be outcompeted before it's profitable (if you assume it is unprofitable), which would be negligent. I don't see how this makes sense.

ianberdin 5 days ago

Do you have proof? Anthropic’s CEO said they are profitable. Same with OpenAI.

dijit 5 days ago
Profitable for inference if you completely ignore training costs and that you absolutely must continuously train new models.
- vlovich123 5 days ago
  
  Which is where your analogy breaks down and why you think you’re taking crazy pills. Inference is growing and selling the oranges in your analogy. Model building is growing the farm to sell larger, juicier more addicting oranges.
  
- spzb 5 days ago
  
  And ignore capital costs, depreciation, user churn etc
smallerize 1 day ago

Anthropic says they are losing money but hope to be profitable soon, if they can double their revenue. https://techcrunch.com/2026/05/20/anthropic-says-its-about-t...
OpenAI published their operating margins today. I can't find a non-paywalled source but Judd Legum reported it to be -122%.
Danox 5 days ago

AI CEOs are known to say many things; telling the truth probably isn’t one of them.
tiffanyh 5 days ago
Do you mind sharing source links to that profitability claim?
I’m struggling to find the quotes.
- no-name-here 5 days ago
  
  Open AI: https://simonwillison.net/2025/Aug/17/sam-altman/
  Anthropic: https://x.com/jaminball/status/2052112309364162874
  
miltonlost 5 days ago

If only they had their books open to do more than just "say"
singular_atomic 5 days ago

do you have proof? Taking these guys at face value is not wise

OsrsNeedsf2P 5 days ago

The models have been dropping 10x in price for completing the same tasks, year over year. Even if you think Anthropic is losing money charging 10x more than everyone else for their 400B model, the prices will continue to go down based on model improvement alone

EGreg 5 days ago

> These models will become more expensive over time, it's functionally impossible for them not to, they just want to capture the market before they have to stop selling at a huge loss.

They could have said the same about transistors. People keep inventing new ways to keep the costs down. Just look at the latest Qwen, DeepSeek, BitNet. Interesting tidbit: they’re all open, and as a leaked Google memo put it in 2023: they have no moat.

tempest_ 5 days ago

It is the model training that is dragging them down.

If the arms race stopped tomorrow the current price pays for the inference.

Danox 5 days ago
But isn’t training models a forever task? Like iterating in tech, you can never take a day off. Adding humans to the equation: don’t humans train and teach themselves new skills over a lifetime? And isn’t one of the future selling points of this AI slop that your AI never goes to sleep and can always be trained, forever? The AI price of entry will only increase as we go on into the future.
- atq2119 5 days ago
  
  I agree that training is a forever task, and the current rate of training is probably not sustainable. But all that means is that once the current investment mania ends, the market will most likely find a new equilibrium where continuous training still happens, but at a slower rate that can be sustained by inference revenue.
  
- asjir 5 days ago
  
  Just keeping it up to date with competitors is much cheaper, by copying better ones like Qwen did with Claude. Also a bunch of research is trickling into open source / arxiv so catching up should continue becoming cheaper at least as a fraction of training from scratch

MattRix 5 days ago

The inference is absolutely not sold at a loss, at least not when paying API prices (the subscriptions are less clear). The reason frontier model companies aren’t profitable is because training the models is so costly, not inference.

Groxx 5 days ago

https://old.reddit.com/r/GithubCopilot/comments/1tbb5bj/gith...

Seems to be on its way! I know of at least one person whose company is looking at a 20x increase, and afaict (from related looking around, nothing concrete tho) business accounts are missing some costs in the calculator so it'll likely be higher.

raincole 5 days ago

You should probably take some stay-on-topic pills, as this article is clearly and unambiguously talking about open weight models (e.g. gemma 4), not the ones allegedly being sold at a loss (Opus, ChatGPT, etc)[0].

[0]: these APIs are not sold at a loss either, by the way. But it's a nice meme so let's just pretend they are.

vlovich123 5 days ago

Except that’s not what the analysis is. They’re spending < $1 to get $1 from you and the other $9 to figure out how to improve the model further and build up products on top of that to turn that $1 spend into $5 in the future.

In other words, inference is fairly profitable for them and the rest of the money is spent growing revenue as quickly as possible. Building models is still an expensive line item but the costs for that are going down with time.

There is also maybe a “capture the market” mentality but I don’t think that’s necessarily it - the tools and processes are largely fungible and that’s a huge problem. They need to figure out how to make it sticky for “capture the market”, but there’s also a very real “grow as big as possible as quickly as possible to take on Google”; Google has an existential threat here.
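The distinction above (profitable inference, unprofitable company) is just two different margins computed over the same revenue. A minimal sketch using the $1 / $9 framing from this comment; the $0.50 inference cost is a hypothetical stand-in for "< $1", not a reported figure:

```python
def margins(revenue, inference_cost, other_spend):
    """Return (gross margin on inference, operating margin) as fractions of revenue."""
    gross = (revenue - inference_cost) / revenue
    operating = (revenue - inference_cost - other_spend) / revenue
    return gross, operating


# $1 of revenue, a hypothetical $0.50 to serve it, $9 on training, R&D and growth.
gross, operating = margins(1.0, 0.5, 9.0)
print(f"gross margin {gross:.0%}, operating margin {operating:.0%}")
```

With these inputs the inference business shows a 50% gross margin while the company as a whole runs at a -850% operating margin, so "inference is profitable" and "the company loses money" can both hold at once.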

MuffinFlavored 5 days ago

> Frontier AI companies are selling at a loss.

How big/deep of a loss?

I feel like I read this every day for years that Uber did this same "idiotic, losing" strategy (how it was pitched/discussed) and then one day we woke up and... without much fuss, boom, they were profitable seemingly overnight.

brianwawok 5 days ago
Well and uber cut the driver pay in half and doubled the price. They didn’t really find any efficiencies, robo drivers don’t exist yet. Also why I hardly touch them anymore.
- onesociety2022 5 days ago
  
  All that tells me is they did find an efficiency. If they didn’t, their driver supply would have dropped. Unlike the taxi business, Uber/Lyft can tap into otherwise dormant supply of drivers who already own a car but aren’t willing to spend all 40-60 hours a week driving a taxi. With Uber/Lyft, they can become part-time drivers (they have flexibility and they can use an asset they already own anyway). Is it worse for the full time taxi drivers who used to have the supply artificially constrained in the old medallion system? Yes, but does it also benefit others who want to do this as a flexible job, zero skills required other than driving, no boss to deal with, no job interviews, etc. Yes!
- MuffinFlavored 5 days ago
  
  > Well and uber cut the driver pay in half and doubled the price
  Devil's advocate:
  * inflation caused everything to go up to some degree since then
  * if it was "that bad" as you say, they wouldn't be extremely profitable and have so many users
  both things can be true? "they cut the driver pay in half and doubled the price" did not lead to the collapse of the business/people to stop using it.
spzb 5 days ago

Ed Zitron discusses this as part of his post on AI economics : https://www.wheresyoured.at/ais-economics-dont-make-sense/
Danox 5 days ago

As long as you have slaves/sharecroppers driving, the people at the top of the pyramid at Uber are profitable. Uber makes money as long as you don’t care about the workers, and as long as you can get around all of the regulations that are put on traditional cab companies, if there are any left on the road.
For me, nothing says low class like the Porsche dealer saying "we can call an Uber to take you home". Ridiculous… and it was a low-class experience: a small, dirty car. Never again, ha ha ha…

throwatdem12311 5 days ago

The Michael Scott AI Companies.

ajross 5 days ago

> I feel like I'm taking crazy pills.

Why? It's no less crazy than when Uber and Lyft were doing the same thing. Or when the entire tech industry was doing it in the dot com boom.

Investment-driven market growth at a loss is like the least surprising thing in all of this. The tech is new and fascinating. The bubble is just another trip through the funhouse.

visarga 5 days ago

> Frontier AI companies are selling at a loss.

There are huge economies to be had by batching requests and using lots of RAM for MoE (sparse models). You can't achieve that efficiency at batch size 1 on a single node.

asjir 5 days ago

Exactly, they put a lot of money into engineering and it does give results
eikenberry 5 days ago

Inference isn't the problem.