Comment by Tuna-Fish

1 day ago

It's not a big secret. If you just do the math yourself, it's easy to compute that inference doesn't cost all that much. People just see all the capital investment going around and all the new data centers being built, see that it's spent on "AI", put two and two together and get a three, or "clearly serving AI requests costs an arm and a leg".

The 1 they were missing is that AI requires both training and inference, and training is by far the expensive part. And that in principle you can stop training at any point and keep using the models as they are. (But that means that if other companies keep improving their models, you'll be left behind...)

In contrast, inference is fairly cheap and all the providers have great margins on it. Eventually, either investment in training stops having a commensurate impact on model quality, and people stop doing it and concentrate instead on making inference faster and even more efficient, or that doesn't happen and things get very weird very quickly.

8 comments

whatever1 19 hours ago

The market already shows where it will go.

If you want a frontier model, you will pay more for inference, essentially to fund the expensive training.

If you don't need a frontier model, you will get dirt cheap inference, which will eventually approach the cost of the electricity spent per token.
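
As a rough sketch of what "the cost of electricity spent per token" means, the floor can be written as power draw times electricity price, divided by throughput. The function name and all three input figures below are hypothetical placeholders, not measurements from any real deployment.

```python
def electricity_cost_per_million_tokens(power_kw, usd_per_kwh, tokens_per_second):
    """Electricity-only cost (USD) of generating one million tokens."""
    usd_per_hour = power_kw * usd_per_kwh        # energy bill for one hour
    tokens_per_hour = tokens_per_second * 3600   # tokens produced in that hour
    return usd_per_hour / tokens_per_hour * 1_000_000


# Hypothetical inputs, chosen only to show the arithmetic:
# a 10 kW server, $0.10 per kWh, 5,000 tokens/s aggregate throughput.
floor = electricity_cost_per_million_tokens(
    power_kw=10.0, usd_per_kwh=0.10, tokens_per_second=5000
)
print(f"${floor:.4f} per million tokens")
```

This ignores hardware depreciation, cooling, networking and staff, so it is a lower bound on serving cost, not an estimate of it.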

mattmanser 6 hours ago

This is technically correct, but practically false.

They can't stop training, because then the model's knowledge goes out of date very quickly: it stops on the day you stop training.

flextheruler 3 hours ago

Yes, it seems the discussion that has sparked such controversy comes down to an already well-defined business concept: net margin versus gross margin.
Net margin shows profitability after subtracting all expenses, while gross margin subtracts only the cost of goods sold (COGS). Classifying model training as a one-time fixed expense outside COGS gives a much better gross margin.
This is known as COGS reclassification or classification shifting and is a common tactic to mislead investors.
This is why analysts look at Free Cash Flow Margin.
WorldCom and MicroStrategy did this before the Dotcom Bubble imploded.
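
A minimal sketch of the gross-versus-net point, using made-up figures (the revenue, inference and training numbers are purely illustrative and do not describe any real company). Net margin is simplified here to revenue minus COGS minus other expenses, ignoring taxes and interest.

```python
def gross_margin(revenue, cogs):
    """Share of revenue left after cost of goods sold."""
    return (revenue - cogs) / revenue


def net_margin(revenue, cogs, other_expenses):
    """Share of revenue left after all expenses (simplified)."""
    return (revenue - cogs - other_expenses) / revenue


# Hypothetical figures, in arbitrary currency units.
revenue = 100.0
inference_cost = 40.0
training_cost = 80.0

# Training kept out of COGS: gross margin looks healthy.
gross_excl_training = gross_margin(revenue, inference_cost)
# Training counted as COGS: gross margin turns negative.
gross_incl_training = gross_margin(revenue, inference_cost + training_cost)
# Net margin is the same either way, since every expense is subtracted.
net = net_margin(revenue, inference_cost, training_cost)

print(f"gross (training excluded): {gross_excl_training:.0%}")
print(f"gross (training included): {gross_incl_training:.0%}")
print(f"net: {net:.0%}")
```

Where training is classified changes only the gross figure; the net figure does not move.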

ethin 21 hours ago

> If you just do the math yourself, it's easy to compute that inference doesn't cost all that much.

Show us your work, then. If it's so easy to do, this should be a trivial request to accommodate, no?

mediaman 20 hours ago
Just look at large open weights models being served by inference providers.
Kimi 2.6 is a 1 trillion total / 32B active parameter model that's something comparable to Sonnet. Sonnet's API pricing is $5 in, $15 out per million tokens. Deepinfra serves Kimi at $0.75 in, $3.50 out, and about the same at openrouter. So you're looking at a 4-7x multiple that Anthropic is charging compared to market rates that any plebe can get with a credit card.
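
For anyone checking the 4-7x figure, here is the arithmetic as a short sketch. The prices are the ones quoted in this comment (USD per million tokens), not independently verified; the helper name `price_multiple` is just for illustration.

```python
# Prices in USD per million tokens, as quoted in the comment above.
sonnet = {"input": 5.00, "output": 15.00}
kimi_deepinfra = {"input": 0.75, "output": 3.50}


def price_multiple(expensive, cheap):
    """Ratio of the two price lists, per token direction."""
    return {kind: expensive[kind] / cheap[kind] for kind in expensive}


multiples = price_multiple(sonnet, kimi_deepinfra)
for kind, multiple in multiples.items():
    print(f"{kind}: {multiple:.1f}x")
# input: 6.7x
# output: 4.3x
```

The blended multiple for a real workload depends on its input-to-output token ratio, so it falls somewhere between those two numbers.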
- majormajor 19 hours ago
  
  I'm not sure just how good that looks for Anthropic/OpenAI.
  4-7x isn't a tiny markup, but how does that compare to high-margin internet businesses like AdSense? Meta and Google do hundreds of billions in ad revenue a year, and after taking out the publisher's portion (60-80% per some searching), I wonder what the ratio of the remaining tens-of-billions is against the compute cost and headcount required to run it.
  And how much room for maintaining or improving that margin do they have if the cheap competitors also continue getting better? Is there a "good enough" point where the easier inference tasks are all moving to vendors massively undercutting them, and then they don't have the volume necessary to justify spending on further cutting-edge development?
- re-thc 9 hours ago
  
  > Kimi 2.6 is a 1 trillion total / 32B active parameter model that's something comparable to Sonnet.
  No, it's not. On some rigged paper, maybe. Some such benchmarks show all models grouped together, which they clearly are not.
  > Sonnet's API pricing is $5 in, $15 out per million tokens. Deepinfra serves Kimi at $0.75 in, $3.50 out, and about the same at openrouter. So you're looking at a 4-7x multiple that Anthropic is charging compared to market rates that any plebe can get with a credit card.
  That's not saying much. You can get "cloud" at AWS, or you can get a VPS. There is likely a 10x price difference, but they are not the same product. And while AWS costs more, that doesn't mean it has 7x margins either.