
Comment by SiempreViernes

1 day ago

And investors will leak such claims quickly enough that this reasoning cannot plausibly hide big secrets.

It's not a big secret. If you just do the math yourself, it's easy to compute that inference doesn't cost all that much. People just see all the capital investment going around and all the new data centers being built, see that it's spent on "AI", put two and two together and get a three, or "clearly serving AI requests costs an arm and a leg".

The 1 they were missing is that AI requires both training and inference, and training is by far the expensive part. And that in principle you can stop training at any point and keep using the models as they are. (But that means that if other companies keep improving their models, you'll be left behind...)

In contrast, inference is fairly cheap and all the providers have great margins on it. Eventually, either investment in training stops having a commensurate impact on model quality, and people stop training and concentrate on making inference faster and more efficient, or that doesn't happen and things get very weird very quickly.

  • The market already shows where it will go.

    If you want a frontier model, you will pay more for inference to essentially fund the expensive training.

    If you don’t need a frontier model, you will get dirt-cheap inference, which will eventually approach the cost of the electricity spent per token.

  • This is technically correct, but practically false.

    They can't stop training, because the AI's knowledge will then become out of date very quickly. Its knowledge stops the day you stop training.

    • Yes, it seems this controversial discussion comes down to an already well-defined concept in business.

      Net margin versus gross margin.

      Net shows profitability after deducting all expenses, while gross deducts only the cost of goods sold. Putting the model training costs into a one-time fixed expense provides a much better gross margin.

      This is known as COGS reclassification or classification shifting and is a common tactic to mislead investors.

      This is why analysts look at Free Cash Flow Margin.

      WorldCom and MicroStrategy did this before the Dotcom Bubble imploded.
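
A rough sketch of the gross versus net distinction described above, in Python. The revenue and cost figures are made up purely for illustration and do not describe any real company, and "net" is simplified here to revenue minus COGS minus one other expense line.

```python
def gross_margin(revenue, cogs):
    """Fraction of revenue left after cost of goods sold only."""
    return (revenue - cogs) / revenue


def net_margin(revenue, cogs, other_expenses):
    """Fraction of revenue left after COGS and all other expenses (simplified)."""
    return (revenue - cogs - other_expenses) / revenue


# Hypothetical figures, in arbitrary currency units (assumptions, not real data).
revenue = 100.0
inference_cost = 40.0   # treated as COGS
training_cost = 80.0    # the expense whose classification is in dispute

# Training kept outside COGS: gross margin looks healthy.
gm_training_excluded = gross_margin(revenue, inference_cost)

# Training counted as COGS: gross margin turns negative.
gm_training_included = gross_margin(revenue, inference_cost + training_cost)

# Net margin is the same either way, since it deducts everything.
nm = net_margin(revenue, inference_cost, training_cost)

print(f"gross margin, training excluded: {gm_training_excluded:.0%}")
print(f"gross margin, training included: {gm_training_included:.0%}")
print(f"net margin: {nm:.0%}")
```

With these invented numbers, only the classification of the training line changes between the two gross figures, which is why the choice matters to anyone reading gross margin alone.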

  • > If you just do the math yourself, it's easy to compute that inference doesn't cost all that much.

    Show us your work, then. If it's so easy to do, this should be a trivial request to accommodate, no?

    • Just look at large open weights models being served by inference providers.

      Kimi 2.6 is a 1 trillion total / 32B active parameter model that's roughly comparable to Sonnet. Sonnet's API pricing is $5 in, $15 out per million tokens. DeepInfra serves Kimi at $0.75 in, $3.50 out, and it's about the same on OpenRouter. So you're looking at a 4-7x multiple that Anthropic is charging compared to market rates that any plebe can get with a credit card.
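
For anyone who wants to check the multiple, here is a minimal Python sketch that uses only the per-million-token prices quoted in the comment above. The prices are taken as given and not independently verified, and the variable and function names are just illustrative.

```python
# Prices in USD per million tokens, exactly as quoted in the comment above.
sonnet = {"input": 5.00, "output": 15.00}
kimi_deepinfra = {"input": 0.75, "output": 3.50}


def price_multiples(premium, baseline):
    """Return the premium/baseline price ratio for each token direction."""
    return {direction: premium[direction] / baseline[direction]
            for direction in premium}


multiples = price_multiples(sonnet, kimi_deepinfra)
for direction, ratio in multiples.items():
    print(f"{direction}: {ratio:.1f}x")
# input: 6.7x
# output: 4.3x
```

The blended multiple for a real workload would fall between those two figures, depending on the ratio of input to output tokens.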
