Comment by Denzel

7 days ago

Thanks for sharing!

It's worthwhile to note that https://github.com/deepseek-ai/open-infra-index/blob/main/20... shows cost vs. theoretical income. They don't show 80% gross margins and there's probably a reason they don't share their actual gross margin.

OpenAI is the easiest counterexample that proves inference is subsidized right now. They've taken $50B in investment; surpassed 400M WAUs (https://www.reuters.com/technology/artificial-intelligence/o...); lost $5B on $4B in revenue for 2024 (https://finance.yahoo.com/news/openai-thinks-revenue-more-tr...); and project they won't be cash-flow positive until 2029.

Prices would be significantly higher if OpenAI was priced for unit profitability right now.

As for the mega-conglomerates (Google, Meta, Microsoft), GenAI is a loss leader to build platform power. GenAI doesn't need to be unit profitable, it just needs to attract and retain people on their platform, ie you need a Google Cloud account to use Gemini API.

Thanks,

I believe the API prices are not subsidized, and there's an entire section devoted to that. To recap:

1) pure compute providers (rather than companies providing both the model and the compute) can't really gain anything from subsidizing. That market is already commoditized and supply-limited.

2) there is no value to gaining paid API market share -- the market share isn't sticky, and there's no benefit to just getting more usage since the terms of service for all the serious providers promise that the data won't be used for training.

3) we have data from a frontier lab on what the economics of their paid API inference are (but not the economics of other types of usage)

So the API prices set a ceiling on what the actual cost of inference can be. And that ceiling is very low relative to the prices of a comparable (but not identical) non-AI product category.

That's a very distinct case from free APIs and consumer products. The former is being given out for no cost in exchange for data, the latter for data and sticky market share. So unlike paid APIs, the incentives are there.

But given the cost structure of paid APIs, we can tell that it would be trivial for the consumer products to be profitably monetized with ads. They've got a ton of users, and the way users interact with their main product would be almost perfect for advertising.

The reason OpenAI is not making a profit isn't that inference is expensive. It's that they're choosing not to monetize like 95% of their users, despite the unit economics being very lucrative in principle. They're making a loss because for now they can, and for now the only goal of their consumer business is to maximize their growth and consumer mindshare.

If OpenAI needed to make a profit, they would not raise their prices on things being paid for. They'd just need to extract a very modest revenue from their unpaid users. (It's 500M unpaid users. To make $5B/year in revenue from them, you'd need just a $1 ARPU. That's an order of magnitude below what's realistic. Hell, that's lower than the famously hard to monetize Reddit's global ARPU.)

  • Yes, I read your entire article and that section, hence my response. :)

    1) Help me understand what you mean by “pure compute providers” here. Who are the pure compute providers and what are their financials including pricing?

    2) I already responded to this - platform power is one compelling value gained from paid API market share.

    3) If the frontier lab you’re talking about is DeepSeek, I’ve already responded to this as well, and you didn’t even concede the point that the 80% margin you cited is inaccurate given that it’s based on a “theoretical income”.

    • 1) Any companies that host APIs using open-weights models (LLama, Gemma, Deepseek, etc) in exchange for money. There's a lot of them around, at different scales and different parts of a hosting provider's lifecycle. Check for example the Openrouter page for any open-weights model for hosters of that model with price data.

      2) (API) platform power having no value in this space has been demonstrated repeatedly. There are no network effects, because you can't use the user data to improve models. There is no lock-in, as the models are easy to substitute due to how incredibly generic the interface is. There is no loyalty, the users will jump ship instantly when better models are released. There is no purchasing power from having more scale, the primary supplier (Nvidia) isn't giving volume discounts and is actually giving preferential allocations to smaller hosting providers to fragment the market as much as possible.

      Did you have some other form of platform power in mind?

      3) I did not concede that point because I don't think it's relevant. They provide the exact data for their R1 inference economics:

      - The cost per node: a 8*H800 node costs $16/hour=$0.0045/s to run (rental price, so that covers capex + opex).

      - The throughput per node: Given their traffic mix, a single node will process 75k/s input tokens and generate 15k/s output tokens.

      - Pricing ($0.35/1M input when weighing for cache hit/miss, $2.2/1M output)

      - From which it follows that the per-node revenue is $0.35/(1M/75k/s) = $0.026/s for input, and $2.2/(1M/15k/s)=$0.033/s for output. That's $0.06/s in revenue, substantially higher than the cost of revenue.

      Like, that just is what the economics of paid R1 inference are (there being V3 in the mix doesn't matter, they're the same parameter count). Inference is really, really cheap both in absolute cost/token terms and relative to the prices people are willing to pay.

      Their aggregate margins are different, and we don't know how different, because here too they choose to also provide free service with no ads. But that too is a choice. If they just stopped doing that and rented fewer GPUs, their margins would be very lucrative. (Not as high as the computation suggests since the unpaid traffic allows them to batch more efficiently, but hat's not going to make a 5x difference.)

      But fair enough, it might be cleaner to use the straight cost per token data rather than add the indirection of margins. Either way, it seems clear that API pricing is not subsidized.