Comment by Denzel

7 days ago

Yes, I read your entire article and that section, hence my response. :)

1) Help me understand what you mean by “pure compute providers” here. Who are the pure compute providers and what are their financials including pricing?

2) I already responded to this - platform power is one compelling value gained from paid API market share.

3) If the frontier lab you’re talking about is DeepSeek, I’ve already responded to this as well, and you didn’t even concede the point that the 80% margin you cited is inaccurate given that it’s based on a “theoretical income”.

1) Any company that hosts APIs using open-weights models (Llama, Gemma, DeepSeek, etc.) in exchange for money. There are a lot of them around, at different scales and at different points in a hosting provider's lifecycle. See, for example, the OpenRouter page for any open-weights model: it lists the providers hosting that model along with their prices.

2) It has been demonstrated repeatedly that (API) platform power has no value in this space. There are no network effects, because you can't use the user data to improve the models. There is no lock-in: the models are easy to substitute because the interface is so incredibly generic. There is no loyalty: users jump ship the instant better models are released. And there is no purchasing power from scale: the primary supplier (Nvidia) isn't giving volume discounts, and is actually giving preferential allocations to smaller hosting providers to fragment the market as much as possible.

Did you have some other form of platform power in mind?

3) I did not concede that point because I don't think it's relevant. They provide the exact data for their R1 inference economics:

- The cost per node: an 8×H800 node costs $16/hour ≈ $0.0045/s to run (rental price, so that covers both capex and opex).

- The throughput per node: given their traffic mix, a single node processes 75k input tokens per second and generates 15k output tokens per second.

- The pricing: $0.35/1M input tokens (weighted for the cache hit/miss mix) and $2.2/1M output tokens.

- From which it follows that per-node revenue is 75k/s × $0.35/1M = $0.026/s for input and 15k/s × $2.2/1M = $0.033/s for output. That's $0.06/s in revenue, substantially higher than the $0.0045/s cost of revenue (the sketch below runs these numbers).
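If it helps, here's that arithmetic as a minimal Python sketch. All the numbers come straight from the bullets above; nothing else is assumed:

```python
# Per-node cost: 8xH800 node rented at $16/hour (covers capex + opex)
cost_per_s = 16 / 3600                      # ~ $0.0044/s

# Per-node throughput at the reported traffic mix
input_tps = 75_000                          # input tokens processed per second
output_tps = 15_000                         # output tokens generated per second

# Pricing per token (input price weighted for cache hit/miss)
input_price = 0.35 / 1_000_000              # $/input token
output_price = 2.2 / 1_000_000              # $/output token

revenue_per_s = input_tps * input_price + output_tps * output_price
print(f"revenue: ${revenue_per_s:.4f}/s")   # ~ $0.0593/s
print(f"cost:    ${cost_per_s:.4f}/s")      # ~ $0.0044/s
print(f"ratio:   {revenue_per_s / cost_per_s:.1f}x")  # ~ 13x
```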

Like, that's just what the economics of paid R1 inference are (V3 being in the mix doesn't matter; the two models have the same parameter count). Inference is really, really cheap, both in absolute cost-per-token terms and relative to the prices people are willing to pay.

Their aggregate margins are different, and we don't know how different, because here too they also choose to provide a free service with no ads. But that too is a choice. If they just stopped doing that and rented fewer GPUs, their margins would be very lucrative. (Not as high as the computation suggests, since the unpaid traffic lets them batch more efficiently, but that's not going to make a 5x difference.)
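To put a rough number on that batching caveat, here's a hypothetical sensitivity check. The cost multipliers are my assumptions for illustration, not DeepSeek figures; the point is just that even if serving paid-only traffic made each token several times more expensive, the margin would stay high:

```python
# Hypothetical: how the margin changes if per-token serving cost rises
# once the free traffic (and its batching benefit) goes away.
revenue_per_s = 0.0593    # per-node revenue from the arithmetic above
base_cost_per_s = 0.0044  # per-node cost from the arithmetic above

for multiplier in (1, 2, 3, 5):  # assumed cost inflation factors, not measured
    cost = base_cost_per_s * multiplier
    margin = 1 - cost / revenue_per_s
    print(f"{multiplier}x cost -> margin ~ {margin:.0%}")
# 1x -> 93%, 2x -> 85%, 3x -> 78%, 5x -> 63%
```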

But fair enough, it might be cleaner to use the straight cost-per-token data rather than add the indirection of margins. Either way, it seems clear that API pricing is not subsidized.