
Comment by oceanplexian

2 days ago

> Their pricing models are simply not sustainable. I hope everyone realizes that the current LLMs are subsidized, like Seamless and Uber were in the early days.

If you run these models at home, it's easy to see how this is totally untrue.

You can build a pretty competent machine that will run Kimi or DeepSeek for $10-20k and generate an unlimited number of tokens all day long (I did a budget version with an Epyc machine for about $4k). Amortize that over a couple of years and it's cheaper than most people spend on a car payment. The pricing is sustainable, and that's ignoring the fact that the big model providers operate at economies of scale: they can parallelize across GPUs and pack in requests much more efficiently.
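
Back-of-the-envelope on that amortization claim (the build price, lifespan, average power draw, and electricity rate below are my assumptions, not exact figures):

```python
# Rough amortization sketch. Build price, lifespan, power draw, and
# electricity rate are assumed for illustration, not measured figures.
build_cost = 15_000      # $, midpoint of the $10-20k range above
months = 36              # amortize over 3 years
avg_power_kw = 1.5       # assumed average draw while serving tokens
usd_per_kwh = 0.20       # assumed residential electricity rate

hardware = build_cost / months                       # ~$417/month
electricity = avg_power_kw * 24 * 30 * usd_per_kwh   # ~$216/month
print(f"~${hardware + electricity:,.0f}/month")      # ~$633/month
```

Which is roughly in the range of a pricey car payment.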

> run these models at home

Damn, what kind of home do you live in, a data center? Teasing aside, maybe a slightly better benchmark is which sufficiently acceptable model (not an objective standard, but one can lean on arguable benchmarks) you can run on infrastructure that is NOT subsidized. That might include cloud providers, e.g. OVH, or "neo" clouds, e.g. HF, but honestly that's tricky to evaluate, as they tend to all have the pure players (OpenAI, Anthropic, etc.) or their owners (Microsoft, NVIDIA, etc.) as investors.

This ignores the cost of model training, R&D, managing the data centers, and more. OpenAI et al. regularly admit that all their products lose money. And it isn't enough to cover their costs: they have to pay back all those investors while actually generating a profit at some point in the future.

Uhm, you actually just proved their point if you run the numbers.

For simplicity’s sake we’ll assume DeepSeek 671B on two RTX 5090s drawing 2 kW at full utilization.

In 3 years you’ve paid $30k total: $20k for the system + $10k in electricity @ $0.20/kWh.

The model generates 500M-1B tokens total over 3 years @ 5-10 tokens/sec. Note that’s total throughput, covering both reasoning and output tokens.

You’re paying $30-$60/Mtok, more than both Opus 4.5 and GPT-5.2, for less performance and fewer features.

And like the other commenters point out, this doesn’t even factor in the extra DC costs when scaling it up for consumers, nor the costs to train the model.

Of course, you can play around with the parameters of the cost model, but this serves to illustrate that it’s not so clear-cut whether the current AI service providers are profitable or not.
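
A minimal, parameterized version of that cost model, using the assumptions above as defaults (tweak them as you like):

```python
# Cost model from above: DeepSeek 671B on 2x RTX 5090, 2 kW, 3 years.
def cost_per_mtok(system_cost=20_000, power_kw=2.0, usd_per_kwh=0.20,
                  tokens_per_sec=7.5, years=3):
    hours = years * 365 * 24
    total_cost = system_cost + power_kw * hours * usd_per_kwh
    total_mtok = tokens_per_sec * hours * 3600 / 1e6
    return total_cost / total_mtok

print(f"${cost_per_mtok(tokens_per_sec=5):.0f}/Mtok")   # ~$65 at 5 tok/s
print(f"${cost_per_mtok(tokens_per_sec=10):.0f}/Mtok")  # ~$32 at 10 tok/s
```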

  • 5 to 10 tokens per second is a bungus-tier rate.

    https://developer.nvidia.com/blog/nvidia-blackwell-delivers-...

    NVIDIA's 8xB200 gets you ~30k tps on DeepSeek 671B; at maximum utilization that's about 1 trillion tokens per year. At a dollar per million tokens, that's $1 million.

    The hardware costs around $500k.

    Now, ideal throughput is unlikely, so let's say you get half that. It's still ~500B tokens per year.

    Gemini 3 Flash is like $3/million tokens and I assume it's a fair bit bigger, maybe 1 to 2T parameters. I can sort of see how you can get this to work with margins, as the AI companies repeatedly assert.
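
    A quick sanity check on those numbers (the 50% utilization and $1/Mtok sell price are assumptions carried over from above):

    ```python
    # 8xB200 throughput economics as asserted above; throughput is the
    # NVIDIA figure, utilization and sell price are assumptions.
    tps = 30_000                # tokens/sec on DeepSeek 671B (NVIDIA figure)
    utilization = 0.5           # assume you only hit half of ideal
    usd_per_mtok = 1.0          # assumed sell price
    hardware_cost = 500_000     # $, approximate 8xB200 system

    tokens_per_year = tps * utilization * 365 * 24 * 3600
    revenue = tokens_per_year / 1e6 * usd_per_mtok
    print(f"{tokens_per_year / 1e9:.0f}B tok/yr -> ${revenue:,.0f}/yr")
    # ~473B tok/yr -> ~$473,040/yr against $500k of hardware (power excluded)
    ```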

    • Cool, that potential 5x cost improvement just got delivered this year. A company can continue running the previous generation until EOL, or take a hit by writing off the residual value; either way they’ll have a mixed cost model that puts their token cost somewhere between previous and current gens.
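
      E.g., a toy blend (the fleet split and per-generation costs are made up for illustration):

      ```python
      # Toy blended-cost sketch; the fleet split and per-gen token costs
      # are illustrative numbers, not real figures.
      old_share, old_cost = 0.6, 5.0   # 60% of tokens on prev-gen @ $5/Mtok
      new_share, new_cost = 0.4, 1.0   # 40% on current gen @ $1/Mtok
      print(f"${old_share * old_cost + new_share * new_cost:.2f}/Mtok")
      # $3.40/Mtok blended, between the two generations
      ```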

      Also, you’re missing material capex and opex costs from a DC perspective. Certain inputs exhibit diseconomies of scale when your demand outstrips market capacity. You do notice electricity costs are rising and companies are chomping at the bit to build out more power plants, right?

      Again, I ran the numbers for simplicity’s sake to show it’s not clear-cut that these models are profitable. “I can sort of see how you can get this to work” agrees with exactly what I said: it’s unclear, certainly not a slam dunk.

      Especially when you factor in all the other real-world costs.

      We’ll find out soon enough.


> Amortize that over a couple years, and it's cheaper than most people spend on a car payment.

I'm not parsing that: do you mean that the monthly cost of running your own setup 24x7 is less than a monthly car payment?

Whether true or false, I don't see how that's relevant to proving either that current LLMs are subsidised or that they aren't.

  • If true, it means there's a lower bound that is profitable, at least given current apparent hardware purchase costs and energy consumption.