Comment by sandworm101
3 hours ago
But that per-token cost is a total joke. All these companies are fighting to build market share in some future dominated by one or two AI ecosystems. It is musical chairs until someone creates the one ring to rule them all. So they are charging token amounts just to claim revenue as they burn through investor dollars.
In short: per-token charges currently cover maybe 1% of the total costs in this field. To pay ongoing costs, and pay back investors, everyone will need to pay 100x or 1000x the current rates, likely for decades.
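Spelling out that arithmetic: if token revenue covers only a fraction of total costs, the breakeven price multiple at constant volume is just the inverse of that fraction. The coverage figures below are rough guesses, not measured numbers:

```python
# Breakeven price multiple at constant volume, given the fraction of total
# costs that token revenue currently covers. Coverage values are rough
# guesses for illustration, not measured numbers.
for coverage in (0.01, 0.001):
    print(f"coverage {coverage:.1%} -> prices must rise ~{1 / coverage:,.0f}x to break even")
```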
If that's true, it's very unsustainable.
Gemma-4 26B-A4B + M5 MacBook Pro + OpenCode isn't Claude Code _yet_, but it's good enough that if I were forced to use it I would be fine.
Yes, it's amazing how quickly so many tech companies have hitched their tooling to these big AI vendors, seemingly without any thought about whether those vendors will still exist a year or three or five from now. Insane behavior. To the (debatable!) extent that AI coding tools are useful at all, wouldn't it be a hell of a lot smarter to self-host? At least that way you have some control over QoS and a stable, predictable result... Or maybe nobody cares about that kind of thing anymore? What happened to basic business math in this industry?
The basic business math is (to start) software companies realizing that spending $10k, $20k, $50k (or more?) per year, per developer on current models at current token rates might not be particularly insane, given the value returned.
Models are likely going to keep getting better, and as costs go down, demand is likely to rise faster.
I'm not sure this information is well grounded, but I remember reading somewhere that inference is indeed profitable. My personal experience is similar: running 2x3090s draws 500-600W, and you can locally run amazing models with a setup like that.
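For a rough sense of scale, here is the electricity-only math for that kind of local rig; the power draw, throughput, and electricity price are assumed numbers for illustration, not measurements:

```python
# Rough electricity-only cost per million tokens for a local 2x3090 rig.
# All numbers are assumptions for illustration, not measurements.
power_watts = 550          # assumed average draw under load (within the 500-600W range)
tokens_per_second = 30     # assumed throughput for a mid-size local model
price_per_kwh = 0.15       # assumed electricity price in USD

kwh_per_million_tokens = (power_watts / 1000) * (1_000_000 / tokens_per_second) / 3600
cost_per_million_tokens = kwh_per_million_tokens * price_per_kwh

print(f"{kwh_per_million_tokens:.2f} kWh -> ${cost_per_million_tokens:.2f} per 1M tokens")
```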
Running the model isn't the cost. Watts per token is the math they show investors. You also have to be constantly training new models, which currently needs more compute than serving the customer base. You have to build datacenters, and possibly power plants to feed them. You have to carry debt. And you will need to buy new GPUs/RAM every few years to remain competitive. The total business is vastly different from simple GPU math.
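To make that concrete, here is a toy fully-loaded cost model; every figure in it is an assumption, and the only point is that electricity is one line item among several:

```python
# Toy fully-loaded cost-per-token model: electricity is only one line item.
# Every figure below is an assumption for illustration, not any vendor's books.
electricity_per_m_tokens = 0.80   # $/1M tokens, marginal power cost (assumed)
gpu_capex = 40_000                # $ per inference server (assumed)
gpu_lifetime_tokens = 50_000e6    # tokens served over its useful life (assumed)
training_share = 1.50             # $/1M tokens, amortized model training (assumed)
datacenter_and_debt = 0.70        # $/1M tokens, facilities plus financing (assumed)

hardware_per_m_tokens = gpu_capex / (gpu_lifetime_tokens / 1e6)
total_per_m_tokens = (electricity_per_m_tokens + hardware_per_m_tokens
                      + training_share + datacenter_and_debt)

print(f"hardware amortization: ${hardware_per_m_tokens:.2f}/1M tokens")
print(f"fully loaded: ${total_per_m_tokens:.2f}/1M tokens "
      f"(~{total_per_m_tokens / electricity_per_m_tokens:.1f}x electricity alone)")
```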
> In short: per-token charges currently cover maybe 1% of the total costs in this field
There are plenty of seemingly informed people saying the exact opposite, so that's a lot of confidence you're speaking with. I have a hard time believing it when we know what open-weights models cost to run. And sure, there are training costs, but again, many say inference costs already exceed training costs.
From the perspective of a deal like this, “total costs in the field” matter less than incremental cost per token served.
The unit economics for today’s frontier models should be great, and this suggests Anthropic believes they’ll get better.
In a decade the cost of compute will be a tiny fraction of what it costs now. Specialized hardware will exist that will be cheap and efficient.
The difference in the cost of compute between 2026 and 2036 won't be nearly as large as the difference in the cost of compute between 2016 and 2026. Even by 2016 the slowdown in improvements was noticeable.
We might see a one-time bump in inference when we move off GPUs onto more limited and efficient dedicated hardware, but the sustained fast pace of improvements is far behind us.
I'm predicting that, now that there is a clear use case for this tech, work will accelerate (and already has) on specialized hardware, software, models, etc. that will run much more efficiently in 10 years, so that real token costs will be a fraction of what they are now.
Compute power improvement between 2016 and 2026 wasn't that impressive either. Moore's law is essentially dying.