Comment by twoodfin
7 hours ago
From the limited perspective of software development, today’s models are well worth their per-token cost.
This reads to me like Anthropic anticipating demand and making a commitment to acquire supply. Not unlike airlines committing to future jet fuel purchases, or Apple committing to future DRAM volume.
> From the limited perspective of software development, today’s models are well worth their per-token cost.
At the current price or real price? Anthropic said a $200 subscription can cost them $5000 so the real price could be anywhere from 10-30x the current price.
No, that is probably one of the worst cases they saw. Most likely the subscription inference cost is much lower than you expect. If you look at the cost of running similar open models, it is much lower than what you pay when buying from Anthropic, so that is the real cost basis I expect.
It's likely Amazon is making a fucking killing though.
While $5000 is a lot, people who rack up close to or just over a thousand dollars in "API equivalent cost" are pretty common.
> Most likely the subscription inference cost is much lower than you expect.
This is probably not true because they'd be screaming it off every rooftop were that the case.
Same deal with the API inference. Even the "profitable on inference" claim is sourced back to hearsay of informal statements made by OpenAI/Anthropic staff. No formal announcements, nothing remotely of the "You can trust what I'm saying, because if I'm lying the SEC will have my head" sort.
Yet making such statements would be invaluable. If Anthropic can demonstrate profitability before OpenAI, they could poach most of the funding. There's no reason to keep it a company secret.
And API inference is only part of the total costs, to say nothing of training and ongoing fine-tuning. If they're not even profitable on inference, how could they hope to be profitable overall?
8 replies →
The "worst case" is probably someone just using their $200 account limits. So yeah, real cost is probably close to that
At the full current retail API price.
Business buyers are paying API prices, not subscription prices.
Disclosure: Work at Microsoft on AI
Are your API prices profitable?
And receiving investment from their vendor in exchange? When this is done in established companies it is typically called a kickback and directed toward one person, but in this case the whole thing is so incestuous the kickback goes straight to the top.
Is it crazy to imagine Anthropic can leverage short term cash flow now to build the models and products that will let them resell $100B in AWS infra with nice margins tomorrow?
If Amazon believes that story they’d be crazy not to invest.
Yes I understand why the agreement exists, but that does not remove the circularity.
But that per-token cost is a total joke. All these companies are fighting to build market share in some future dominated by one or two AI ecosystems. It is musical chairs until someone creates the one ring to rule them all. So they are charging token amounts just to claim revenue as they burn through investor dollars.
In short: per-token charges currently cover maybe 1% of the total costs in this field. To pay ongoing costs, and pay back investors, everyone will need to pay 100x or 1000x the current rates, likely for decades.
If that's true, it's very unsustainable.
Gemma-4 26B-A4B + M5 MacBook Pro + OpenCode isn't Claude Code _yet_, but it's good enough that if I were forced to use it I would be fine.
Yes, it's amazing how quickly so many tech companies have hitched their tooling to these big AI vendors seemingly without any thought towards whether they'll still exist a year or three or five from now. Insane behavior. To the (debatable!) extent that AI coding tools are useful at all wouldn't it be a hell of a lot smarter to self-host? At least that way you have some control over QoS, and a stable, predictable result... Or maybe nobody cares about that kind of thing anymore? What happened to basic business math in this industry?
2 replies →
I'm not sure this information is grounded, but I remember reading somewhere that inference is indeed profitable. My personal experience is similar: running 2x3090s draws 500-600W, and you can run amazing models locally with a similar setup.
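Back-of-envelope, taking that 500-600W figure at face value and assuming a typical US residential electricity rate (my assumption, not from the comment), the power bill for such a rig is modest:

```python
# Rough electricity cost for a 2x RTX 3090 local inference rig.
# watts is the midpoint of the 500-600 W draw quoted above;
# the electricity rate is an assumed US residential average.
watts = 550
rate_per_kwh = 0.15  # USD per kWh, assumed

kwh_per_day = watts / 1000 * 24          # 13.2 kWh if run around the clock
cost_per_day = kwh_per_day * rate_per_kwh
cost_per_month = cost_per_day * 30

print(f"{kwh_per_day:.1f} kWh/day, ${cost_per_day:.2f}/day, ~${cost_per_month:.0f}/month")
```

Roughly $60/month of electricity at full, continuous load, i.e. well under typical subscription prices, though of course this ignores hardware depreciation and everything the reply below raises.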
Running the model isn't the cost. Watts per token is the math they show investors. You also have to be constantly training new models, which currently takes more compute than serving the customer base. You have to build datacenters, and possibly power plants to feed them. You have to carry debt. And you will need to buy new GPUs/RAM every few years to remain competitive. The total business is vastly different from simple GPU math.
1 reply →
> In short: per-token charges currently cover maybe 1% of the total costs in this field
There are plenty of seemingly informed people saying the exact opposite, so that's a lot of confidence to be talking with. I have a hard time believing it when we know what open-weights models cost to run. And sure, there are training costs, but again, many say inference costs are already above training costs.
From the perspective of a deal like this, “total costs in the field” matter less than incremental cost per token served.
The unit economics for today’s frontier models should be great, and this suggests Anthropic believes they’ll get better.
In a decade the cost of compute will be a tiny fraction of what it costs now. Specialized hardware will exist that will be cheap and efficient.
The difference in the cost of compute between 2026 and 2036 won’t be nearly as large as the difference between 2016 and 2026. Even by 2016 the slowdown in improvements was noticeable.
We might see a one-time bump in inference efficiency when we move off GPUs onto more limited but efficient dedicated hardware, but the sustained fast pace of improvement is far behind us.
2 replies →