Comment by simonw

20 hours ago

Interesting story. Here's what it says:

> According to a person familiar with the company’s internal analysis, Cursor estimated last year that a $200-per-month Claude Code subscription could use up to $2,000 in compute, suggesting significant subsidization by Anthropic. Today, that subsidization appears to be even more aggressive, with that $200 plan able to consume about $5,000 in compute, according to a different person who has seen analyses on the company’s compute spend patterns.

The load-bearing detail here is if that means $2,000 of internal server+electricity costs, or $2,000 if they were to charge at their API pricing instead of the subscription cost.

The latter is how I understand these things to work right now. If it's the former then yeah, Anthropic are losing a TON of money on those subscriptions.

That's the big question, and nobody really knows what the operating costs actually are right now.

  • Frankly, everyone in the industry knows. When people make these statements without additional clarity they always talk about API prices. You can look at the NVL72 specs and make estimates for electricity and ownership costs rather easily. Inference at data-center scale is dirt cheap, even with public codes using dynamo and sglang. The mystery is why the early misconceptions about inefficient inference persisted even after NVIDIA was very open about everything they did to help reduce costs dramatically in the last two years.

    • I imagine it's the lack of transparency. The costs are obviously coming down as people figure out how to tune both hardware and software. But there are costs other than just electricity as well. For example, chips do burn out, I recall reading that 2 to 3 years is roughly what you can expect under inference loads, so replacing chips is a non trivial operational cost.

      Also, as the costs of running this stuff come down, the incentive to rent models goes down with them. Running local models has the benefit that you get to keep your data local, you can tune them to do what you like, and you're not subject to model or price changes down the road. This makes self hosting appealing both to individuals and companies. Currently, the barrier is in needing significant resources to run the models, but companies are already increasingly doing that with open models. And local inference that regular people can run is becoming a possibility as well.

      While I'm sure there's always going to be a market for renting out models as a service, it may shrink significantly as the costs continue to come down.