Comment by tracker1
2 days ago
At what cost, though? Most AI operations are losing money once you count power draw, massive infrastructure, and the upfront hardware spend. And that still doesn't cover the level of usage many or most people want, which they certainly aren't going to pay for at the hundreds of dollars per month per person it currently costs to operate.
This is a really basic way to look at unit economics of inference.
I did some napkin math on this.
H100s rent for about $2/hr each at 'retail' prices, so call it 32 of them. I would hope the big AI companies get it cheaper than that at their scale.
These 32 H100s can probably do something on the order of >40,000 tok/s on a frontier-scale model (~700B params) with proper batching. Potentially a lot more (I'd love to hear if someone has data on this).
So that's $64/hr, or just under $50k/month.
40k tok/s is a lot of usage, at least for non-agentic use cases. There is no way you are losing money on paid ChatGPT users at $20/month on these numbers.
You'd still break even supporting ~200 Claude Code-esque agentic users who were using it at full tilt 40% of the day at $200/month.
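The napkin math above can be sanity-checked in a few lines. All inputs are the assumptions stated in the comment, not measured figures; the 500 tok/s per-user rate is a hypothetical I back out from "~200 users at 40% of the day" against the 40k tok/s cluster:

```python
# Napkin math: inference opex for a 32x H100 cluster serving one
# frontier-scale model. Assumptions match the comment above.

GPUS = 32                    # H100s
PRICE_PER_GPU_HR = 2.00      # assumed 'retail' rental, $/GPU-hour
CLUSTER_TOK_PER_S = 40_000   # assumed aggregate throughput with batching
HOURS_PER_MONTH = 730

cluster_cost_hr = GPUS * PRICE_PER_GPU_HR                 # $64/hr
cluster_cost_month = cluster_cost_hr * HOURS_PER_MONTH    # just under $50k

tokens_per_month = CLUSTER_TOK_PER_S * 3600 * HOURS_PER_MONTH
cost_per_mtok = cluster_cost_month / (tokens_per_month / 1e6)

# Heavy agentic users: hypothetical 500 tok/s per user, streaming
# 40% of the day, on a $200/month plan.
USER_TOK_PER_S = 500
DUTY_CYCLE = 0.40
user_tokens_month = USER_TOK_PER_S * DUTY_CYCLE * 3600 * HOURS_PER_MONTH
supported_users = tokens_per_month / user_tokens_month
revenue_month = supported_users * 200

print(f"cluster: ${cluster_cost_hr:.0f}/hr, ${cluster_cost_month:,.0f}/month")
print(f"opex per million tokens: ${cost_per_mtok:.2f}")
print(f"agentic users supported: {supported_users:.0f}")
print(f"revenue at $200/user: ${revenue_month:,.0f}/month")
```

Under these assumptions the cluster comes out around $0.44 per million tokens of pure opex and supports about 200 such users, whose $200/month plans land close to (slightly under) the rental bill, which is why "roughly break even" is the honest summary.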
Now, this doesn't include training or staff costs, but on a pure opex basis I don't think inference is anywhere near as unprofitable as people make out.
My thinking is closer to the developer user who wants their codebase included in queries, with heavy use all day long... which is closer to my point: many users are unlikely to spend hundreds a month, at least given the level of results people currently get.
That said, you could be right, considering Claude Max is priced at $100/mo... but I'm not sure where that sits relative to typical or top-5% usage, or what the monthly allowance actually covers.