Comment by wisty
6 days ago
They can keep the load fairly even if they use their nodes for training when customer use is low, which massively helps. If they have to own 3x as much hardware as they need to serve peak demand (even with throttling), that costs a lot, unless they have another use for all those GPUs.
These are just illustrative guesses, not real numbers, and I probably underestimate overheads, but anyway ...
Let's assume a $20k expert node can produce 500 tokens per second (about 15,000 megatokens per year). $5k a year for the machine, $5k a year in overheads. 5 experts per token (so $50k to produce 15,000 megatokens at 100% throughput). Say they charge up to $10 per million tokens ... yeah it's tight but I can see how it's doable.
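The same arithmetic as a rough Python sketch (same made-up numbers; nothing here is a real figure):

    # Back-of-envelope sketch of the node economics above (illustrative only).
    tokens_per_second = 500
    seconds_per_year = 60 * 60 * 24 * 365
    megatokens_per_year = tokens_per_second * seconds_per_year / 1e6  # ~15,768, call it 15,000

    cost_per_node_per_year = 5_000 + 5_000       # machine amortisation + overheads
    experts_per_token = 5
    cost_at_full_throughput = cost_per_node_per_year * experts_per_token  # ~$50k

    price_per_megatoken = 10
    revenue_at_full_throughput = 15_000 * price_per_megatoken             # ~$150k

    print(megatokens_per_year, cost_at_full_throughput, revenue_at_full_throughput)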
Say they budget $100 per user per year. At $10 per million tokens (it depends on the model), that's 10 million tokens per user, which is like 100 books per year. The answer is that users probably don't consume as many tokens as the API pricing would suggest.
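The implied per-user budget, assuming a book is very roughly 100k tokens (my guess, not a real figure):

    # Implied token budget per subscription user (illustrative).
    dollars_per_user_per_year = 100
    price_per_megatoken = 10
    tokens_per_user = dollars_per_user_per_year / price_per_megatoken * 1_000_000  # 10 million
    tokens_per_book = 100_000       # rough guess for an average book
    books_per_year = tokens_per_user / tokens_per_book                             # ~100 books
    print(tokens_per_user, books_per_year)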
The real question is: how do they get the cost down to something like $10 per megatoken?
500 tokens per second per node is about 15,000 megatokens per year, so at $10 per megatoken a node can bring in $150,000 per year.
Call it 5 live experts and a router, so that's maybe $20k of revenue per expert per year. If each expert draws a kilowatt, at $0.10 per kWh that's about $1,000 a year for power. The hardware is good for 4 years, so $5k a year for that. Toss in overheads and it's maybe $10k in costs per expert per year.
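Per-expert split of that, treating the $150k as revenue spread across the 5 experts plus the router; the overhead number below is just picked so the total lands near the ~$10k I guessed:

    # Revenue and cost per expert per year (illustrative guesses only).
    revenue_per_node = 15_000 * 10                 # 15,000 Mtok/year at $10/Mtok = $150k
    experts_plus_router = 6                        # 5 live experts + a router
    revenue_per_expert = revenue_per_node / experts_plus_router   # ~$25k, call it $20k

    power_cost = 1.0 * 0.10 * 24 * 365             # 1 kW at $0.10/kWh ~= $876, call it $1k
    hardware_cost = 20_000 / 4                     # $20k node amortised over 4 years = $5k
    overheads = 4_000                              # unspecified above; picked to land near $10k
    cost_per_expert = power_cost + hardware_cost + overheads      # ~$10k
    print(revenue_per_expert, cost_per_expert)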
So at full capacity they make about $5 of profit on every $10 of revenue. With uneven loads they make nothing, unless they have some optimisation and very good load balancing (if they can double the tokens per second, they make a decent profit).
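And the utilisation sensitivity, which is the whole point: at these guesses, break-even is around 50% load:

    # Profit per expert per year as a function of utilisation (illustrative).
    def profit_per_expert(utilisation, revenue_at_full=20_000, fixed_cost=10_000):
        # Revenue scales with load; hardware, power and overheads are treated as fixed.
        return revenue_at_full * utilisation - fixed_cost

    for u in (1.0, 0.75, 0.5, 0.25):
        print(f"{u:.0%} load: ${profit_per_expert(u):,.0f} per expert per year")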