Comment by dust42
1 day ago
I used 700 W per H200 = 11.2 kW per 16 GPUs. I didn't include the CPU and the rest of the rack, so yours is a better approximation.
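The arithmetic can be sketched as a few lines of Python. The 700 W per H200 and the 16-GPU node come from the comment; the electricity price and aggregate token throughput are illustrative assumptions I'm making up to show the shape of the calculation, not measured values, and this covers electricity only (no hardware amortization, cooling, or CPU/rack overhead):

```python
# Back-of-envelope power/cost sketch (GPUs only, no CPU or rest of rack).
GPU_POWER_W = 700            # per H200, the comment's figure
GPUS_PER_NODE = 16

node_kw = GPU_POWER_W * GPUS_PER_NODE / 1000
print(f"Node GPU power: {node_kw} kW")   # 11.2 kW

# Hypothetical inputs: assumed electricity price and assumed aggregate
# throughput across all concurrent users on the node.
PRICE_PER_KWH = 0.10         # USD, assumption
TOKENS_PER_SEC = 10_000      # aggregate node throughput, assumption

hours_per_mtok = 1_000_000 / TOKENS_PER_SEC / 3600
cost_per_mtok = node_kw * hours_per_mtok * PRICE_PER_KWH
print(f"Electricity cost: ${cost_per_mtok:.3f} per 1M tokens")
```

Under these made-up numbers the electricity alone comes to a few cents per 1M tokens, so most of a sustainable price would have to cover hardware amortization and the rest of the stack.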
One has to keep in mind that the benchmark was synthetic. That makes sense because it keeps the result reproducible, but real-world usage may differ, e.g. in the amount of context and the number of concurrent users. There are also use cases where smaller models or smaller quants will do.
The key takeaway for me from this type of back-of-the-envelope calculation is a good idea of where we stand long term, i.e. when VC money stops subsidizing.
So $0.30 per 1M tokens for a decent model looks pretty good to me too. Seeing that the OpenAI API charges $21 per 1M input tokens and $168 per 1M output tokens for GPT-5.2 pro, I was wondering what the real sustainable pricing is.