Comment by menaerus
1 day ago
How did you arrive at the $10,000 electricity cost figure?
An 8xH200 DGX H200 system draws ~14 kW at peak in its (CTS) configuration/utilization. Over one year, assuming maximum utilization, that is 123,480 kWh per single DGX H200 unit. We need 2x such units for the 16xH200 configuration in question, so ~246,960 kWh/year. That is ~$25,000 at 10 cents per kWh and ~$74,000 at 30 cents per kWh. At ~1,110,000 batches of 1M tokens per year, this gives us (1) ~$0.02 to ~$0.07 per 1M tokens in energy cost and (2) ~$0.25 per 1M tokens assuming the same HW depreciation rate. In total, that's ~$0.30 per 1M tokens.
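For anyone who wants to poke at the numbers, here's a minimal Python sketch of the same back-of-the-envelope math. Every input is just an assumption from this comment (14 kW peak draw, 2 systems, ~1,110,000 batches of 1M tokens per year, $0.25/1M HW depreciation), not measured data or spec-sheet figures:

```python
# Back-of-the-envelope inference cost per 1M tokens, using the assumptions
# stated above (not measured values).
DGX_PEAK_KW = 14.0             # assumed peak draw of one 8xH200 DGX system
NUM_SYSTEMS = 2                # 16 GPUs total
HOURS_PER_YEAR = 24 * 365
BATCHES_1M_PER_YEAR = 1_110_000  # assumed yearly throughput in 1M-token batches
HW_DEPRECIATION_PER_1M = 0.25    # $/1M tokens, assumed as above

kwh_per_year = DGX_PEAK_KW * NUM_SYSTEMS * HOURS_PER_YEAR  # ~245,000 kWh

for price_per_kwh in (0.10, 0.30):
    yearly_energy_cost = kwh_per_year * price_per_kwh
    energy_per_1m = yearly_energy_cost / BATCHES_1M_PER_YEAR
    total_per_1m = energy_per_1m + HW_DEPRECIATION_PER_1M
    print(f"${price_per_kwh:.2f}/kWh: ~${yearly_energy_cost:,.0f}/year, "
          f"~${energy_per_1m:.3f} energy + ${HW_DEPRECIATION_PER_1M} HW "
          f"= ~${total_per_1m:.2f} per 1M tokens")
```

That lands at roughly $0.27 per 1M tokens at $0.10/kWh and roughly $0.32 at $0.30/kWh, which is where the ~$0.30 figure comes from.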
Seems sustainable?
I used 700 W per H200, i.e. 11.2 kW for 16 GPUs. I didn't include the CPUs or the rest of the rack, so yours is a better approximation.
One has to keep in mind that the benchmark is synthetic. That makes sense because it keeps things reproducible, but real-world usage may differ, e.g. in the amount of context and the number of concurrent users. There are also use cases where smaller models or smaller quants will do.
The key takeaway for me from this type of back-of-the-envelope calculation is getting a good idea of where we stand long term, i.e. once VC money stops subsidizing inference.
So for me, ~$0.30 per 1M tokens for a decent model looks pretty good too. Seeing that the OpenAI API charges $21 per 1M input tokens and $168 per 1M output tokens for GPT-5.2 pro, I was wondering what the real sustainable pricing is.
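Just to put the two side by side (taking the ~$0.30/1M back-of-the-envelope estimate at face value, and ignoring that the hosted model is not the same model), a quick sketch of the implied markup:

```python
# Rough markup sketch using the figures quoted in this thread; the $0.30/1M
# self-hosting cost is an estimate, not a measured number.
SELF_HOSTED_PER_1M = 0.30
API_INPUT_PER_1M = 21.0
API_OUTPUT_PER_1M = 168.0

print(f"input:  ~{API_INPUT_PER_1M / SELF_HOSTED_PER_1M:.0f}x the estimated cost")
print(f"output: ~{API_OUTPUT_PER_1M / SELF_HOSTED_PER_1M:.0f}x the estimated cost")
```

Under those assumptions the API prices are on the order of ~70x (input) and ~560x (output) the estimated self-hosting cost, which is what makes the question about sustainable pricing interesting.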