Comment by pama
3 days ago
I agree. Your estimate for (2), about 0.0022 kWh, corresponds to about a sixth of the charge of an iPhone 15 pro and would take longer than ten minutes on the phone, even at max power draw. It feels about right for the amount of energy/compute of a large modern MoE loading large pages of several 10k tokens. For example this tech (couple month old) could input 52.3k tokens per second to a 672B parameter model, per H100 node instance, which probably burns about 6–8kW while doing it. The new B200s should be about 2x to 3x more energy efficient, but your point still holds within an order of magnitude.
No comments yet
Contribute on Hacker News ↗