Comment by Aurornis

13 hours ago

There isn't a simple relationship between parameter count and energy use like that. You could run a 280B parameter model on a Raspberry Pi with a big SSD if you were so determined. The power draw would be small, but you would be waiting a very long time for your response.

Data center energy use isn't simple to calculate because servers are configured to process many requests in parallel. You're not getting an entire GPU cluster to yourself while your request is being processed. For efficiency, your tokens are being processed alongside a lot of other people's requests.
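A toy sketch of why batching changes the per-request accounting. Every number here (power draw, throughput, token counts) is invented purely to illustrate the arithmetic, not a measurement of any real deployment:

```python
# Toy model: a memory-bandwidth-bound GPU serves requests in batches.
# All numbers below are invented for illustration only.

GPU_POWER_W = 700.0  # assumed accelerator power draw under load

def aggregate_tokens_per_sec(batch_size: int) -> float:
    # Assumption: aggregate throughput grows with batch size (better
    # hardware utilization) until the GPU saturates and flattens out.
    return min(batch_size * 100.0, 2_000.0)

def energy_per_request_j(batch_size: int, tokens_per_request: int = 500) -> float:
    """Joules attributed to one request when batch_size requests share the GPU."""
    total_tokens = batch_size * tokens_per_request
    seconds = total_tokens / aggregate_tokens_per_sec(batch_size)
    return GPU_POWER_W * seconds / batch_size

solo = energy_per_request_j(batch_size=1)      # whole GPU to yourself
batched = energy_per_request_j(batch_size=32)  # sharing the GPU
# Sharing the accelerator spreads its power draw across many requests,
# so the energy attributable to each one drops sharply.
```

Under these made-up numbers, the solo request is charged 3,500 J while each batched request is charged 175 J, which is why per-request energy estimates based on a single dedicated GPU overshoot badly.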

This is why some providers can offer a fast mode: your request gets routed to servers that are tuned to process fewer requests in parallel, for a moderate speedup. They charge you more for it because they can't fit as many requests onto that server.
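The pricing consequence can be sketched with a trivial cost model. The hourly server cost, concurrency levels, and latencies below are all invented placeholders:

```python
# Toy pricing sketch (invented numbers): the same server amortized
# over fewer concurrent requests gets more expensive per request.

SERVER_COST_PER_HOUR = 20.0  # assumed all-in hourly cost of one server

def cost_per_request(concurrent_requests: int, seconds_per_request: float) -> float:
    hours = seconds_per_request / 3600.0
    return SERVER_COST_PER_HOUR * hours / concurrent_requests

standard = cost_per_request(concurrent_requests=64, seconds_per_request=10.0)
fast = cost_per_request(concurrent_requests=8, seconds_per_request=6.0)
# The fast tier finishes sooner, but dividing the server among far fewer
# requests still makes each one cost several times more.
```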

You're thinking about power use, not energy. There are systems that can more directly minimize energy per operation at the cost of higher latency, but they look more like TPUs than Raspberry Pis.
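The power-versus-energy distinction is just energy = power × time, which is worth making concrete. The wattages and token rates below are assumed round numbers, not benchmarks of real hardware:

```python
# Toy comparison of power vs. energy for the same job (invented numbers).

def energy_j(power_w: float, seconds: float) -> float:
    # Energy (joules) = power (watts) * time (seconds)
    return power_w * seconds

TOKENS = 500
# Low-power device, very slow: assumed 10 W at 0.2 tokens/sec.
pi_energy = energy_j(power_w=10.0, seconds=TOKENS / 0.2)
# High-power accelerator, fast: assumed 700 W at 100 tokens/sec.
gpu_energy = energy_j(power_w=700.0, seconds=TOKENS / 100.0)
# The low-power device draws 70x less power yet spends more total
# energy, because it runs roughly 500x longer for the same output.
```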