Comment by originalvichy

6 hours ago

Where in the world are you getting the numbers for how much energy video streaming uses? I am quite sure that, just as with LLMs, most of the energy goes into the initial encoding of the video, and nowadays any rational service encodes videos at several bitrates to avoid JIT transcoding.

Networking can’t take that much energy, unless perhaps we are talking about purely wireless networking with cell towers?

LLM inference is still quite power-hungry; video decoding with hardware acceleration is much more efficient.

But we can do some estimates, heck, we can even ask GPT for some numbers.

Say you want to do 30 minutes of video (H.265) or 30 minutes of LLM inferencing on a generic consumer device, ignoring the source of the model or the source of the encoded video; you get roughly a 3x difference:

  Energy usage for 30 minutes of H.265 decoding: ~15–20 Wh.
  Energy usage for 30 minutes of Llama3 inference: ~40–60 Wh.

This is optimised already, so a working hardware H.265 decoder is assumed, and for inferencing, something on the level of an RTX 3050, but can also be a TPU or NE.

While not the most scientific comparison, it's perhaps good to know that video decoding is practically always local, and for streaming services it will use whatever is available and might even switch codecs (e.g. AV1, H.265, H.264, depending on what is available and what licenses are used). And if you have older hardware, some codecs won't even exist in hardware, to the point where you start doing software decoding (very inefficient).

AI inferencing is mostly remote (at least the heavy loads) in a datacenter, because local availability of hardware is pretty hit and miss, models are pretty big, and spinning one up every time you just want to ask something is not very user friendly. Because in a datacenter you tend to pay for amperage per rack, you spec your AI inferencing hardware to eat that power, since you're not saving any money or hardware life when you don't use it. That means that efficiency is important (more use out of a rack) but scaling/idling isn't really that big of a deal (though it has slowly dawned on people that burning power 'because you can' is not really a great model). As a result, AI inferencing in a datacenter is more power-hungry: because they can, because it is faster, and because that's what attracts users.

I would estimate that the local llama3 inferencing uses less power than when done in a datacenter, because there simply is less power available locally (try finding an end-user device that is used mass-market with enough power available, you won't; only small markets like gaming PCs and workstations will do).

  • "I would estimate that the local llama3 inferencing uses less power than when done in a datacenter, because there simply is less power available locally"

    Is this taking into account the fact that datacenter resources are shared?

    Llama 3 on my laptop may use less power, but it's serving just me.

    Llama 3 in a datacenter or more expensive, more power-hungry hardware is potentially serving hundreds or thousands of users.

  • 20 Wh for 30 minutes of hardware accelerated h265 decoding is an order of magnitude too high at any bitrate. Please cite your sources.

    • As I wrote in my reply, I don't have "sources".

      Pure decode excluding any other requirements is probably pretty low, but running a decoder isn't all you need. There's network, display, storage and RAM so your OS can run etc. There will probably be plenty of variation (brightness, environment, how you get your stream in since a 5G modem is probably going to be different energy-wise compared to WiFi or Ethernet), and if you have something like a decoder in the CPU or in the GPU and if that GPU is separate, more PCIe involvement etc. But we can still estimate:

      Hardware decoding (1080p video): ~5–15 W for the CPU/GPU

      Overall system power usage (screen, memory, etc.): ~25–45 W for a typical laptop.

      Duration (30 minutes): If we assume an average of 35 W total system power, the energy consumption is:

      Energy = 35 W × 0.5 hours = 17.5 Wh

      We can do a similar one for inference, also recognising you'll have variations either way:

      CPU inference: ~50 W. GPU inference: ~80 W. Overall system power usage: ~70–120 W for a typical laptop during LLM inference.

      Duration (30 minutes): Assuming an average of 100 W total system power:

      Energy = 100 W × 0.5 hours = 50 Wh
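      For anyone who wants to plug in their own numbers, the two estimates above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the 35 W and 100 W averages are the assumed figures from this comment, not measurements, and `energy_wh` is a helper name made up for illustration.

```python
def energy_wh(power_w: float, hours: float) -> float:
    """Energy in watt-hours for a constant average power draw."""
    return power_w * hours


HOURS = 0.5  # 30 minutes

# Assumed whole-system averages from the estimate above (not measured).
video_wh = energy_wh(35, HOURS)   # laptop playing hardware-decoded video
llm_wh = energy_wh(100, HOURS)    # laptop running local LLM inference

ratio = llm_wh / video_wh

print(f"Video playback: {video_wh:.1f} Wh")
print(f"LLM inference:  {llm_wh:.1f} Wh")
print(f"Ratio:          {ratio:.1f}x")
```

      With these assumed averages the ratio comes out a little under 3x; different averages from the ranges above shift it in either direction.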

      We could pretend that our own laptop is very good at some of these tasks, but we're not talking about the best possible outcome. We're talking about the fact that there is a difference between decoding a video stream and doing LLM inference, and that the difference is big enough to make someone's point that video streaming is somehow 'worse' or 'as bad as' LLM usage moot. Because it's not. LLM training and LLM inference eat way more energy.

      Edit: looking at some random search engine results, you get a bunch of reddit posts with screenshots from people asking where the power consumption goes on their locally running LLM inferencing: https://www.reddit.com/r/LocalLLaMA/comments/17vr3uu/what_ex...

      It seems their local usage hovers around 100 W. Other similar posts hover around the same, but it seems to be throttle-based, as other machines with faster chips also throttle around the same power target while delivering better performance. Most local setups use a quantised model, which is less resource-hungry; the cloud-hosted models tend to be much larger (and thus more power-hungry).

      Edit2: looking at some real-world optimised decoding measurements, it appears you can decode VP9 and H.265 on 1 year old hardware below 200mW. So not even 1W. That would mean LLM inferencing is orders of magnitude more power hungry than video decoding. Either way: LLM power usage > Video Decode power usage, so the article trying to put them in the same boat is nonsense.
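      To put "orders of magnitude" into numbers, here is a rough sketch using the two figures mentioned in these edits: 200 mW as an upper bound for the hardware decoder, and the ~100 W that local inference seems to throttle around. Both are assumptions taken from this thread, and note that the first is the decoder alone while the second is closer to a whole-machine figure, so this is not a like-for-like measurement.

```python
import math

HOURS = 0.5  # 30 minutes

# Assumed figures from the thread, not measurements.
decode_w = 0.2    # upper bound for hardware VP9/H.265 decode (200 mW)
llm_w = 100.0     # approximate power target of local LLM inference

decode_wh = decode_w * HOURS
llm_wh = llm_w * HOURS

ratio = llm_w / decode_w
orders_of_magnitude = math.log10(ratio)

print(f"Decode: {decode_wh:.2f} Wh, LLM: {llm_wh:.1f} Wh")
print(f"Ratio: {ratio:.0f}x (~{orders_of_magnitude:.1f} orders of magnitude)")
```

      Since the duration is the same on both sides, the ratio depends only on the two power figures.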

Luckily we don’t have to do such a calculation. All this energy use will be factored into cost which tells us which is using more resources.

  • Ah yes high tech, an industry where there's famously no weird distorting influence from VCs subsidizing unprofitable business models to grab market share.

    • It doesn’t matter who is paying for it; someone’s paying for it, and that economizing force is always applying pressure.