Comment by jwr

18 hours ago

Not everybody might realize this, but this is a truly excellent and very impressive result. Most models on my M4 Max run at 150W consumption.

Power consumption numbers aren't useful for efficiency calculations without also considering the tokens per second for the same model and quantization.

I could write an engine that only uses 10W on your machine, but it wouldn't be meaningful if it was also 10X slower.

More power consumption is usually an indicator that the hardware is being fully utilized, all things equal (comparing GPU to GPU or CPU to CPU, not apples to oranges)