Comment by angoragoats

9 hours ago

This has changed since Sam Altman started buying up all the chip supply, driving up prices on memory, storage, and GPUs for everyone. But it used to be the case that you could build a PC that was both cheaper and faster than a Mac for LLM inference, with roughly equal performance per watt.

You would use multiple *90-series GPUs with their power limits lowered. Depending on the GPU, the sweet spot is somewhere between 225 and 350 W, where for LLM workloads you only lose 5-10% of performance for a ~50% drop in power consumption.
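For reference, this is the same thing `sudo nvidia-smi -pl 300` does per card. A minimal sketch doing it programmatically via NVML's Python bindings, assuming a 300 W target (a made-up midpoint of the range above; tune per GPU):

```python
# Cap every NVIDIA GPU in the box at ~300 W via NVML (needs root).
import pynvml  # pip install nvidia-ml-py

TARGET_WATTS = 300  # assumption: midpoint of the 225-350 W sweet spot

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        # Clamp the target to what the card actually allows (values in mW).
        lo, hi = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
        target_mw = max(lo, min(hi, TARGET_WATTS * 1000))
        pynvml.nvmlDeviceSetPowerManagementLimit(handle, target_mw)
        print(f"GPU {i}: power limit set to {target_mw // 1000} W")
finally:
    pynvml.nvmlShutdown()
```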

Combined with a workstation (Xeon/Epyc) CPU with lots of PCIe lanes, you can support 6-7 such GPUs (or more, depending on available power). This will blow away the fastest Mac Studio, at comparable performance per watt.
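The "comparable performance per watt" claim is just aggregate tokens/sec over total wall power. A toy calculation where every constant is an illustrative placeholder, not a benchmark; plug in your own measured numbers:

```python
# Back-of-envelope perf-per-watt comparison. All values are assumptions.
GPU_COUNT = 6
GPU_CAPPED_WATTS = 300       # power-limited per the comment above
GPU_PERF_RETAINED = 0.92     # assume ~8% perf loss at the lower cap
GPU_TOKENS_PER_SEC = 100.0   # assumed per-GPU throughput at stock power
PLATFORM_WATTS = 300         # assumed CPU/board/drives overhead

MAC_WATTS = 250.0            # assumed Mac Studio draw under load
MAC_TOKENS_PER_SEC = 60.0    # assumed Mac throughput on the same model

rig_tps = GPU_COUNT * GPU_TOKENS_PER_SEC * GPU_PERF_RETAINED
rig_watts = GPU_COUNT * GPU_CAPPED_WATTS + PLATFORM_WATTS
print(f"rig: {rig_tps:.0f} tok/s at {rig_watts} W -> {rig_tps / rig_watts:.2f} tok/s/W")
print(f"mac: {MAC_TOKENS_PER_SEC:.0f} tok/s at {MAC_WATTS:.0f} W -> {MAC_TOKENS_PER_SEC / MAC_WATTS:.2f} tok/s/W")
```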

Again, a lot of this has changed, since GPUs and memory are so much more expensive now.

Macs are great for a simpler all-in-one box with high memory bandwidth and middling-to-decent GPU performance, but they are (or were) absolutely not "untouchable."

With 6-7 GPUs and an EPYC CPU it will also cost 2-3x more than a Mac Studio.

  • I think OP’s point was that it would do more than 2-3x the workload, hence their saying it would “blow it out of the water” and specifying “performance-per-watt”.
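To make that concrete, the same arithmetic normalized by price instead of power (all numbers are placeholders, not quotes or benchmarks):

```python
# Tokens-per-dollar sanity check. Every value here is an assumption.
RIG_COST = 15000.0  # assumed: 6 GPUs plus an EPYC platform
MAC_COST = 6000.0   # assumed: a high-spec Mac Studio
RIG_TPS = 552.0     # aggregate throughput from the sketch above
MAC_TPS = 60.0

print(f"cost ratio:       {RIG_COST / MAC_COST:.1f}x")
print(f"throughput ratio: {RIG_TPS / MAC_TPS:.1f}x")
# If the throughput ratio exceeds the cost ratio, the rig still wins on
# tokens per dollar despite the higher sticker price.
```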