Comment by angoragoats

6 hours ago

> The mac will just work for models as large as 100B, can go higher with quantized models. And power draw will be 1/5th as much as the 3090 setup.

This setup will work for 100B models as well. And yes, the Mac will draw less power, but the Nvidia machine will be many times faster. So depending on your specific Mac and your specific Nvidia setup, the performance per watt will be in the same ballpark. And higher absolute performance is certainly a nice perk.
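To make the ballpark claim concrete: if the Nvidia box draws several times the power but also pushes out several times the tokens per second, tokens per joule land in roughly the same place. A toy calculation, with purely illustrative numbers (assumptions, not measurements of either machine):

```python
# Hypothetical throughput and power figures, for illustration only.
mac_tok_per_s, mac_watts = 10, 60          # assumed Mac numbers
nvidia_tok_per_s, nvidia_watts = 50, 300   # assumed multi-3090 numbers

print(mac_tok_per_s / mac_watts)           # ~0.17 tokens per joule
print(nvidia_tok_per_s / nvidia_watts)     # ~0.17 tokens per joule
```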

> You can certainly daisy chain several 3090's together but it doesn't work seamlessly.

Citation needed; there's no "daisy chaining" in the setup I describe, and low-level libraries like PyTorch as well as higher-level tools like Ollama all seamlessly support multiple GPUs.
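As a minimal sketch of what "multiple GPUs" means at the PyTorch level (assuming a box with at least two CUDA devices; the model and layer sizes are toy placeholders), you can pin different parts of a model to different devices and move activations between them. Higher-level tools automate this kind of placement:

```python
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    """Toy model split across two GPUs (naive model parallelism)."""
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        x = self.part2(x.to("cuda:1"))  # activations hop to the second GPU
        return x

if torch.cuda.device_count() >= 2:
    model = TwoGPUModel()
    out = model(torch.randn(1, 4096))
    print(out.device)  # cuda:1
```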

I think it's bad form to say "citation needed" when your original claim didn't include citations.

Regardless, there's a difference between training and inference, and PyTorch doesn't magically make five GPUs behave like one GPU.