Comment by yowlingcat
6 days ago
Locally, I use a Mac Studio with a ton of VRAM and just accept the limitations of the Metal ecosystem, which is generally fine for the inference workloads I consistently run (but I think it would be a pain for a lot of people).
I can't see it making sense for training workloads if and when I get to them (those I'd put on the cloud). I have a box with a single 3090 for CUDA dev if I need it, but I haven't needed it that often. And frankly, in raw grunt the Mac Studio sits a bit under a 3090, but with an order of magnitude more unified VRAM, so it hits the mark for the medium-ish MoE models I like to run locally, as well as some diffusion inference workloads.
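For concreteness, here's a minimal sketch of what running a medium-ish MoE model locally might look like with the llama-cpp-python bindings, assuming a Metal-enabled build on Apple Silicon; the GGUF path, context size, and prompt below are illustrative placeholders, not my exact setup:

    # Sketch: local MoE inference via llama-cpp-python on Apple Silicon.
    # pip install llama-cpp-python  (macOS arm64 builds include Metal support)
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/mixtral-8x7b-instruct.Q4_K_M.gguf",  # placeholder GGUF
        n_gpu_layers=-1,  # offload every layer to the GPU (Metal)
        n_ctx=8192,       # context window, sized to fit in unified memory
    )

    out = llm("Q: Why does unified memory help MoE inference?\nA:", max_tokens=128)
    print(out["choices"][0]["text"])

With n_gpu_layers=-1 the full quantized weights sit in unified memory, which is exactly where the Studio's capacity advantage over a 24 GB 3090 pays off.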
Anything that doesn't work great locally, or that is throwaway (but needs to be fast), ends up getting thrown at the cloud. I pull it back to something I can run locally once I find myself running it over and over again.