← Back to context

Comment by joefourier

1 day ago

Who is going to buy a $4299 M5 Max MBP with 64GB of RAM just to run Gemma 4 31b? Firstly you don't need 64GB for that model. Secondly if you want a machine that sits in the corner and does nothing but LLM inference, you don't buy a MacBook Pro, you buy some GPUs which are going to cost you a fraction of that (~$1k for ~64GB of VRAM is possible). The people buying Apple Silicon for inference general aim for the Mac Studios with enormous amounts of RAM (128-512GB), to run very large models.

The idea is obviously to be running the LLM on your work laptop. As a developer I'd need a laptop with 24GB of RAM for work anyway, and 48GB, which is enough for a very good quant of Gemini, is just $400 extra.

24GB GPUs are $700-2500. Please show me the 64GB GPU for $1k.

  • Not a single new 64GB GPU, but multiple used GPUs.

    They’ve significantly increased in price (so much for hardware depreciation…) but you can still get a modded 22GB 2080 ti for $320, or a Mi50 32GB for ~$450 each (used to be $150 a few months ago, alas), or a Mi50 16GB or <$200 but you’d need to stack 4 of them.

    There’s also some more exotic configurations but those are probably the simplest options. You won’t get the performance of an RTX Pro 6000 Blackwell of course, and the power consumption will be pretty high so it’s only worth it if you have cheap electricity. But it is possible.

> Firstly you don't need 64GB for that model.

You might need that to run it with a longer context, KV cache size is a known issue with that model series.

> Gemma 4 31b? Firstly you don't need 64GB for that model.

You don't? It for sure doesn't run on my 32 GB M2 MAX.

  • What quant? You should have no problem running it at Q4 with 256K context, Q5 or Q6 even although maybe not at full context. I can run Q4 on a 4090 with just 24GB VRAM.