Comment by coder543
2 months ago
32B models are easy to run on 24GB of RAM at a 4-bit quant.
If you're having trouble, it sounds like you should try some of the existing 32B models that have better documentation on how to run them, but it is entirely plausible to run this on a laptop.
I can run Qwen2.5-Instruct-32B-q4_K_M at 22 tokens per second on just an RTX 3090.
My question was about running it unquantized. The author of the article didn't say how he ran it. If he quantized it, then saying he ran it on a laptop is not news.
I can't imagine why anyone would run it unquantized, but there are some laptops with more than the roughly 70GB of RAM that would be required. It's not that it can't be done... it's just that quantizing to at least 8-bit seems to be standard practice these days, and DeepSeek has shown that it's even worth training at 8-bit precision.
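For a rough sense of the numbers both comments are working from, here's a back-of-envelope sketch (in Python) of the weight memory for a 32B-parameter model at different precisions. It ignores KV cache and runtime overhead (which add a few GB), and the ~4.85 bits-per-weight figure for q4_K_M is an approximation:

    # Back-of-envelope weight memory for a 32B-parameter model.
    # Ignores KV cache, activations, and runtime overhead.

    PARAMS = 32e9  # 32 billion parameters

    # Approximate bits per weight for common formats; q4_K_M's
    # ~4.85 bpw is an approximation (llama.cpp K-quants mix block types).
    formats = {
        "fp16/bf16 (unquantized)": 16,
        "8-bit (e.g. q8_0)": 8,
        "q4_K_M (~4-bit)": 4.85,
    }

    for name, bits in formats.items():
        gb = PARAMS * bits / 8 / 1e9
        print(f"{name:26s} ~{gb:5.1f} GB")

    # fp16/bf16 (unquantized)    ~ 64.0 GB  -> hence the ~70GB figure above
    # 8-bit (e.g. q8_0)          ~ 32.0 GB
    # q4_K_M (~4-bit)            ~ 19.4 GB  -> fits a 24GB GPU like the 3090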
Maybe he has a 64GB laptop. Also, he said he can run it, not that he actually tried it.