Comment by leopoldj

2 months ago

>it can run on my laptop

Has anyone run it on a laptop (unquantized)? The disk size of the 32B model appears to be 80GB. Update: I tried it on a 40GB A100 GPU. Loading the model took 30GB of VRAM. I asked a simple question, "How many r in raspberry", and after 5 minutes nothing had been generated beyond the prompt. I'm not sure how the author ran this on a laptop.

32B models are easy to run on 24GB of RAM at a 4-bit quant.

It sounds like you need to play with some of the existing 32B models with better documentation on how to run them if you're having trouble, but it is entirely plausible to run this on a laptop.

I can run Qwen2.5-Instruct-32B-q4_K_M at 22 tokens per second on just an RTX 3090.

  • My question was about running it unquantized. The author of the article didn't say how he ran it. If he quantized it, then saying he ran it on a laptop is not news.

    • I can't imagine why anyone would run it unquantized, but there are some laptops with the 70GB+ of RAM that would be required. It's not that it can't be done... it's just that quantizing to at least 8-bit seems to be standard practice these days, and DeepSeek has shown that it's even worth training at 8-bit precision.
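
The memory figures being traded above follow from simple arithmetic: weights-only footprint is roughly parameter count times bits per parameter. A minimal sketch (the 32B count and the ~4.5 bits/weight for a q4_K_M-style quant are assumptions; KV cache and runtime overhead come on top, which is why unquantized lands above 70GB in practice):

```python
PARAMS = 32e9  # assumed: a 32-billion-parameter model

def weight_gb(bits_per_param: float) -> float:
    """Approximate size of the weights alone, in GB (decimal)."""
    return PARAMS * bits_per_param / 8 / 1e9

# fp16 full precision, int8, and a ~4.5 bits/weight 4-bit quant
for label, bits in [("fp16", 16), ("int8", 8), ("~q4_K_M", 4.5)]:
    print(f"{label:>8}: ~{weight_gb(bits):.0f} GB")
# fp16 comes out around 64 GB before overhead; the 4-bit quant
# around 18 GB, which is why it fits in 24GB of RAM.
```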