
Comment by endlessvoid94

10 months ago

I've found the local models useful for non-coding tasks, but the 8B-parameter models have so far proven lacking enough at coding that I'm waiting another few months for whatever the Moore's law equivalent of LLM power is to catch up. Until then, I'm sticking with Sonnet 3.7.

If you have a 32GB Mac, you should be able to run up to 27B params; I've done so with Google's `gemma3:27b-it-qat`.
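
If you'd rather drive it from a script than the CLI, here's a minimal sketch using the `ollama` Python client (it assumes the server is already running locally and the model has been pulled; swap in a smaller tag like `gemma3:12b-it-qat` if you're tighter on RAM):

```python
# Minimal sketch: chat with a locally served Gemma 3 QAT model through the
# `ollama` Python client (pip install ollama).
# Assumptions: the Ollama server is running, and the model was already pulled
# with `ollama pull gemma3:27b-it-qat`.
import ollama

response = ollama.chat(
    model="gemma3:27b-it-qat",
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(response["message"]["content"])
```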

  • Hm, I've got an M2 Air w/ 24GB. The 27B model was crawling for me. Maybe I had something misconfigured.

    • No, that sounds right. 24GB isn't enough to run a 27B-parameter model at a usable speed. The rule of thumb is roughly 1GB of RAM per billion parameters, so a 27B model wants more memory than a 24GB machine can spare once the OS takes its share.

      Someone in another comment on this post mentioned using one of the micro models (Qwen 0.6B I think?) and having decent results. Maybe you can try that and then progressively move upwards?

      EDIT: “Queen” -> “Qwen”


    • deepseek-r1:8b screams on my 12GB GPU. gemma3:12b-it-qat runs just fine, a little faster than I can read. Once you exceed GPU RAM it offloads a lot of the model to the CPU, and splitting between GPU and CPU is dramatically (80%? 95%?) slower.
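
      If you want to see how much of a loaded model actually landed in VRAM versus system RAM, here's a rough sketch against Ollama's local REST API (it assumes the default port and that `/api/ps` reports `size` and `size_vram` per loaded model; adjust if your version differs):

      ```python
      # Rough sketch: ask a local Ollama server which models are loaded and how much
      # of each is resident on the GPU vs. spilled to system RAM.
      # Assumptions: Ollama is listening on its default port (11434) and /api/ps
      # returns "size" and "size_vram" for each loaded model.
      import requests

      resp = requests.get("http://localhost:11434/api/ps", timeout=5)
      resp.raise_for_status()

      for model in resp.json().get("models", []):
          total = model.get("size", 0)         # total bytes the loaded model occupies
          in_vram = model.get("size_vram", 0)  # bytes resident on the GPU
          pct_gpu = 100 * in_vram / total if total else 0
          print(f"{model.get('name', '?')}: {pct_gpu:.0f}% on GPU "
                f"({in_vram / 2**30:.1f} GiB of {total / 2**30:.1f} GiB)")
      ```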