Comment by jwr
16 hours ago
Hmm. 80B. These days I am on the lookout for new models in the 32B range, since that is what fits and runs comfortably on my MacBook Pro (M4, 64GB).
I use ollama every day for spam filtering: gemma3:27b works great, but I mostly run gpt-oss:20b because it's so much faster and comparable in filtering quality.
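A minimal sketch of what an ollama-based spam filter can look like, assuming the official ollama Python client (pip install ollama); the prompt wording, labels, and example message are made up for illustration, not jwr's actual setup:

```python
# Minimal spam classifier via a local ollama model. The prompt and the
# SPAM/HAM protocol are hypothetical; only the model names come from
# the comment above.
import ollama

PROMPT = (
    "You are a spam filter. Reply with exactly one word, SPAM or HAM, "
    "for the following email:\n\n{email}"
)

def classify(email_text: str, model: str = "gpt-oss:20b") -> str:
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(email=email_text)}],
    )
    verdict = response["message"]["content"].strip().upper()
    return "SPAM" if "SPAM" in verdict else "HAM"

print(classify("Congratulations! You have won a free cruise. Click here."))
```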
Can you talk more about how you are using ollama for spam filtering?
The model is 80B parameters, but only 3B are active for any given token during inference. I'm running the older Qwen3 30B (2507) model on my 8GB Nvidia card and get very usable performance.
Yes, but you don't know in advance which 3B parameters you will need, so you either have to keep all 80B in VRAM, or wait until the correct 3B are loaded from NVMe -> RAM -> VRAM. And of course it can be a different 3B for each successive token.
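A toy illustration of why that is, assuming top-k gating of the kind Qwen3-style MoE layers use; the sizes here are made up. The router scores all experts against each token's hidden state and keeps only the top k, so the active subset changes from token to token:

```python
# Toy MoE routing: a different small subset of experts fires per token,
# so you can't know ahead of time which weights you'll need resident.
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, d_model = 64, 8, 16
router = rng.standard_normal((d_model, n_experts))  # gating weights

for t in range(3):                        # three consecutive tokens
    x = rng.standard_normal(d_model)      # hidden state for this token
    logits = x @ router                   # score every expert
    active = np.argsort(logits)[-top_k:]  # keep the top-k experts
    print(f"token {t}: experts {sorted(active.tolist())}")
```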
Current NVMe SSDs benchmark at 3 GB/s and up. The marginal latency would be trivial compared to the inference time.
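Whether it's trivial depends on how much actually has to be reloaded per token. A rough back-of-envelope, where the 4-bit quantization, the 3 GB/s read speed, and the cache hit rates are all assumptions:

```python
# Back-of-envelope: time to stream the active experts from NVMe per token.
# All numbers are assumptions, not measurements.
active_params = 3e9      # active parameters per token (the "A3B" part)
bytes_per_param = 0.5    # 4-bit quantization
ssd_bps = 3e9            # 3 GB/s sequential read

worst_case_bytes = active_params * bytes_per_param  # nothing already resident
for hit_rate in (0.0, 0.5, 0.9):  # assumed expert reuse across tokens
    to_load = worst_case_bytes * (1 - hit_rate)
    print(f"cache hit {hit_rate:.0%}: {to_load / ssd_bps * 1e3:.0f} ms per token")
```

Under these assumptions a cold reload is ~500 ms per token, dropping fast as experts repeat across tokens and shared/attention weights stay resident.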
It'll run great; it's an MoE.