Comment by mixtureoftakes
21 hours ago
7B Mistral is quite outdated. On a 12GB 4070 you can run Qwen 3.5 9B at Q4_K_M or Qwen 3.6 35B; the latter will be a lot smarter but also a lot slower due to RAM offload.
Try both in LM Studio, they really are surprisingly capable.
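If you'd rather script it than click around in LM Studio, here's a minimal llama-cpp-python sketch of the same partial-offload setup. The model filename and layer count are placeholders, not a specific recommendation; swap in whatever GGUF quant you actually downloaded and bump `n_gpu_layers` until your 12GB of VRAM is full:

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# Model path and layer count are placeholders for your own download.
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen-35b-q4_k_m.gguf",  # hypothetical filename
    n_gpu_layers=20,  # layers offloaded to the GPU; the rest stay in system RAM
    n_ctx=8192,       # context window
)

out = llm("Explain MoE offloading in one paragraph.", max_tokens=200)
print(out["choices"][0]["text"])
```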
I have 80GB of RAM but it's slow: capped at only 2400MHz despite being DDR4. It's either the i9 CPU or this specific ASUS mobo that sucks, I think.
Tried all the usual stuff: BIOS settings, voltage tweaks.
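For context on why that 2400MHz cap hurts: token generation from system RAM is roughly memory-bandwidth-bound, so you can ballpark the ceiling with back-of-envelope math. The sketch below assumes dual-channel DDR4 and made-up model sizes, purely for illustration:

```python
# Back-of-envelope: decode speed on CPU is roughly bandwidth-bound,
# so tokens/sec <= RAM bandwidth / bytes read per token.
# Assumes dual-channel DDR4; real-world throughput will be somewhat lower.

mt_per_s = 2400    # DDR4-2400: 2400 megatransfers/sec
channels = 2       # assumption: dual-channel
bandwidth_gb_s = mt_per_s * 8 * channels / 1000  # 8 bytes/transfer -> ~38.4 GB/s

dense_model_gb = 20  # e.g. a ~35B dense model at Q4 (illustrative)
moe_active_gb = 3    # a MoE only reads its active experts per token (illustrative)

print(f"bandwidth:     ~{bandwidth_gb_s:.1f} GB/s")
print(f"dense ceiling: ~{bandwidth_gb_s / dense_model_gb:.1f} tok/s")
print(f"MoE ceiling:   ~{bandwidth_gb_s / moe_active_gb:.1f} tok/s")
```

That gap is also why a MoE model with only a few billion active parameters can still feel usable on slow RAM where a dense model of the same total size crawls.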
Gemma 4 26B-A4B might be interesting to try on your machine. The latest optimizations make MoE models work pretty nicely on setups like that with a decent GPU and lots of slowish RAM. I have a 16GB GPU and 64GB of 3200MHz DDR4 and get 15-20 tokens/sec out of that model with zero finagling or tweaking. I've been very impressed by it, even having run just about every other open-weight model that would fit on my machine over the last few years.
That seems slow? 15-20 tok/s when I was expecting 50-60 like Mistral, although I have not measured that yet on my setup (quick timing sketch below).
I've been asking other people this too, but what do you use it for?
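For what it's worth, measuring tok/s takes about ten lines with llama-cpp-python, since the completion response carries token counts in its OpenAI-style `usage` field. Model path is again a placeholder:

```python
# Quick tokens/sec check; model path is a placeholder for your own quant.
import time
from llama_cpp import Llama

llm = Llama(model_path="models/mistral-7b-q4_k_m.gguf", n_gpu_layers=-1)  # -1 = offload all layers

start = time.perf_counter()
out = llm("Write a short story about a robot.", max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```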