Comment by varispeed
1 day ago
Does it make any sense? I tried few models at 128GB and it's all pretty much rubbish. Yes they do give coherent answers, sometimes they are even correct, but most of the time it is just plain wrong. I find it massive waste of time.
Apparently there is a whole science behind running models. I have seen the instructions that unsloth publishes for their quants and depending on the model they'll tweak things like the temperature, top k, etc.
The size of the quantization you chose also makes a difference.
The GPU driver also plays an important role.
What was your approach? What software did you use to run the models?
I'm not sure how long ago you tried it, but look at Qwen 3.5 32b on a fast machine. Usually best to shut off thinking if you're not doing tool use.