Comment by mongrelion
19 hours ago
Apparently there is a whole science behind running models. I have seen the instructions that Unsloth publishes for their quants, and depending on the model they tweak things like the temperature, top-k, etc.
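For anyone wondering what those two knobs do, here is a rough sketch in plain Python. It is not what any particular runtime does internally; the function name `sample_top_k` and the example logits are made up for illustration, and real inference engines usually layer other samplers on top.

```python
import math
import random


def sample_top_k(logits, temperature=0.7, top_k=2, rng=None):
    """Pick one token index from raw logits using top-k and temperature.

    Hypothetical helper for illustration only.
    """
    if rng is None:
        rng = random.Random()
    # Keep only the indices of the top_k highest-scoring tokens.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    top = ranked[:top_k]
    # Divide by temperature: below 1 sharpens the distribution, above 1 flattens it.
    scaled = [logits[i] / temperature for i in top]
    # Softmax weights over the survivors (subtract the max for numerical stability).
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


# Made-up logits for a four-token vocabulary.
example_logits = [2.0, 1.0, 0.1, -1.0]
choice = sample_top_k(example_logits, temperature=0.7, top_k=2)
```

With `top_k=2` only the two highest-scoring tokens can ever be picked, and lowering the temperature pushes the choice further toward the single best one, which is why recommended settings can differ from model to model.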
The quantization size you choose also makes a difference.
The GPU driver also plays an important role.
What was your approach? What software did you use to run the models?