Comment by segmondy

1 day ago

if you can run q8, go for it. always go for the best; it matters a lot with vision models. never quantize your kv cache, always keep it at f16.

you can always run evals and see if a q6 or q4 performs better than your q8. for smaller models i go q8. for bigger ones, when i run out of memory, i drop to q6/q4 and sometimes q3. i run deepseek/kimi at q4, for example.
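a rough sketch of that fallback, in case it helps: try q8 first and step down only when the weights don't fit. the bits-per-weight numbers are ballpark assumptions and vary by quant variant and model, and `pick_quant`, `weights_gb` and `reserve_gb` are made-up names for illustration, not part of any tool. `reserve_gb` is just a guess at the headroom you leave for the f16 kv cache and runtime overhead.

```python
# Approximate bits per weight for each quant level.
# These are rough assumptions; real files differ by variant and model.
APPROX_BITS_PER_WEIGHT = {"q8": 8.5, "q6": 6.6, "q4": 4.8, "q3": 3.9}


def weights_gb(params_billions, quant):
    """Estimate the size of the model weights in GB (1 GB = 1e9 bytes)."""
    total_bits = params_billions * 1e9 * APPROX_BITS_PER_WEIGHT[quant]
    return total_bits / 8 / 1e9


def pick_quant(params_billions, memory_gb, reserve_gb=2.0):
    """Return the highest quant whose weights fit, or None if none do.

    reserve_gb is a hypothetical allowance for the f16 kv cache and
    runtime overhead; tune it for your context length.
    """
    budget = memory_gb - reserve_gb
    for quant in ("q8", "q6", "q4", "q3"):  # best quality first
        if weights_gb(params_billions, quant) <= budget:
            return quant
    return None


if __name__ == "__main__":
    print(pick_quant(8, 24))   # small model, plenty of memory
    print(pick_quant(70, 48))  # bigger model, has to step down
```

this only tells you what fits, not what performs best. you still need evals to compare the quants that do fit.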

I suggest beginners start with q8 so they get the best quality and aren't disappointed. q8 is simple to use if you have the memory; choice fatigue and confusion come in once you start trying to pick other quants...