Comment by segmondy
4 hours ago
I don't think most people realize that. Quality of tokens beats quantity of tokens. I always tell folks to go with as high a quant as you can, and only go lower if you just don't have the memory capacity.
What do you mean by that? I’m not sure I understood what you said.
AI models like gemma4 are available in different quant "sizes". Think of it as an image that is available at various compression levels.
The best-looking image is the largest one. It takes up the most memory when loaded and uses up much of your system resources.
At the other end of the spectrum is a smaller, much more compressed version of the same image. It loads quickly and uses fewer resources, but it lacks the detail and clarity of the original.
AI models work the same way: the parent poster is suggesting you use the largest version of the model your system can support, even if it runs a little slower than you'd like.
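If it helps, here is a rough back-of-the-envelope sketch of that "largest version that fits" rule. Everything in it is an assumption for illustration: the quant names and bits-per-weight figures are nominal, the `headroom_gb` default is a guess at what the context and runtime need, and `pick_quant` is a made-up helper, not part of any tool. Real quant formats carry extra overhead, so treat the numbers as estimates only.

```python
# Nominal bits per weight for a few illustrative quant levels (assumed values;
# real quant formats add overhead for scales and metadata).
QUANT_BITS = {"16-bit": 16, "8-bit": 8, "6-bit": 6, "4-bit": 4, "2-bit": 2}


def weights_gb(params_billions, bits):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits / 8


def pick_quant(params_billions, memory_gb, headroom_gb=2.0):
    """Return the highest-precision quant whose weights fit, or None.

    headroom_gb is memory held back for context and the runtime (a guess).
    """
    budget = memory_gb - headroom_gb
    # Try the highest precision first and step down only when it does not fit.
    for name, bits in sorted(QUANT_BITS.items(), key=lambda kv: -kv[1]):
        if weights_gb(params_billions, bits) <= budget:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical 12B-parameter model on machines with different memory.
    for memory in (8, 16, 32):
        print(memory, "GB ->", pick_quant(12, memory))
```

The loop starts from the top and only drops a level when the weights don't fit, which is the parent poster's advice in code form.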
Thank you!
Better to go for a less-quantized model, even if it's slower, than for a faster, more heavily quantized one.
Thank you!