Comment by BoorishBears
1 day ago
I do? I spend a ton of time post-training models for creative tasks.
The effects of model quantization are usually quantified in terms of performance on benchmaxxed tasks with strong logit probabilities, temp 0, and a "right" answer the model has to pick. Or, even worse, they're measured on metrics like perplexity that don't map to anything except themselves (https://arxiv.org/pdf/2407.09141).
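To make the self-referential point concrete: perplexity is just the exponentiated average negative log-likelihood the model assigns to its own evaluation text, so it measures how "unsurprised" the model is, not whether it does your task well. A minimal sketch (the per-token probabilities here are made up for illustration):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# Hypothetical per-token probabilities a model assigned to its eval text.
log_probs = [math.log(p) for p in (0.5, 0.25, 0.125, 0.5)]
print(perplexity(log_probs))  # ≈ 3.36; lower = the model finds the text less "surprising"
```

Two quants can move this number by almost nothing while behaving very differently on open-ended generation, which is why it maps poorly to creative tasks.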
I agree Q8 is strong but I also think the effects of quantization are constantly being underappreciated. People are often talking about how these models perform while fundamentally using 10+ variants of a single model with distinct performance profiles.
Even knowing the bits per weight used isn't enough to know exactly how a given quant method is affecting the model: https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-ggufs
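You can see why bits per weight alone underdetermines quality with a toy example: the same tensor quantized to ~4 bits with a per-tensor scale vs. per-block scales gives very different reconstruction error, because a single outlier weight (common in real LLM layers) blows up the per-tensor scale. A sketch with made-up numbers, not any particular quant format:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy weight row with one outlier, as real LLM layers often have.
w = rng.normal(0, 0.02, size=256).astype(np.float32)
w[7] = 1.0  # outlier dominates a per-tensor absmax scale

def quant_dequant(x, scale):
    """Symmetric 4-bit quantization: round to ints in [-8, 7], then dequantize."""
    q = np.clip(np.round(x / scale), -8, 7)
    return q * scale

# Method A: one scale for the whole tensor (4.0 bits/weight).
a = quant_dequant(w, np.abs(w).max() / 7)

# Method B: one scale per 32-weight block (~4.5 bits/weight with fp16 scales).
b = np.concatenate([
    quant_dequant(blk, np.abs(blk).max() / 7)
    for blk in np.split(w, len(w) // 32)
])

print("per-tensor MSE:", np.mean((w - a) ** 2))
print("per-block  MSE:", np.mean((w - b) ** 2))  # far lower at nearly the same bpw
```

Same nominal precision, very different damage — and real schemes differ further in which layers they leave in higher precision, which a single "Q4" label doesn't tell you.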
If you've trained your own models, you'll be aware of quantization-aware training.
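For anyone who hasn't run into it: QAT simulates the quantizer in the forward pass during training, and the straight-through estimator applies the gradient to the latent full-precision weight as if the rounding weren't there, so the weights learn to sit on the quantization grid. A minimal one-parameter sketch (all values here are made up for illustration):

```python
import numpy as np

def fake_quant(w, scale):
    """Forward pass quantizes to 4 bits; STE treats this as identity for gradients."""
    return np.clip(np.round(w / scale), -8, 7) * scale

# Toy QAT loop: fit y = w * x with w forced through the quantizer each step.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 0.37 * x                 # target weight 0.37 isn't exactly representable
w, scale, lr = 0.0, 0.05, 0.1
for _ in range(200):
    wq = fake_quant(w, scale)             # quantized weight used in the forward pass
    grad = np.mean(2 * (wq * x - y) * x)  # loss gradient w.r.t. the quantized weight
    w -= lr * grad                        # straight-through: apply it to the fp weight
print(fake_quant(w, scale))  # the deployed weight lands on the nearest grid point, 0.35
```

The point being: models can be trained to tolerate a specific quantizer, which is exactly the kind of effect that generic post-hoc quant benchmarks never see.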
"Nobody really cares if it meets a strict definition of lossless" != "quantization can be done haphazardly."
If you're snarkily referring to the article on Dynamic Quants 2.0 and how carefully they were developed: they're comparing their quants against the methodology 99.99% of quants out there use.
The problem is not that people are making quants "haphazardly", it's that people keep parroting that various quants are "practically lossless" when they actually have absolutely no clue how lossy they are, given how application-specific the concept is for something as multidimensional as an LLM.
The moment anyone tries a little harder to quantify how lossy they are, we repeatedly find that the answer is "not lossless by any reasonable definition". Even in their example, the fact that Q4 lands <1% away on MMLU 5-shot is probably massively helped by a calibration dataset that maps really well to MMLU-style tasks, just like constantly using WikiText massively helps models that were trained on... tons of text from Wikipedia.
So unless you're doing your own calibrated quantization with your own dataset (which is not impossible, but also nowhere near common), even their "non-haphazard" method could have a noticeable impact on performance.
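The calibration-mismatch effect is easy to demonstrate in miniature: pick a quantization scale from calibration activations, then apply it to deployment activations drawn from a different distribution, and clipping error explodes. A sketch with synthetic Gaussian "activations" (the distributions are invented, not from any real model):

```python
import numpy as np

rng = np.random.default_rng(2)

def calibrate_scale(acts, bits=8):
    """Pick a symmetric quantization scale from a calibration sample's max magnitude."""
    return np.abs(acts).max() / (2 ** (bits - 1) - 1)

def quant_error(acts, scale, bits=8):
    qmax = 2 ** (bits - 1) - 1
    deq = np.clip(np.round(acts / scale), -qmax, qmax) * scale
    return np.mean((acts - deq) ** 2)

deploy     = rng.normal(0, 3.0, 10_000)  # what the model actually sees in your app
matched    = rng.normal(0, 3.0, 1_000)   # calibration data from the same distribution
mismatched = rng.normal(0, 0.5, 1_000)   # narrower-range data, WikiText-style mismatch

print("matched calib MSE:   ", quant_error(deploy, calibrate_scale(matched)))
print("mismatched calib MSE:", quant_error(deploy, calibrate_scale(mismatched)))
```

The mismatched scale clips everything outside the narrow calibration range, so the same quant recipe that looks "practically lossless" on the calibration-adjacent benchmark can be badly lossy on your actual workload.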
Wasn't referring to that.
You are saying that people are using quantized models haphazardly and talking about them haphazardly. I'll grant it's not the exact same thing as making them haphazardly, but I think you took the point.
The terms shouldn't be used here. They aren't helpful. You are either getting good results or you are not. It shouldn't be treated differently from further training on dataset d: the weights changed, so how much better or worse at task Y did it just get?