Comment by jsnell

15 days ago

Needing less memory for inference is the entire point of quantization. Saving disk space or having a smaller download could not justify any level of quality degradation.

Small point of order:

> entire point...smaller download could not justify...

Q4_K_M has layers and layers of consensus, built over a couple of years of polling, surveying, A/B testing, and benchmarking, showing there's ~0 quality degradation.

  • > Q4_K_M has ~0 quality degradation

    Llama 3.3 already shows a degradation from Q5 to Q4.

    As compression improves over the years, the effects of even Q5 quantization will begin to appear.
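As a toy illustration of why each bit-width step matters, here is a minimal round-to-nearest quantization sketch. This is an assumption-laden stand-in, not the actual llama.cpp k-quant format: real Q4_K_M / Q5_K_M use super-blocks with quantized scales and mins, but the basic trade-off (fewer bits, larger rounding error) shows up even in this simplified per-group scheme:

```python
import numpy as np

def quantize_rtn(w, bits, group_size=32):
    """Toy symmetric round-to-nearest quantization with per-group scales.

    A simplified stand-in for 4/5-bit weight quantization; real k-quants
    are considerably more elaborate.
    """
    w = w.reshape(-1, group_size)
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit, 15 for 5-bit
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0               # avoid division by zero
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return (q * scale).reshape(-1)        # dequantized weights

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)

for bits in (4, 5):
    err = np.sqrt(np.mean((w - quantize_rtn(w, bits)) ** 2))
    print(f"{bits}-bit RMS reconstruction error: {err:.4f}")
```

Each additional bit roughly halves the rounding error, so whether a given bit-width is "lossless in practice" depends on how much of that error the model can absorb, which is exactly what shifts as models become more information-dense.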