Q8 quant is very minimal fall off in terms of KLD against the lab 16 bit. If you have the memory for BF16 KV-cache (which is usually easier to stomach) then the Q8 is very close. But even Q8 quant model with Q8 KV-cache is very close.
Smaller quants for the model start to fall off but more importantly, smaller KV-cache quants fall off much faster so avoid less than Q8 there.
Q8 quant is very minimal fall off in terms of KLD against the lab 16 bit. If you have the memory for BF16 KV-cache (which is usually easier to stomach) then the Q8 is very close. But even Q8 quant model with Q8 KV-cache is very close.
Smaller quants for the model start to fall off but more importantly, smaller KV-cache quants fall off much faster so avoid less than Q8 there.