Comment by kingsleyopara
14 days ago
For zero-shot accuracy from Table 3:
* LLaMA 3 8B: baseline 72.26, 4-bit 71.31, 3-bit 62.79
* LLaMA 3 70B: baseline 79.51, 4-bit 78.06, 3-bit 74.68
These results seem comparable to modern quantization methods; see, for example, the ~4-bit results for smaller LLaMA models listed here: https://ai.meta.com/blog/meta-llama-quantized-lightweight-mo...
I don't see any comparable numbers on the page you linked. It seems to only have numbers for 1B and 3B parameter models. Comparisons to AWQ and OmniQuant in Table 3 seem quite favorable, with SeedLM showing 10-50% better performance.
Also seems like the techniques may be possible to combine.
As a rule of thumb, the bigger the model is, the more gracefully it degrades under quantization. So you may assume the performance loss for an 8B model would be lower than for a 3B model. (I know that doesn't make up for the missing numbers in the link, just fyi.)
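A quick back-of-the-envelope check of that rule against the Table 3 numbers quoted above (the relative drops are my own arithmetic, not figures reported in the paper):

    # Relative accuracy drop, computed from the Table 3 numbers quoted above.
    # (My own arithmetic, not reported in the paper.)
    baselines = {"LLaMA 3 8B": 72.26, "LLaMA 3 70B": 79.51}
    quantized = {
        "LLaMA 3 8B": {"4-bit": 71.31, "3-bit": 62.79},
        "LLaMA 3 70B": {"4-bit": 78.06, "3-bit": 74.68},
    }

    for model, base in baselines.items():
        for bits, acc in quantized[model].items():
            drop = (base - acc) / base * 100  # relative drop in percent
            print(f"{model} {bits}: {drop:.1f}% relative drop")

That gives roughly 1.3% (4-bit) and 13.1% (3-bit) for the 8B model versus 1.8% and 6.1% for the 70B model, so at 3-bit the 70B loses about half as much relative accuracy as the 8B, consistent with the rule of thumb.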