Comment by kingsleyopara

14 days ago

For zero-shot accuracy from Table 3:

* LLaMA 3 8B: baseline 72.26, 4-bit 71.31, 3-bit 62.79

* LLaMA 3 70B: baseline 79.51, 4-bit 78.06, 3-bit 74.68

These results seem comparable to modern quantization methods—for example, the ~4-bit results for smaller LLaMA models listed here: https://ai.meta.com/blog/meta-llama-quantized-lightweight-mo...

I don't see any comparable numbers on the page you linked; it only seems to have numbers for 1B and 3B parameter models. The comparisons to AWQ and OmniQuant in Table 3 look quite favorable, with SeedLM showing 10%–50% better performance.

It also seems like the two techniques could be combined.

  • As a rule of thumb, the bigger the model is, the more gracefully it degrades under quantisation. So you may assume the performance loss for an 8B model would be lower than for a 3B model. (I know that doesn't make up for the missing numbers in the link, just fyi.)
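
A quick back-of-the-envelope check, using only the Table 3 numbers quoted above, is consistent with that rule of thumb at 3-bit: the 70B model loses roughly half as much accuracy in relative terms as the 8B model, while the 4-bit drops are small and similar for both. This is just a sketch over the figures quoted in this thread, not additional data from the paper:

```python
# Relative accuracy loss under quantization, computed from the
# zero-shot numbers quoted above (Table 3 of the SeedLM paper).
table3 = {
    "LLaMA 3 8B":  {"baseline": 72.26, "4-bit": 71.31, "3-bit": 62.79},
    "LLaMA 3 70B": {"baseline": 79.51, "4-bit": 78.06, "3-bit": 74.68},
}

for model, acc in table3.items():
    base = acc["baseline"]
    for bits in ("4-bit", "3-bit"):
        drop = 100 * (base - acc[bits]) / base  # relative accuracy loss in %
        print(f"{model:12s} {bits}: {acc[bits]:.2f} ({drop:.1f}% relative drop)")

# Output:
# LLaMA 3 8B   4-bit: 71.31 (1.3% relative drop)
# LLaMA 3 8B   3-bit: 62.79 (13.1% relative drop)
# LLaMA 3 70B  4-bit: 78.06 (1.8% relative drop)
# LLaMA 3 70B  3-bit: 74.68 (6.1% relative drop)
```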