Comment by spindump8930
10 months ago
Yes, it could be stacked on quants. It might be that quantized activations are already more "dense", so they can't be compressed as much as bfloat16 (16 -> ~11 bits), but it's certainly possible.
I read it similarly - that this is a specific attribute of bfloat16, so the quants that folks tend to run on local hardware don't have the same inefficiency to exploit.
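For anyone who wants to sanity-check the "16 -> ~11 bits" intuition, here is a rough sketch, not the method from the linked work. It measures the empirical entropy of the sign, exponent, and mantissa fields of bfloat16 values. The weights are a hypothetical stand-in (Gaussian with std 0.02, my assumption), the bfloat16 conversion truncates instead of rounding, and summing per-field entropies only gives an upper bound on the bits an ideal entropy coder would need.

```python
import numpy as np


def field_entropy_bits(values: np.ndarray) -> float:
    """Empirical Shannon entropy, in bits per symbol, of an integer array."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def bf16_field_entropies(weights: np.ndarray) -> dict:
    """Split values into bfloat16 fields and measure each field's entropy."""
    # bfloat16 is the top 16 bits of float32. Truncation is a simplification:
    # real converters round to nearest.
    bits = (weights.astype(np.float32).view(np.uint32) >> 16).astype(np.uint16)
    sign = bits >> 15              # 1 bit
    exponent = (bits >> 7) & 0xFF  # 8 bits
    mantissa = bits & 0x7F         # 7 bits
    return {
        "sign": field_entropy_bits(sign),
        "exponent": field_entropy_bits(exponent),
        "mantissa": field_entropy_bits(mantissa),
    }


# Hypothetical stand-in for model weights (assumption: zero-mean Gaussian).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=1_000_000)

ent = bf16_field_entropies(weights)
total = sum(ent.values())  # upper bound on bits per value for an ideal coder

for name, value in ent.items():
    print(f"{name:>8}: {value:.2f} bits")
print(f"   total: {total:.2f} of 16 bits")
```

On data like this, most of the slack should show up in the exponent field, since the values only span a narrow band of magnitudes. Running the same measurement on real quantized tensors would be one way to see whether they have less slack left, as suggested above.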