Comment by amitport
4 hours ago
I believe our claim at this point is more fundamental than just a lack of citation.
The quantizer in TurboQuant is EDEN quantization (2021) applied to the KV-cache. It is neither a novel quantizer nor an improvement in quantization techniques.
In DRIVE/EDEN, we already introduced the version used in the TurboQuant paper and suggested optimal scale configurations that are better in both the MSE-minimizing and unbiased scenarios.
Wow, yes - you are completely correct (I've now read through the note in detail).
Though, as your paper also notes, the quantizer values themselves aren't fundamentally novel to either paper. Lloyd-Max scalar quantizers have been studied for a very long time, and the specific Lloyd-Max values for the Gaussian input distribution have been derived in many papers across signal processing and information theory.
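For concreteness, here is a minimal sketch (my own illustration, not code from any of the papers) of how those values arise. Lloyd's algorithm alternates the nearest-neighbor condition (cell boundaries are midpoints between levels) and the centroid condition (each level becomes the conditional mean of its cell), which has a closed form for a standard Gaussian:

```python
# Lloyd's algorithm for the MSE-optimal (Lloyd-Max) scalar quantizer of a
# standard Gaussian source. The codebook size K and iteration count are
# illustrative choices, not values taken from any of the papers discussed.
import numpy as np
from scipy.stats import norm

def lloyd_max_gaussian(K=4, iters=200):
    # Initialize K levels spread over the bulk of the Gaussian mass.
    levels = np.linspace(-2.0, 2.0, K)
    for _ in range(iters):
        # Nearest-neighbor condition: cell boundaries are midpoints.
        bounds = np.concatenate(([-np.inf], (levels[:-1] + levels[1:]) / 2, [np.inf]))
        # Centroid condition: for a standard normal, E[X | a < X < b]
        # has the closed form (phi(a) - phi(b)) / (Phi(b) - Phi(a)).
        mass = norm.cdf(bounds[1:]) - norm.cdf(bounds[:-1])
        levels = (norm.pdf(bounds[:-1]) - norm.pdf(bounds[1:])) / mass
    return levels

print(lloyd_max_gaussian(K=4))  # ~ [-1.510, -0.453, 0.453, 1.510], the classic 2-bit values
```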
Thanks for that!
It is worth noting that taking advantage of the post-rotation distribution was not actually done until DRIVE (2021), which was made possible via our proper scaling. Furthermore, applying a Lloyd-Max codebook post-rotation was introduced in EDEN.
We consider these to be the foundational works in this regard.
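To make the point concrete, here is a rough sketch of the rotate-then-quantize pipeline being discussed. This is a simplification I'm adding for illustration, not the actual DRIVE/EDEN algorithm: those works use structured rotations (e.g., randomized Hadamard transforms) and carefully derived scale factors, whereas this sketch uses a plain random orthogonal matrix and a naive scale, plus the standard 2-bit Gaussian Lloyd-Max levels.

```python
# Sketch: a random rotation makes the coordinates of a vector look
# approximately i.i.d. Gaussian, after which a Gaussian Lloyd-Max
# codebook applies. Simplified stand-in for the DRIVE/EDEN pipeline.
import numpy as np

rng = np.random.default_rng(0)
d = 1024
x = rng.standard_normal(d)  # any input vector

# Random orthogonal rotation (QR of a Gaussian matrix); in practice the
# encoder and decoder would share it via a common random seed.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
z = Q @ x

# Scale so coordinates are ~ N(0, 1): post-rotation, each coordinate of
# a norm-||x|| vector has standard deviation ~ ||x|| / sqrt(d).
scale = np.linalg.norm(x) / np.sqrt(d)
z_unit = z / scale

# 2-bit Gaussian Lloyd-Max codebook (classic values from Max, 1960).
codebook = np.array([-1.510, -0.4528, 0.4528, 1.510])
idx = np.argmin(np.abs(z_unit[:, None] - codebook[None, :]), axis=1)

# Decode: map indices back to levels, undo the scale and the rotation.
x_hat = Q.T @ (codebook[idx] * scale)
print("relative MSE:", np.mean((x - x_hat) ** 2) / np.mean(x ** 2))
```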
> Thanks for that! It is worth noting that taking advantage of the post-rotation distribution
I again feel this claim is too strong. Rotations have been used in information theory and wireless communications for decades at this point, with appropriate scaling applied at channel inputs/outputs to hit channel capacity; the signals then pass through codebooks designed to take advantage of the post-rotated, whitened signal.
Our cellphones today are powered by such technology.
I agree with your claim when restricted to deep learning. But I do not agree with the broad characterization that taking advantage of post-rotation distributions was only first done in your work.