A polynomial autoencoder beats PCA on transformer embeddings

3 days ago (ivanpleshkov.dev)

My understanding, after scanning the code examples, is that the technique expands each data point with quadratic terms built from its existing dimensions (products of pairs of coordinates). That sounded to me like kernel PCA with a polynomial kernel.
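
To make the comparison concrete, here is a minimal sketch of both routes, assuming scikit-learn (the data and the 16-component target are made up for illustration, not taken from the article):

    import numpy as np
    from sklearn.decomposition import PCA, KernelPCA
    from sklearn.preprocessing import PolynomialFeatures

    # Toy stand-in for transformer embeddings: 1000 vectors of dim 32.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 32))

    # Explicit route: append every degree-2 monomial x_i * x_j to each
    # vector, then run ordinary PCA on the expanded matrix.
    X_quad = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
    Z_explicit = PCA(n_components=16).fit_transform(X_quad)

    # Implicit route: kernel PCA with a degree-2 polynomial kernel works
    # in the same kind of feature space without materializing it.
    Z_kernel = KernelPCA(n_components=16, kernel="poly", degree=2).fit_transform(X)

The two are not numerically identical (the polynomial kernel adds a scaling and offset term, and the centering differs), but the feature space is the same flavor, which is why it reads like kernel PCA to me.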

I came here from a discussion about CS students who can't be bothered to set up email filters. How can they ever expect to digest even the first paragraph of that article?

I'm just a casual LLM user, but your description of the anisotropy made me think of recent work on KV-cache quantization, such as TurboQuant, where a random rotation is applied to each vector before quantizing, precisely (as I understood it) to make the distribution more isotropic.

But for RAG that might be too much work per vector?
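
To illustrate the rotation trick as I understand it (this is the general idea only, not TurboQuant's actual algorithm; the naive int8 scheme and the sizes are my own assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    d = 64

    # Random orthogonal matrix: QR decomposition of a Gaussian matrix.
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))

    def quantize_int8(v):
        # Naive symmetric int8 quantization with one per-vector scale.
        scale = np.abs(v).max() / 127.0
        return np.round(v / scale).astype(np.int8), scale

    # An anisotropic vector: energy concentrated in a few coordinates.
    v = rng.normal(size=d) * np.linspace(0.1, 5.0, d)

    q_plain, s_plain = quantize_int8(v)
    q_rot, s_rot = quantize_int8(Q @ v)

    # Reconstruct; undo the rotation with Q.T since Q is orthogonal.
    err_plain = np.linalg.norm(q_plain * s_plain - v)
    err_rot = np.linalg.norm(Q.T @ (q_rot * s_rot) - v)
    print(err_plain, err_rot)  # the rotated version is typically lower

On the per-vector cost: if I'm not mistaken, an orthogonal rotation preserves dot products, so stored vectors are rotated once at index time and each query is rotated once at query time; the per-comparison work shouldn't change.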

Author here — questions and pushback both welcome.

  • You should benchmark the retrieval speed of each method in queries per second. I suspect the bandwidth you gain from slightly better compression will be eaten up by decompression being much more expensive. (A rough timing harness is sketched after this list.)

  • In the article you mention that this approach requires no hyper-parameter search, because the method has a closed-form solution using "simple" linear algebra. I agree with that, but don't you still need to tune the L2-regularization strength? To me that is a hyper-parameter you would need to cross-validate (or similar); see the sketch below.
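
On the QPS point above, a rough harness for what I have in mind (the brute-force stand-in search is purely hypothetical; substitute whichever decompress-and-search routine is under test):

    import time
    import numpy as np

    def benchmark_qps(search_fn, queries, n_repeats=3):
        # Best-of-n wall-clock timing, reported as queries per second.
        best = float("inf")
        for _ in range(n_repeats):
            t0 = time.perf_counter()
            for q in queries:
                search_fn(q)
            best = min(best, time.perf_counter() - t0)
        return len(queries) / best

    # Dummy stand-in: brute-force dot-product search over raw vectors.
    X = np.random.default_rng(0).normal(size=(10_000, 64))
    queries = np.random.default_rng(1).normal(size=(100, 64))
    print(f"{benchmark_qps(lambda q: np.argmax(X @ q), queries):.0f} QPS")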
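
And on the regularization point: I don't know the article's exact objective, but assuming the closed form is a ridge-style solve for the decoder weights, the cross-validation I have in mind would look like this (all names here are my own, not the article's):

    import numpy as np

    def ridge_closed_form(A, B, lam):
        # Closed form for argmin_W ||A W - B||^2 + lam * ||W||^2.
        d = A.shape[1]
        return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ B)

    def cv_lambda(A, B, lams, k=5):
        # k-fold CV over the regularization strength: the hidden hyper-parameter.
        n = A.shape[0]
        folds = np.array_split(np.random.default_rng(0).permutation(n), k)
        errs = []
        for lam in lams:
            err = 0.0
            for f in folds:
                mask = np.ones(n, dtype=bool)
                mask[f] = False
                W = ridge_closed_form(A[mask], B[mask], lam)
                err += np.linalg.norm(A[f] @ W - B[f]) ** 2
            errs.append(err)
        return lams[int(np.argmin(errs))]

Each candidate lambda still has a closed-form fit, so the sweep is cheap, but it is a sweep nonetheless.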