A polynomial autoencoder beats PCA on transformer embeddings

3 days ago (ivanpleshkov.dev)

My understanding, after scanning the code examples, is that the technique expands each data point with quadratic terms built from its existing dimensions (products of pairs of coordinates). That sounded to me like kernel PCA with a polynomial kernel.
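
To make the comparison concrete, here is a minimal sketch of both routes, assuming scikit-learn (the data and the 16-component target are made up for illustration, not taken from the article):

    import numpy as np
    from sklearn.decomposition import PCA, KernelPCA
    from sklearn.preprocessing import PolynomialFeatures

    # Toy stand-in for transformer embeddings: 1000 vectors of dim 32.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 32))

    # Explicit route: append every degree-2 monomial x_i * x_j to each
    # vector, then run ordinary PCA on the expanded matrix.
    X_quad = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
    Z_explicit = PCA(n_components=16).fit_transform(X_quad)

    # Implicit route: kernel PCA with a degree-2 polynomial kernel works
    # in the same kind of feature space without materializing it.
    Z_kernel = KernelPCA(n_components=16, kernel="poly", degree=2).fit_transform(X)

The two are not numerically identical (the polynomial kernel adds a scaling and offset term, and the centering differs), but the feature space is the same flavor, which is why it reads like kernel PCA to me.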

I came here from a discussion about CS students who can't be bothered to set up email filters. How can they ever expect to digest even the first paragraph of that article?

I'm just a casual LLM user, but your description of the anisotropy made me think of recent work on KV-cache quantization, such as TurboQuant, where a random rotation is applied to each vector before quantizing, precisely (as I understood it) to make the distribution more isotropic.

But for RAG that might be too much work per vector?
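
To illustrate the rotation trick as I understand it (this is the general idea only, not TurboQuant's actual algorithm; the naive int8 scheme and the sizes are my own assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    d = 64

    # Random orthogonal matrix: QR decomposition of a Gaussian matrix.
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))

    def quantize_int8(v):
        # Naive symmetric int8 quantization with one per-vector scale.
        scale = np.abs(v).max() / 127.0
        return np.round(v / scale).astype(np.int8), scale

    # An anisotropic vector: energy concentrated in a few coordinates.
    v = rng.normal(size=d) * np.linspace(0.1, 5.0, d)

    q_plain, s_plain = quantize_int8(v)
    q_rot, s_rot = quantize_int8(Q @ v)

    # Reconstruct; undo the rotation with Q.T since Q is orthogonal.
    err_plain = np.linalg.norm(q_plain * s_plain - v)
    err_rot = np.linalg.norm(Q.T @ (q_rot * s_rot) - v)
    print(err_plain, err_rot)  # the rotated version is typically lower

On the per-vector cost: if I'm not mistaken, an orthogonal rotation preserves dot products, so stored vectors are rotated once at index time and each query is rotated once at query time; the per-comparison work shouldn't change.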

Author here — questions and pushback both welcome.

  • You should benchmark the retrieval speed of each method in queries per second. I suspect the bandwidth you gain from slightly better compression will be eaten up by decompression being much more expensive. (A rough timing harness is sketched after this list.)

  • In the article you mention that this approach requires no hyper-parameter search, because the method has a closed-form solution using "simple" linear algebra. I agree with that, but don't you still need to tune the L2-regularization strength? To me that is a hyper-parameter you would need to cross-validate (or similar); see the sketch below.
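
On the QPS point above, a rough harness for what I have in mind (the brute-force stand-in search is purely hypothetical; substitute whichever decompress-and-search routine is under test):

    import time
    import numpy as np

    def benchmark_qps(search_fn, queries, n_repeats=3):
        # Best-of-n wall-clock timing, reported as queries per second.
        best = float("inf")
        for _ in range(n_repeats):
            t0 = time.perf_counter()
            for q in queries:
                search_fn(q)
            best = min(best, time.perf_counter() - t0)
        return len(queries) / best

    # Dummy stand-in: brute-force dot-product search over raw vectors.
    X = np.random.default_rng(0).normal(size=(10_000, 64))
    queries = np.random.default_rng(1).normal(size=(100, 64))
    print(f"{benchmark_qps(lambda q: np.argmax(X @ q), queries):.0f} QPS")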
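
And on the regularization point: I don't know the article's exact objective, but assuming the closed form is a ridge-style solve for the decoder weights, the cross-validation I have in mind would look like this (all names here are my own, not the article's):

    import numpy as np

    def ridge_closed_form(A, B, lam):
        # Closed form for argmin_W ||A W - B||^2 + lam * ||W||^2.
        d = A.shape[1]
        return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ B)

    def cv_lambda(A, B, lams, k=5):
        # k-fold CV over the regularization strength: the hidden hyper-parameter.
        n = A.shape[0]
        folds = np.array_split(np.random.default_rng(0).permutation(n), k)
        errs = []
        for lam in lams:
            err = 0.0
            for f in folds:
                mask = np.ones(n, dtype=bool)
                mask[f] = False
                W = ridge_closed_form(A[mask], B[mask], lam)
                err += np.linalg.norm(A[f] @ W - B[f]) ** 2
            errs.append(err)
        return lams[int(np.argmin(errs))]

Each candidate lambda still has a closed-form fit, so the sweep is cheap, but it is a sweep nonetheless.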