Comment by perfmode
4 days ago
> These vectors are quite long - text-embedding-3-large has up to 3072 dimensions - to the point that we can truncate them with minimal loss of quality.
Would it be beneficial to use dimensionality reduction instead of truncating? Or does “truncation” mean dimensionality reduction in this context?
The embedding is trained with Matryoshka Representation Learning, so truncating it compresses the vector while losing as little meaning as possible. In that sense it is a form of dimensionality reduction.
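
For concreteness, a minimal NumPy sketch of what "truncation" means here (the helper name and the 256-dim target are just illustrative; renormalizing after the cut keeps cosine similarity well behaved):

```python
import numpy as np

def truncate_embedding(v: np.ndarray, k: int) -> np.ndarray:
    """Keep the first k dimensions of a Matryoshka-style embedding
    and renormalize to unit length so cosine similarity still works."""
    t = v[:k]
    return t / np.linalg.norm(t)

# e.g. shrink a 3072-dim text-embedding-3-large vector to 256 dims
v = np.random.randn(3072)      # stand-in for a real embedding
v /= np.linalg.norm(v)
small = truncate_embedding(v, 256)
```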
An argument could be made that truncation is a sort of random projection, though that probably depends on how the embedding was trained, and a more textbook random projection is likely to be more robust.
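
For contrast, a minimal sketch of the "textbook" random projection mentioned above, again assuming NumPy (function name and dimensions are just for illustration):

```python
import numpy as np

def random_projection(v: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Textbook Gaussian random projection: multiply by a fixed k x d
    matrix with i.i.d. N(0, 1/k) entries, which approximately preserves
    pairwise distances (Johnson-Lindenstrauss lemma)."""
    rng = np.random.default_rng(seed)        # fixed seed: every vector
    R = rng.normal(scale=1.0 / np.sqrt(k),   # must go through the *same*
                   size=(k, v.shape[0]))     # projection matrix
    return R @ v

v = np.random.randn(3072)                    # stand-in for a real embedding
small = random_projection(v, 256)
```

The fixed seed is the important design point: all vectors you intend to compare have to pass through the same projection matrix, unlike truncation, which needs no shared state at all.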