Comment by elil17
8 hours ago
I would love to see real examples of what reduced quality means in practice. Are you able to recover a document from the vector in a human readable format? If so, what sort of changes come up?
I could imagine a scenario where differences tend to be more substantive than you'd expect because of how less frequent words with fine distinctions in meaning - the very words that make the document special - may be embedded in the vector space.
Most of the fine distinctions are already lost when a document is processed through a pile of linear algebra to turn it into a fixed-size list of floating-point numbers, as you can see from the NDCG@10. Vector search is not a tool for fine distinctions. It's a tool for reducing a large pile of documents to a smaller selection of candidates, which you can then check individually with some more expensive method.
The ndcg loss is minimal 90.26 -> 89.65. This means it maintains most of the quality.
this is the reason why we report ndcg and not recall. ndcg respects fine grained details so you get the an overview of how much details you are trading off since it would hurt the ranking.