Comment by SubiculumCode
2 months ago
I am a curious amateur, so I may say something dumb, but: suppose you take a number of smaller embedding models and one more advanced embedding model. For a given document, you convert each model's embedding into its universal representation and examine where they land in the universal embedding space.
On a per-document basis, would the universal embeddings of the smaller (less performant) models cluster around the better model's universal embedding, in a way suggesting that they are each targeting the "true" embedding but with additional error/noise?
If so, could averaging the universal embeddings from a collection of smaller models effectively approximate the stronger model's universal embedding? Could you then use those "averaged universal embeddings" as a target to train a new embedding model?
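The intuition behind the question can be sketched with a toy simulation. This is purely illustrative and assumes the hypothesis holds: it treats the strong model's universal embedding as the "true" point and each small model's universal embedding as that point plus independent noise, then checks whether averaging the noisy views gets closer to the true point than any single view typically does.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # dimensionality of the hypothetical universal space

# Stand-in for the strong model's universal embedding ("true" point).
true_emb = rng.normal(size=d)
true_emb /= np.linalg.norm(true_emb)

def noisy_view(scale):
    """A small model's universal embedding: the true point plus noise."""
    v = true_emb + rng.normal(scale=scale, size=d)
    return v / np.linalg.norm(v)

# Ten hypothetical smaller models, each a noisy view of the true point.
small_models = [noisy_view(0.1) for _ in range(10)]

# Average the views and renormalize.
avg = np.mean(small_models, axis=0)
avg /= np.linalg.norm(avg)

def cos(a, b):
    return float(a @ b)

single_sims = [cos(m, true_emb) for m in small_models]
print(f"mean single-model similarity: {np.mean(single_sims):.3f}")
print(f"averaged-embedding similarity: {cos(avg, true_emb):.3f}")
```

Under this independent-noise assumption the averaged embedding is reliably closer to the true point than a typical single view, since the noise partially cancels. Whether real smaller models actually behave like independent noisy views of a shared point (rather than sharing correlated biases) is exactly the empirical question being asked.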