Comment by visarga
2 years ago
During the Word2Vec era, I used to average a few word embeddings to get centroid embeddings. My observation was that the averaged embedding stayed close to all of the original embeddings for up to about 5 concepts. I verified this with similarity search. You can't pack too many distinct meanings into a single embedding, but you can pack a few.
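A minimal sketch of that centroid check. Random Gaussian vectors stand in for real Word2Vec embeddings here, and the dimension (300) and count (5) are just illustrative:

```python
import numpy as np

# Stand-ins for word embeddings; in practice these would come from
# a trained Word2Vec model (e.g. gensim's model.wv[word]).
rng = np.random.default_rng(0)
embeds = rng.standard_normal((5, 300))  # 5 concept vectors, 300-d

# Centroid embedding: elementwise mean of the concept vectors.
centroid = embeds.mean(axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similarity of the centroid to each of its source vectors.
sims = [cosine(centroid, e) for e in embeds]
```

For independent random vectors the centroid of n of them lands at cosine roughly 1/sqrt(n) from each source (about 0.45 for n = 5), while an unrelated vector sits near 0, so a similarity search from the centroid still retrieves all five sources. As n grows, 1/sqrt(n) shrinks toward the noise floor, which matches the observation that only a few meanings fit.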
My only gripe with word embeddings was that they mixed synonymy with relatedness. Even worse, they mixed synonymy with antonymy: hot and cold are similar in one sense, yet complete opposites.
You should try unembedding some text centroids; it should work well with vec2text.