Comment by visarga

2 years ago

During the Word2Vec era, I used to average a few word embeddings to get a centroid embedding. My observation was that the averaged embedding stayed close to all of the original embeddings for up to 5 concepts; I tested this with similarity search. You can't pack too many distinct meanings into a single embedding, but you can pack a few.
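
A rough sketch of that kind of test, for anyone who wants to try it. It uses random vectors as a stand-in for real Word2Vec embeddings, which is an assumption: real embeddings are not isotropic, so the point where the centroid stops retrieving its members will differ. The function names (`centroid`, `cosine_to_all`, `member_recall`) and the vocabulary size are made up for illustration.

```python
import numpy as np


def centroid(vectors):
    """Average unit-normalized vectors, then re-normalize the mean."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    mean = unit.mean(axis=0)
    return mean / np.linalg.norm(mean)


def cosine_to_all(query, matrix):
    """Cosine similarity between one query vector and every row of matrix."""
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return unit @ (query / np.linalg.norm(query))


def member_recall(vocab, k, rng):
    """Average k random rows, then check how many come back in the top k."""
    members = rng.choice(len(vocab), size=k, replace=False)
    sims = cosine_to_all(centroid(vocab[members]), vocab)
    top_k = np.argsort(-sims)[:k]
    return float(np.isin(top_k, members).mean())


# Hypothetical stand-in vocabulary: 2000 random 300-dimensional vectors.
rng = np.random.default_rng(0)
vocab = rng.standard_normal((2000, 300))

for k in (3, 5, 10, 50):
    print(k, member_recall(vocab, k, rng))
```

With real embeddings you would swap `vocab` for the model's vector matrix and pick the members by word instead of at random.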

My only gripe with word embeddings was that they mixed synonymy with relatedness. Even worse, they mixed up synonymy with antonymy: hot and cold are similar in a way, but also completely opposite.