Comment by yobbo
4 days ago
Embeddings represent more than P("found in the same context").
It is true that cosine similarity is unhelpful if you expect it to be a distance measure.
[0,0,1] and [0,1,0] are orthogonal (cosine 0), but their Euclidean distance is √2, and 1/3 of their elements are identical.
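A minimal sketch in plain Python (no libraries assumed) that checks the three numbers for this pair of vectors. The helper names are illustrative, not from any particular library:

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def euclidean(u, v):
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def fraction_identical(u, v):
    """Share of positions where the two vectors hold the same value."""
    return sum(a == b for a, b in zip(u, v)) / len(u)

x = [0, 0, 1]
y = [0, 1, 0]
print(cosine(x, y))              # 0.0
print(euclidean(x, y))           # 1.4142135623730951
print(fraction_identical(x, y))  # 0.3333333333333333
```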
It is better if embeddings also encode angles and both absolute and relative distances in some meaningful way. Testing only cosine similarity ignores distance altogether.
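To illustrate what cosine alone throws away, here is a small sketch with hand-picked vectors (chosen for illustration only): vectors pointing in the same direction get cosine 1 no matter how far apart they are, while Euclidean distance still tells them apart.

```python
import math

def cosine(u, v):
    """Cosine similarity: depends only on direction, not on length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def euclidean(u, v):
    """Euclidean distance: sensitive to length as well as direction."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

a = [1.0, 0.0]
b = [2.0, 0.0]   # same direction as a, twice as long
c = [10.0, 0.0]  # same direction as a, ten times as long

print(cosine(a, b), cosine(a, c))        # 1.0 1.0
print(euclidean(a, b), euclidean(a, c))  # 1.0 9.0
```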
Modern embeddings lie on the surface of a hypersphere, which makes Euclidean distance equivalent to cosine similarity. And if they don't, I probably wouldn't want to use them.
True, on a hypersphere cosine similarity and Euclidean distance are equivalent.
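The equivalence follows from expanding the squared distance: for unit vectors, ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b = 2 - 2 cos(a, b), so sorting by distance gives the same order as sorting by cosine. A minimal sketch that checks this numerically on random unit vectors (the dimension, seed, and number of vectors are arbitrary choices):

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length so it lies on the unit hypersphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(u, v):
    """For unit vectors the dot product equals cosine similarity."""
    return sum(a * b for a, b in zip(u, v))

def sq_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

random.seed(0)
dim = 8
query = normalize([random.gauss(0, 1) for _ in range(dim)])
docs = [normalize([random.gauss(0, 1) for _ in range(dim)]) for _ in range(5)]

for d in docs:
    # Identity on the unit sphere: ||q - d||^2 == 2 - 2 * cos(q, d)
    assert math.isclose(sq_euclidean(query, d), 2 - 2 * dot(query, d), abs_tol=1e-12)

# Highest cosine first vs. smallest distance first: same ranking.
by_cosine = sorted(range(len(docs)), key=lambda i: -dot(query, docs[i]))
by_distance = sorted(range(len(docs)), key=lambda i: sq_euclidean(query, docs[i]))
print(by_cosine == by_distance)
```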
But if random embeddings are Gaussian, they are distributed in a "cloud" around the hypersphere rather than exactly on it, so the two measures are no longer equivalent.
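A rough sketch of both halves of this point, under stated assumptions: part 1 samples unnormalized standard Gaussian vectors (dimension 64 and sample size 200 are arbitrary) to show their norms spread around sqrt(dim) instead of sitting on one sphere; part 2 uses hand-picked 2-D vectors (not Gaussian samples) to show that, once norms differ, cosine and Euclidean distance can pick different nearest neighbours.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# 1) Unnormalized Gaussian vectors form a shell around radius ~sqrt(dim),
#    with a non-zero spread of norms rather than a single radius.
random.seed(0)
dim = 64
norms = [norm([random.gauss(0, 1) for _ in range(dim)]) for _ in range(200)]
print(f"sqrt(dim) = {math.sqrt(dim):.2f}, norms range {min(norms):.2f} .. {max(norms):.2f}")

# 2) With unequal norms, the two measures can disagree on the nearest neighbour.
query = [1.0, 0.0]
a = [3.0, 1.0]   # almost the same direction as query, but far away
b = [0.5, 0.5]   # 45 degrees off, but close in space
print("cosine prefers", "a" if cosine(query, a) > cosine(query, b) else "b")
print("euclidean prefers", "a" if euclidean(query, a) < euclidean(query, b) else "b")
```

Here cosine picks `a` and Euclidean distance picks `b`; normalizing all vectors first would make the two agree again.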