
Comment by mlepath

4 days ago

In ML everything is a tradeoff. The article strongly suggests using dot product similarity, and it's a great metric in some situations, but dot product similarity has its own issues:

- not normalized (unlike cosine similarity)
- heavily favors large vectors
- unbounded output
- ...
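
For example (a quick numpy sketch of my own, not from the article): a long vector pointing the wrong way can out-score a perfectly aligned short one under the raw dot product, while cosine similarity stays in [-1, 1].

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity: dot product of the normalized vectors, always in [-1, 1].
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

query = np.array([1.0, 0.0])
aligned_short = np.array([1.0, 0.0])    # same direction, small magnitude
misaligned_long = np.array([3.0, 3.0])  # 45 degrees off, large magnitude

print(np.dot(query, aligned_short))    # 1.0
print(np.dot(query, misaligned_long))  # 3.0  -> raw dot product favors the big vector
print(cosine(query, aligned_short))    # 1.0
print(cosine(query, misaligned_long))  # ~0.707 -> cosine ranks the aligned vector first
```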

Basically, don't use any similarity metric carelessly.

Traditional word embeddings (like word2vec) were trained with a logistic-regression-style objective, so probably the closest match would be σ(u·v), which is of course nicely bounded.
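
As a sketch (my own illustration, using toy vectors): that score is just the sigmoid of the dot product, which squashes the unbounded value into (0, 1).

```python
import numpy as np

def sigmoid_dot(u, v):
    # sigma(u . v): the probability-like score from word2vec's
    # negative-sampling objective, bounded in (0, 1).
    return 1.0 / (1.0 + np.exp(-np.dot(u, v)))

u = np.array([0.5, -1.2, 0.3])
v = np.array([0.4,  0.1, 0.9])
print(sigmoid_dot(u, v))  # ~0.59 for these toy vectors
```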

(The catch is that during training the logistic regression is done on separate word and context vectors, but the two end up highly similar. People would even sum the word and context vectors, or tie them to the same vectors during training, without much loss.)