Comment by danieldk
4 days ago
Traditional word embeddings (like word2vec with negative sampling) were trained with a logistic-regression objective, so probably the closest would be σ(u·v), which is of course nicely bounded in (0, 1).
(The catch is that during training the logistic regression is done on a word vector and a context vector, not two word vectors, but the two sets of vectors end up highly similar. People would even sum the word and context vectors, or train with the word and context vectors tied, without much loss.)
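A minimal sketch of the idea in Python (numpy only). The vectors here are random stand-ins for trained word2vec embeddings; in practice `u` and `v` would come from a trained model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-ins for trained embeddings; in practice these would be looked up
# from a trained word2vec model.
rng = np.random.default_rng(0)
u = rng.normal(size=100)  # word vector
v = rng.normal(size=100)  # context (or second word) vector

# Similarity bounded in (0, 1), mirroring the logistic-regression
# objective of skip-gram with negative sampling.
score = sigmoid(u @ v)
print(score)
```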