Comment by lblock

5 hours ago

Isn't the whole linear-operations thing not really true, and even less so in embeddings from transformer-based models? I remember reading a blog post about this but cannot find it anymore. This is the closest thing I could find now: https://mikexcohen.substack.com/p/king-man-woman-queen-is-fa...

I just tried it on qwen3-embedding:8b with a little vibe-coded 100-line script that does the obvious linear math and compares the result to the embeddings of a couple of candidate words using cosine similarity, and it did prefer the expected words. Same 22 candidates for both questions.
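
The core of that script boils down to something like the sketch below (assuming the ollama Python client; the exact API call and the shortened candidate list are my reconstruction, not the original script):

    import numpy as np
    import ollama  # assumed client; any embedding API works the same way

    MODEL = "qwen3-embedding:8b"

    def embed(text):
        # one embedding per word/phrase; response shape assumed from the ollama client
        v = np.array(ollama.embeddings(model=MODEL, prompt=text)["embedding"])
        return v / np.linalg.norm(v)  # unit-normalize so dot product == cosine similarity

    target = embed("king") - embed("man") + embed("woman")
    target /= np.linalg.norm(target)

    candidates = ["queen", "king", "princess", "woman", "queen Elizabeth"]  # 22 in the real run
    for sim, word in sorted(((float(target @ embed(w)), w) for w in candidates), reverse=True):
        print(f"{sim:.4f}   {word}")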

king - man + woman ≈ queen (0.8510)

    Top similarity
    0.8510   queen
    0.8025   king
    0.7674   princess
    0.7424   woman
    0.7212   queen Elizabeth

Berlin - Germany + France ≈ Paris (0.8786)

    Top similarity
    0.8786   Paris
    0.8309   Berlin
    0.8057   France
    0.7824   London

Sure, 0.85 is not an exact match, so things are not exactly linear, and if I dump an entire dictionary in there it might be worse, but the idea very much works.

Edit: after running a 100k wordlist through qwen3-embedding:0.6b, the closest matches are:

    king  –  man  +  woman  ≈ queen (0.7873)
    berlin  –  germany  +  france  ≈ paris (0.9038)
    london  –  england  +  france  ≈ paris (0.9137)
    stronger  –  strong  +  weak  ≈ weaker (0.8531)
    stronger  –  strong  +  nation  ≈ country (0.8047)
    walking  –  walk  +  run  ≈ running (0.9098)

So clearly throwing a dictionary at it doesn't break it: the closest matches are still the expected ones. The next-closest matches got a lot more interesting too; for example, the four closest matches for london – england + france are (in order) paris, strasbourg, bordeaux, marseilles.
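
Scaling that up to a full wordlist is the same arithmetic plus a batched nearest-neighbor lookup; a minimal sketch (again assuming the ollama client, with a hypothetical words.txt; at 100k entries you would want to cache the embedding matrix):

    import numpy as np
    import ollama  # assumed client, as above

    MODEL = "qwen3-embedding:0.6b"

    def embed(text):
        v = np.array(ollama.embeddings(model=MODEL, prompt=text)["embedding"])
        return v / np.linalg.norm(v)

    with open("words.txt") as f:  # hypothetical 100k-entry wordlist
        words = [w.strip() for w in f if w.strip()]

    # embed every word once into an (N, dim) matrix of unit vectors; cache this in practice
    M = np.stack([embed(w) for w in words])

    def analogy(a, b, c, k=4):
        t = embed(a) - embed(b) + embed(c)
        t /= np.linalg.norm(t)
        sims = M @ t  # cosine similarity against every word at once
        top = np.argsort(sims)[::-1][:k]
        return [(words[i], float(sims[i])) for i in top]

    print(analogy("london", "england", "france"))  # expects paris first, per the run above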

Very much so. It was a little less untrue with older word-embedding models*, but that kind of semantic linearity was never really a thing in practice. Word-embedding models try to place semantically similar words close to each other, but that does not imply linearity at all.

*With transformer models, it is pretty much not even wrong.