Comment by jongjong
2 days ago
Interesting. All the developers I know who have tinkered with embeddings and vector similarity scoring were instantly hooked. The efficiency of computing the embeddings once and then reusing them as many times as needed, comparing the vectors with a cheap <30-line function, is extremely appealing. Not to mention the indexing capabilities that make it work at scale.
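For illustration, a minimal sketch of the kind of cheap comparison function being described, assuming the embeddings are just numpy vectors (the values below are made-up toy numbers, not from the comment):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Score two embedding vectors by the cosine of the angle between them."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return float(np.dot(a, b) / denom)

# Compute embeddings once (with whatever embedding model you use), then reuse
# them for as many comparisons as you like.
doc_embedding = np.array([0.12, -0.48, 0.33, 0.91])    # toy 4-dimensional vectors
query_embedding = np.array([0.10, -0.51, 0.30, 0.88])
print(cosine_similarity(doc_embedding, query_embedding))
```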
IMO vector embedding is the most important innovation in computing of the last decade. There's something magical about it. These people deserve some kind of prize. The idea that you can reduce almost any intricate concept, including whole paragraphs, to a fixed-size vector that encapsulates its meaning and proximity to other concepts across a large number of dimensions is pure genius.
Vector embedding is not an invention of the last decade. Featurization in ML goes back to the 60s - even deep learning-based featurization is decades old at a minimum. Like everything else in ML, this became much more useful with data and compute scale.
Yup, when I was at MSFT 20 years ago they were already productizing vector embedding of documents and queries (LSI).
Interesting. Makes one think.
If you take the embedding for king, subtract the embedding for male, add the embedding for female, and look up the closest embedding, you get queen.
The fact that simple vector addition and subtraction can encode the concepts of royalty and gender (among all sorts of others) is kind of magic to me.
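The classic analogy can be reproduced roughly like this with gensim and a pretrained model (a sketch, not from the comment; the model name is just one commonly available choice, and the standard formulation uses man/woman rather than male/female):

```python
import gensim.downloader as api

# Download a small pretrained GloVe model (one commonly available choice).
vectors = api.load("glove-wiki-gigaword-100")

# king - man + woman: "queen" is typically the nearest remaining vector.
# Note that most_similar excludes the query words themselves from the results.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```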
This was actually shown to not really work in practice.
I have seen this particular example work. You don't get an exact match, but the closest one is indeed Queen.
Vector embeddings are slightly interesting because they come pre-trained on large amounts of data.
But similar ways to reduce huge numbers of dimensions to a much smaller set of "interesting" dimensions have been known for a long time.
Examples include principal component analysis/singular value decomposition, which was the first big breakthrough in face recognition (in the early 90s) and was also used in latent semantic indexing, the Netflix prize, and a large pile of other things. And the underlying technique was invented in 1901.
Dimensionality reduction is cool, and vector embedding is definitely an interesting way to do it (at significant computational cost).
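A minimal sketch of that older style of dimensionality reduction, doing PCA via a truncated SVD on a toy matrix (the numbers are made up; an LSI setup would use a large sparse term matrix instead):

```python
import numpy as np

# Toy document-by-term count matrix; LSI would use a large sparse TF-IDF matrix.
X = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 1.0, 0.0, 1.0],
])

# Center the columns and take a truncated SVD -- this is PCA by another route.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2  # keep the two most "interesting" dimensions
reduced = U[:, :k] * S[:k]   # each row is now a 2-d representation of a document
print(reduced)
```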
Vector embeddings are so overhyped. They're decent as a secondary signal, but they're expensive to compute and fragile. BM25-based solutions are more robust and have WAY lower latency, at the cost of some accuracy relative to hybrid solutions. And you can get most of the lift of a hybrid solution at a fraction of the computational cost by doing ingest-time semantic expansion (reverse-HyDE-style input annotation) on top of a sparse BM25 index.
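For reference, a minimal sketch of the BM25 side using the rank_bm25 package (one common implementation, not something named in the comment); the ingest-time expansion is only gestured at here by appending extra terms to each document:

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

# Ingest time: optionally append expansion terms (synonyms, generated paraphrases)
# to each document before indexing -- a stand-in for "semantic expansion".
corpus = [
    "cheap flights to paris airline tickets travel",
    "vector embeddings for semantic search retrieval",
    "bm25 sparse lexical ranking information retrieval",
]
tokenized_corpus = [doc.split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

# Query time: plain lexical scoring, no embedding model call needed per query.
query = "sparse bm25 ranking".split()
print(bm25.get_scores(query))            # one relevance score per document
print(bm25.get_top_n(query, corpus, n=2))
```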
But computing an embedding is much cheaper than full inference, and you only have to compute it once per piece of content and can then reuse it many times.
The idea of reducing language to mere bits, in general, sounds like it would violate the Gödel/Turing theorems about computability.