Comment by IshKebab
1 year ago
Yeah I'm pretty sure you could do this just with the classic word embeddings (king = queen + man - woman). Maybe it doesn't work as well as with a full LLM.
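A minimal sketch of that classic embedding arithmetic, using a tiny made-up vocabulary and 4-d vectors (real word2vec/GloVe vectors are 100-300 dimensions; the numbers here are purely illustrative):

```python
import numpy as np

# Toy, hand-picked vectors -- not real embeddings.
vocab = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "queen": np.array([0.9, 0.1, 0.8, 0.2]),
    "man":   np.array([0.1, 0.9, 0.1, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9, 0.1]),
}

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def nearest(v):
    """Snap a raw vector to the closest vocabulary word by cosine similarity."""
    return max(vocab, key=lambda w: cos(vocab[w], v))

# king - man + woman should land closest to "queen"
print(nearest(vocab["king"] - vocab["man"] + vocab["woman"]))  # -> queen
```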
Addition won't work for things that depend on the order of operations. If salt + water is ocean and water + fire is steam, what's salt + water + fire? Is it salt + steam or ocean + fire?
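A quick numpy sketch of that objection: plain vector addition can't see the order of combination, so both readings collapse onto the same point (the three vectors here are just random stand-ins for hypothetical embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
salt, water, fire = rng.normal(size=(3, 300))  # stand-ins for embedding vectors

as_ocean_plus_fire = (salt + water) + fire   # "ocean" first, then fire
as_salt_plus_steam = salt + (water + fire)   # "steam" first, then salt

# Identical up to float rounding: the sum carries no trace of the grouping.
print(np.allclose(as_ocean_plus_fire, as_salt_plus_steam))  # True
```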
Associativity and commutativity in vector addition don't translate well to semantic meaning. Extrapolating your example, it'd also mean:
I don't see why those should all be true. Intuitively, trying to satisfy O(N^2) semantic pairings with vectors that are optimised for a very specific and different numerical operation (cosine similarity) feels like something that won't work. I'd imagine errors get amplified with 3+ operands.
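One could check this empirically with pretrained vectors. A rough sketch, assuming gensim's downloader and its "glove-wiki-gigaword-100" model (those are real gensim names, but treat the snippet as illustrative, not a benchmark):

```python
import gensim.downloader as api

# Downloads the pretrained GloVe vectors on first use; returns KeyedVectors.
model = api.load("glove-wiki-gigaword-100")

# Classic analogy: king - man + woman ~ queen
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Three operands: does salt + water + fire land anywhere sensible,
# or do the errors wash the result out?
print(model.most_similar(positive=["salt", "water", "fire"], topn=5))
```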
Isn't the reason for the lack of associativity/commutativity that you're doing operations (addition/subtraction) that do have them, and then snapping the result to the closest of a fixed number of points in your output dictionary? The addition is fine; the loss of information is in the final conversion.
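A sketch of that split: the summed vector is exact (up to float rounding), and only the snap to a fixed vocabulary throws information away. Vocabulary and vectors below are made-up random stand-ins, so only the mechanism is meaningful:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = {w: rng.normal(size=100) for w in ["sea", "ocean", "steam", "desert"]}
salt, water = rng.normal(size=(2, 100))

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

raw = salt + water                                      # the exact sum
snapped = max(vocab, key=lambda w: cos(vocab[w], raw))  # nearest dictionary entry
print(snapped, 1 - cos(vocab[snapped], raw))            # the residual the snap discards
```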
There's definitely some lossy compression when you snap it to the nearest known vector: enumerating every word ever written in human history wouldn't even come close to the 2^(16*D) representable points for a D-dimensional float16 embedding vector. In fact, even adding two float16 values is a form of lossy compression for most additions.
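Quick check of the float16 claim: with only ~11 significant bits, a single addition can already round information away.

```python
import numpy as np

print(np.float16(2048) + np.float16(1) == np.float16(2048))  # True: the +1 is lost to rounding
print(np.float16(0.1) + np.float16(0.2) == np.float16(0.3))  # False: the sum rounds differently
```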
But I'd be surprised if either of those were the primary reason. The words "sea" and "ocean" map to different vectors, but those vectors will be very close to each other. salt + water = sea and salt + water = ocean both sound correct to me, so the problem is more about whether v_salt + v_water can even get into the vicinity of either v_sea or v_ocean.