Comment by TeMPOraL

1 year ago

Isn't the reason for the lack of associativity/commutativity that you're doing operations (addition/subtraction) that do have those properties, and then snapping the result to the closest of a fixed number of points in your output dictionary? The addition is fine; the loss of information happens in the final conversion.
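A toy sketch of that distinction (the dictionary, its vectors, and the snap helper are all invented for illustration): plain vector addition is associative, but snapping each intermediate sum to the nearest dictionary entry is not.

    import numpy as np

    # Toy "embedding dictionary" with made-up 2-D vectors.
    dictionary = {
        "a": np.array([2.0, 0.0]),
        "b": np.array([0.0, 2.0]),
        "c": np.array([1.1, 1.1]),
        "d": np.array([3.0, 3.0]),
    }

    def snap(v):
        """Nearest dictionary word to vector v (Euclidean distance)."""
        return min(dictionary, key=lambda w: np.linalg.norm(dictionary[w] - v))

    x, y, z = dictionary["a"], dictionary["b"], dictionary["c"]

    # Plain vector addition is associative:
    assert np.allclose((x + y) + z, x + (y + z))

    # Snapping each intermediate result to the nearest word is not:
    left = snap(dictionary[snap(x + y)] + z)   # snap (x + y) first, then add z
    right = snap(x + dictionary[snap(y + z)])  # snap (y + z) first, then add x
    print(left, right)                         # "d" vs "c": same inputs, different answers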

There's definitely some lossy compression when you snap to the nearest known vector: enumerating every word ever written in human history wouldn't come close to exhausting the 2^(16*D) representable points for a D-dimensional float16 embedding vector. In fact, even adding two float16 values is a form of lossy compression for most additions.
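A quick NumPy check of the float16 point (the specific values are just an illustration): at magnitude 2048 the spacing between representable float16 values is 2, so adding 1 gets rounded away, and the order of additions changes the answer.

    import numpy as np

    a = np.float16(2048.0)
    b = np.float16(1.0)

    # 2049 isn't representable in float16, so the sum rounds back down:
    print(a + b)        # 2048.0 -- b's contribution is lost entirely

    # ...which already breaks associativity at the arithmetic level:
    print((a + b) + b)  # 2048.0
    print(a + (b + b))  # 2050.0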

But I'd be surprised if either of those were the primary reason. The words "sea" and "ocean" are different vectors, but they'll be very close to each other. Both salt + water = sea and salt + water = ocean sound correct to me, so the problem is more about whether v_salt + v_water can even get into the vicinity of either v_sea or v_ocean.
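This is easy to poke at empirically, e.g. with gensim's pretrained GloVe vectors (assuming gensim is installed and can download the model; the model name and the salt/water setup here are just one way to try it):

    import gensim.downloader as api

    model = api.load("glove-wiki-gigaword-50")  # pretrained 50-d GloVe vectors

    # How close are "sea" and "ocean" to each other?
    print(model.similarity("sea", "ocean"))

    # Does v_salt + v_water land anywhere near either of them?
    v = model["salt"] + model["water"]
    print(model.similar_by_vector(v, topn=5))

Note that similar_by_vector doesn't exclude the input words, so "salt" and "water" themselves will usually dominate the neighbourhood; model.most_similar(positive=["salt", "water"]) filters them out.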

  • If we constrain ourselves to a pool of words, say Wikipedia entries, minutes names, and maybe some other stuff, and use a "super node" like "addition" to act as a math operation... maybe this makes more sense?