Comment by godelski

2 days ago

  data + plural = number
  data - plural = research
  king - crown = (didn't work... crown gets circled in red)
  king - princess = emperor
  king - queen = kingdom
  queen - king = worker
  king + queen = queen + king = kingdom
  boy + age = (didn't work... boy gets circled in red)
  man - age = woman
  woman - age = newswoman
  woman + age = adult female body (tied with man)
  girl + age = female child
  girl + old = female child

The other suggestions are pretty similar to the results I got in most cases. But I think this helps illustrate the curse of dimensionality (i.e. distances are ill-defined in high dimensional spaces). This is still quite an unsolved problem and seems a pretty critical one to resolve that doesn't get enough attention.
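
(For reference, a calculator like this is usually just static word vectors plus a cosine nearest-neighbour search, roughly like the sketch below. I don't know the OP's actual model or preprocessing, so the small GloVe vectors from gensim's downloader are only a stand-in.)

    # Rough sketch of a word-vector "semantic calculator": sum/subtract static
    # word vectors, then do a cosine-similarity nearest-neighbour search.
    # glove-wiki-gigaword-50 is only a convenient stand-in for the OP's model.
    import gensim.downloader as api

    kv = api.load("glove-wiki-gigaword-50")   # small GloVe model, downloads on first use

    def calc(plus, minus=()):
        # note: most_similar excludes the query words themselves from the results
        return kv.most_similar(positive=list(plus), negative=list(minus), topn=3)

    print(calc(["data", "plural"]))           # data + plural
    print(calc(["king"], ["crown"]))          # king - crown
    print(calc(["king", "woman"], ["man"]))   # the classic king - man + woman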

For fun, I pasted these into ChatGPT o4-mini-high and asked it for an opinion:

   data + plural    = datasets
   data - plural    = datum
   king - crown     = ruler
   king - princess  = man
   king - queen     = prince
   queen - king     = woman
   king + queen     = royalty
   boy + age        = man
   man - age        = boy
   woman - age      = girl
   woman + age      = elderly woman
   girl + age       = woman
   girl + old       = grandmother

The results are surprisingly good, I don't think I could've done better as a human. But keep in mind that this doesn't do embedding math like OP! Although it does show how generic LLMs can solve some tasks better than traditional NLP.

The prompt I used:

> Remember those "semantic calculators" with AI embeddings? Like "king - man + woman = queen"? Pretend you're a semantic calculator, and give me the results for the following:

  • This is an LLM approximating a semantic calculator, based solely on trained-in knowledge of what that is and probably a good amount of sample output, yet somehow beating the results of a "real" semantic calculator. That's crazy!

    The more I think about it the less surprised I am, but my initial thoughts were quite simply "no way" - surely an approximation of an NLP model made by another NLP model can't beat the original, but the LLM training process (and data volume) is just so much more powerful I guess...

  • I hate to be pedantic, but the LLM is definitely doing embedding math. In fact, that's all it does.

    • Sure! Although I think we both agree that the way those embeddings are transformed is significantly different ;)

      (what I meant to say is that it doesn't do embedding math "LIKE" the OP — not that it doesn't do embedding math at all.)

  •   > The results are surprisingly good, I don't think I could've done better as a human
    

    I'm actually surprised that the performance is so poor and would expect a human to do much better. The GPT model has embedding PLUS a whole transformer model that can untangle the embedded structure.

    To clarify some of the issues:

      data is both singular and plural, being a mass noun[0,1]. Datum is something you'll find in the dictionary, but it's not common in use[2]. The dictionary lags actual usage; words only mean what we collectively agree they mean (the dictionary definitely helps with that, but we also invent words all the time -- i.e. slang). I can see how this one could trip up a human, who might feel the need to change the output and would likely consult a dictionary, but I don't think that's a fair comparison here, as LLMs don't have these same biases.
    
      King - crown really seems like it should be something like "man" or "person". The crown is the manifestation of the ruling power. We still use phrases like "heavy is the head that wears the crown" in reference to general leaders, not just monarchs.
    
      king - princess I honestly don't know what to expect. Man is technically gender neutral so I'll take this one.
    
      king - queen I would expect similar outputs to the previous one. Don't quite agree here.
    
      queen - king I get why it's removing royalty, but given the previous (two) results I think it's showing a weird gender bias. Remember that queen is something like (woman + crown) and king is akin to (man + crown), so the subtraction should come out near woman - man. 
    
      The others I agree with. These were actually done because I was quite surprised at the results and was thinking about the aforementioned gender bias.
    
      > But keep in mind that this doesn't do embedding math like OP!
    

    I think you are misunderstanding the architecture of these models. The embedding sub-network is what maps text tokens to numeric vectors. You'll find mention of the embedding sub-networks in both the GPT-3[3] and GPT-4[4] papers, though they are given less prominence than the rest of the architecture. While much smaller than the main network, don't forget that embedding networks are still quite large; for the smaller models they constitute a significant part of the total parameter count[5].

    After the embedding sub-network is your main transformer network. The purpose of this network is to perform embedding math! It is just that the goal is to do significantly more complicated math. Remember, these are learnable mappings (see Optimal Transport); we're just breaking the model down into its two main intermediate mappings. But the embeddings still end up being a bottleneck: they are your literal gateway from words to numbers.
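
    To put rough numbers on "still quite large": a back-of-the-envelope share for the token-embedding table, using the commonly cited GPT-2 small figures and the GPT-3 paper's config (the GPT-4 config isn't public, so it's left out):

        # Back-of-the-envelope: fraction of parameters in the token-embedding table.
        # 50257 is the GPT-2/GPT-3 BPE vocabulary size; widths/totals are the
        # published GPT-2 small and GPT-3 175B figures.
        VOCAB = 50257

        configs = {
            "GPT-2 small": dict(d_model=768,   total=124e6),
            "GPT-3 175B":  dict(d_model=12288, total=175e9),
        }

        for name, c in configs.items():
            emb = VOCAB * c["d_model"]        # one d_model-sized vector per vocab entry
            print(f'{name}: embedding ~ {emb/1e6:.0f}M params, {emb/c["total"]:.1%} of total')

        # GPT-2 small: embedding ~ 39M params, 31.1% of total
        # GPT-3 175B:  embedding ~ 618M params, 0.4% of total

    So the claim holds for the small models, though the share shrinks quickly as the main network grows.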

    [0] https://en.wikipedia.org/wiki/Mass_noun

    [1] https://www.merriam-webster.com/dictionary/data

    [2] https://www.sciotoanalysis.com/news/2023/1/18/this-data-or-t...

    [3] https://arxiv.org/abs/2005.14165

    [4] https://arxiv.org/abs/2303.08774

    [5] https://www.lesswrong.com/posts/3duR8CrvcHywrnhLo/how-does-g...

    • You are being unnecessarily cynical. These are all subjective. I thought "datum" and "datasets" were quite clever, and while I would've chosen "man" for "king - crown" myself, I actually find "ruler" a better solution after seeing it. But each to their own.

      The rant about network architecture misses my point, which is that an LLM does not just do a linear transformation and a similarity search. Sure, in the most abstract sense it still just computes an output embedding from two input embeddings, but only in a very distant, pedantic way. (Actually, to be VERY pedantic, even that would not be true, because ChatGPT's tokenizer embeds tokens, not words. The input and output of the model are more than just the semantic embeddings of words; using two different but semantically equivalent words may result in different outputs with a transformer LLM, but not in a word-semantics model. A quick tokenizer sketch follows below.)

      I just thought it was cool that ChatGPT is so good at it.
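
      Here's that tokenizer sketch, for what it's worth, using OpenAI's tiktoken; cl100k_base is the GPT-4-era encoding, which I'm assuming is close enough to whatever backs the model I prompted:

          # Tokens, not words: semantically similar words can get very different
          # token sequences. cl100k_base is an assumption about the model's encoding.
          import tiktoken

          enc = tiktoken.get_encoding("cl100k_base")

          for word in ["queen", "kingdom", "datasets", "indistinguishability"]:
              ids = enc.encode(word)
              pieces = [enc.decode([i]) for i in ids]
              print(f"{word!r:>24} -> {len(ids)} token(s): {pieces}")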

    • "King-crown=ruler" is IMO absolutely apt. Arguing that "crown" can be used metaphorically is a bit disingenuous because first, it's very rarely applied to non-monarchs, and is a very physical, concrete symbol of power that separates monarchs from other rulers.

      "King-princess=man" can be thought to subtract the "royalty" part of "king"; "man" is just as good an answer as any else.

      "King-queen=prince" I'd think of as subtracting "ruler" from "king", leaving a male non-ruling member of royalty. "gender-unspecified non-ruling royal" would be even better, but there's no word for that in English.

    • The specific cherry-picked examples from GP make sense to me.

         data + plural    = datasets 
         data - plural    = datum
      

      If +/- plural can be taken to mean "make explicitly plural or singular", then this roughly works.

         king - crown     = ruler
      

      Rearrange (because embeddings are just vector math), and you get "king = ruler + crown". Yes, a king is a ruler who has a crown.

         king - princess  = man
      

      This isn't great, I'll grant, but there are many YA novels where someone becomes king (eventually) through marriage to a princess, or there is intrigue for the princess's hand for reasons of kingly succession, so "king = man + princess" roughly works.

         king - queen     = prince
         queen - king     = woman
      

      I agree it's hard to make sense of "king - queen = prince". "A queen is a woman king" is often how queens are described to young children. In Chinese, it's actually the literal breakdown of 女王. I also agree there's a gender bias, but also literally everything about LLMs and various AI trained on large human-generated data encodes the bias of how we actually use language and thought patterns. It's one of the big concerns of those in the civil liberties space. Search "llm discrimination" or similar for more on this.

      Playing around with age/time related gives a lot of interesting results:

          adult + age = adulthood
          child + age = female child
          year + age = chronological age
          time + year = day
          child + old = today
          adult - old = adult body
          adult - age = powerhouse
          adult - year = man
      

      I think a lot of words are hard to distill into a single embedding. A word may embed a number of conceptually distinct definitions, but my (incomplete) understanding of embeddings is that they are not context-sensitive, right? So averaging those distinct definitions under one label is probably fraught with problems when trying to do meaningful vector math with them, problems that context/attention are able to help with (a toy sketch of this follows below).

      [EDIT:formatting is hard without preview]
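
      Here's the toy sketch (made-up 2-D vectors; the point is just that a single static vector for a polysemous word sits between its senses and is a so-so match for either):

          # Toy, made-up 2-D vectors: a static embedding table assigns one vector
          # per surface form, so a polysemous word ends up as a blend of its senses.
          import numpy as np

          def cos(a, b):
              return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

          sense_monarch = np.array([1.0, 0.1])                # hypothetical "royalty" sense
          sense_chess   = np.array([0.1, 1.0])                # hypothetical "chess piece" sense
          queen_static  = (sense_monarch + sense_chess) / 2   # what a single label gets

          print(cos(queen_static, sense_monarch))  # ~0.77: decent, but diluted
          print(cos(queen_static, sense_chess))    # ~0.77: same for the other sense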

  • ...welcome to ChatGPT, everyone! If you've been asleep since...2022?

    (some might say all an LLM does is embeddings :)

Distance is extremely well defined in high dimensional spaces. That isn't the problem.

  • Would you care to elaborate? To clarify, I mean that variance reduces as dimensionality increases.

Yeah I did similar tests and got similar results.

Curious tool but not what I would call accurate.

I think you need to do A - B + C type queries. A + B or A - B wouldn't make much sense when the magnitude changes.

I got a bunch of red stuff also. I imagine the author cached embeddings for some words but not really all that many to save on credits. I gave it mermaid - woman and got merman, but when I tried to give it boar + woman - man or ram + woman - man, it turns out it has never heard of rams or boars.

Such results are inherently limited because the same word can have different meanings depending on context.

The role of the Attention Layer in LLMs is to give each token a better embedding by accounting for context.
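
A minimal sketch of that idea, plain scaled dot-product self-attention over toy vectors (real LLMs add learned query/key/value projections, multiple heads, and many stacked layers on top of this):

    # Scaled dot-product self-attention: each output row is a context-weighted
    # mixture of all the token vectors, i.e. a context-adjusted embedding.
    import numpy as np

    def self_attention(X):
        d = X.shape[-1]
        scores = X @ X.T / np.sqrt(d)                    # token-token affinities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the context
        return weights @ X                               # contextualised vectors

    X = np.random.default_rng(0).normal(size=(5, 8))     # 5 toy tokens, dimension 8
    print(self_attention(X).shape)                       # (5, 8)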

Can you elaborate on what the unsolved problem you're referring to is?

  • Dealing with metrics in high dimensions. As you increase dimensionality, the variance of the pairwise distances decreases, leading to indistinguishability.

    You can get some help in high dimensions when you're more concerned with (clearly disjoint) clusters. But this is akin to doing a dimensionality reduction, treating independent clusters as individual points. (Say we have a set S with disjoint subsets {S_0, ..., S_n}; your new set is {a_0, ..., a_n}, where each a_i is a single element representing all the elements of S_i. Think of it like a "set of sets".) But you do not get help with the relationships inside a cluster (i.e. d(s_x, s_y) for s_x, s_y \in S_i, x ≠ y), and I think you can gather that when the clusters are not clearly disjoint, differentiating between clusters becomes just as hard as differentiating within one.

    Understanding this can help you understand why these models (including LLMs) are good with broad concepts, like differentiating between obviously different things, but struggle more with nuance. A good litmus test is to ask them about any subject you have deep knowledge in; essentially, test yourself for Gell-Mann Amnesia. These things are designed for human preference, and when they fail they're likely to fail without warning (i.e. in ways that are not so obvious).
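
    A quick numerical illustration of the concentration effect, using i.i.d. Gaussian points (a deliberately idealised setting; real embeddings aren't isotropic, but the trend is the point):

        # Concentration of distances: the relative gap between the nearest and
        # farthest neighbour shrinks as the dimension grows, so nearest-neighbour
        # contrast degrades.
        import numpy as np

        rng = np.random.default_rng(0)
        for d in (2, 10, 100, 1000, 10000):
            X = rng.normal(size=(200, d))
            dists = np.linalg.norm(X[0] - X[1:], axis=1)    # distances from one point
            contrast = (dists.max() - dists.min()) / dists.min()
            print(f"d={d:>5}  relative contrast ~ {contrast:.3f}")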

Ah yes, 女 + 子 = girl, but if combined into a single kanji you get 好 = like.