Show HN: Semantic Calculator (king-man+woman=?)
5 months ago (calc.datova.ai)
I've been playing with embeddings and wanted to try out what results the embedding layer will produce based on just word-by-word input and addition / subtraction, beyond what many videos / papers mention (like the obvious king-man+woman=queen). So I built something that doesn't just give the first answer, but ranks the matches based on distance / cosine similarity. I polished it a bit so that others can try it out, too.
For now, I only have nouns (and some proper nouns) in the dataset, and pick the most common interpretation among the homographs. Also, it's case sensitive.
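A minimal sketch of the approach described above, assuming hypothetical 4-dimensional toy vectors instead of real embeddings (the author mentions further down that the actual tool uses precomputed mxbai-embed-large embeddings over a WordNet-based dictionary). `VECTORS` and `calculate` are illustrative names, not the tool's API. The expression is split into signed terms, the vectors are summed, and every vocabulary word is ranked by cosine similarity to the result. `exclude_inputs` drops the query words from the ranking, which the author says the tool does.

```python
import math
import re

# Hypothetical toy embeddings; dims roughly: royalty, maleness, personhood, fruitiness.
VECTORS = {
    "king":     [0.90,  0.30, 0.80, 0.10],
    "queen":    [0.85, -0.35, 0.80, 0.10],
    "princess": [0.70, -0.30, 0.90, 0.10],
    "man":      [0.10,  0.20, 0.90, 0.05],
    "woman":    [0.10, -0.10, 0.90, 0.00],
    "apple":    [0.00,  0.00, 0.10, 0.90],
}


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def calculate(expression, vectors=VECTORS, top_k=3, exclude_inputs=True):
    """Evaluate e.g. 'king-man+woman' and rank vocabulary words by cosine similarity.

    Unknown words raise KeyError (compare the words shown in red in the UI).
    """
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    inputs = set()
    # Each match is (sign, word); the first term may have no sign.
    for sign, word in re.findall(r"([+-]?)\s*([^+-]+)", expression):
        word = word.strip()
        weight = -1.0 if sign == "-" else 1.0
        target = [t + weight * x for t, x in zip(target, vectors[word])]
        inputs.add(word)
    ranked = sorted(
        (
            (cosine(target, vec), word)
            for word, vec in vectors.items()
            if not (exclude_inputs and word in inputs)
        ),
        reverse=True,
    )
    return [(word, round(score, 3)) for score, word in ranked[:top_k]]


print(calculate("king-man+woman"))
print(calculate("king-man+woman", exclude_inputs=False))
```

The naive split on `+`/`-` would break on hyphenated entries such as "Soviet-Union"; a real parser would need quoting or a token list. With these toy vectors, `queen` only narrowly beats `princess`, and without exclusion `king` itself comes out on top.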
The other suggestions are pretty similar to the results I got in most cases. But I think this helps illustrate the curse of dimensionality (i.e. distances are ill-defined in high-dimensional spaces). This is still largely an unsolved problem, and a pretty critical one that doesn't get enough attention.
For fun, I pasted these into ChatGPT o4-mini-high and asked it for an opinion:
The results are surprisingly good, I don't think I could've done better as a human. But keep in mind that this doesn't do embedding math like OP! Although it does show how generic LLMs can solve some tasks better than traditional NLP.
The prompt I used:
> Remember those "semantic calculators" with AI embeddings? Like "king - man + woman = queen"? Pretend you're a semantic calculator, and give me the results for the following:
This is an LLM approximating a semantic calculator, based solely on trained-in knowledge of what that is and probably a good amount of sample output, yet somehow beating the results of a "real" semantic calculator. That's crazy!
The more I think about it the less surprised I am, but my initial thought was quite simply "no way": surely an approximation of an NLP model made by another NLP model can't beat the original. But the LLM training process (and data volume) is just so much more powerful, I guess...
I hate to be pedantic, but the llm is definitely doing embedding math. In fact that’s all it does.
I'm actually surprised that the performance is so poor and would expect a human to do much better. The GPT model has embedding PLUS a whole transformer model that can untangle the embedded structure.
To clarify some of the issues:
I think you are misunderstanding the architecture of these models. The embedding sub-network translates tokens into numeric vectors. You'll find it mentioned in both the GPT-3[3] and GPT-4[4] papers, though it gets less emphasis there than other parts of the model. While much smaller than the main network, don't forget that embedding networks are still quite large: for the smaller models they constitute a significant part of the total parameter count[5].
After the embedding sub-network comes your main transformer network. The purpose of this network is to perform embedding math! It's just that the goal is significantly more complicated math. Remember, these are learnable mappings (see Optimal Transport); we're just breaking the model down into its two main intermediate mappings. But the embeddings still end up being a bottleneck: they are your literal gateway from words to numbers.
[0] https://en.wikipedia.org/wiki/Mass_noun
[1] https://www.merriam-webster.com/dictionary/data
[2] https://www.sciotoanalysis.com/news/2023/1/18/this-data-or-t...
[3] https://arxiv.org/abs/2005.14165
[4] https://arxiv.org/abs/2303.08774
[5] https://www.lesswrong.com/posts/3duR8CrvcHywrnhLo/how-does-g...
Can you do the same, but with each line in a separate context?
...welcome to ChatGPT, everyone! If you've been asleep since...2022?
(some might say all an LLM does is embeddings :)
Distance is extremely well defined in high dimensional spaces. That isn't the problem.
Would you care to elaborate? To clarify, I mean that variance reduces as dimensionality increases
Yeah I did similar tests and got similar results.
Curious tool but not what I would call accurate.
I got a bunch of red stuff also. I imagine the author cached embeddings for some words but not really all that many to save on credits. I gave it mermaid - woman and got merman, but when I tried to give it boar + woman - man or ram + woman - man, it turns out it has never heard of rams or boars.
Can you elaborate on what the unsolved problem you're referring to is?
Dealing with metrics in high dimensions. As you increase dimensionality, the variance decreases, leading to indistinguishability.
You can get some help in high dimensions when you're more concerned with (clearly disjoint) clusters. But this is akin to doing a dimensionality reduction, treating independent clusters as individual points. (Say we have a set S with disjoint subsets {S_0, ..., S_n}; your new set is now {a_0, ..., a_n}, where each a_i is an element representing all elements of S_i. Think "set of sets".) But you do not get help with the relationships inside a cluster (i.e. d(s_x, s_y) for s_x, s_y ∈ S_i, x ≠ y), and I think you can gather that when clusters are not clearly disjoint, we're in the same situation as trying to differentiate within a cluster.
Understanding this can help you see why these models (including LLMs) are good at broad distinctions between obvious things but struggle more with nuance. A good litmus test is to ask them about a subject you have deep knowledge of: essentially, test yourself for Gell-Mann Amnesia. These things are designed for human preference, so when they fail, they're likely to fail without warning (i.e. in ways that are not so obvious).
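A quick numerical illustration of the effect described above, assuming points drawn uniformly from the unit hypercube (real embeddings are neither uniform nor independent across dimensions, so treat this as a sketch). `relative_contrast` is a hypothetical helper: it measures how much farther the farthest of 200 random points is from a query than the nearest one. As the dimension grows, that ratio shrinks toward zero, so "nearest" carries less and less information.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of Euclidean distances from one random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

`math.dist` needs Python 3.8 or later. Note that the distances themselves are perfectly well defined at every dimension; what shrinks is their spread relative to their size.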
Such results are inherently limited because the same word can have different meanings depending on context.
The role of the Attention Layer in LLMs is to give each token a better embedding by accounting for context.
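A stripped-down sketch of that idea: single-head scaled dot-product self-attention with the learned query/key/value projections replaced by the identity (real transformers learn those matrices and stack many heads and layers). The toy vectors for `bank`, `river` and `money` are made up; the point is that the same static `bank` vector comes out different depending on its neighbours.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Each output is a softmax(q.k / sqrt(d))-weighted average of all token vectors."""
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return outputs


# Made-up static embeddings: dim 0 ~ "finance", dim 1 ~ "water".
bank, river, money = [1.0, 1.0, 0.0], [0.0, 2.0, 0.0], [2.0, 0.0, 0.0]

print(self_attention([river, bank])[1])  # [0.5, 1.5, 0.0]: "bank" pulled toward "water"
print(self_attention([money, bank])[1])  # [1.5, 0.5, 0.0]: "bank" pulled toward "finance"
```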
I think you need to do A-B+C types? A+B or A-B wouldn’t make much sense when the magnitude changes
hacker+news-startup = golfer
Ah yes, 女 + 子 = girl but if combined in a kanji you get 好 = like.
> king-man+woman=queen
Is the famous example everyone uses when talking about word vectors, but is it actually just very cherry picked?
I.e. are there a great number of other "meaningful" examples like this, or actually the majority of the time you end up with some kind of vaguely tangentially related word when adding and subtracting word vectors.
(Which seems to be what this tool is helping to illustrate, having briefly played with it, and looked at the other comments here.)
(Btw, not saying wordvecs / embeddings aren't extremely useful, just talking about this simplistic arithmetic)
I once saw an explanation (which I can no longer find) that what's really happening here is partly that "man" and "woman" are very similar vectors which nearly cancel each other out, and "king" is excluded from the result set to avoid returning identities, leaving "queen" as the closest remaining result. That's why you have to subtract and then add, and just doing single operations doesn't work very well. There's some semantic information preserved that might nudge it in the right direction, but not as much as the naive algebra suggests, and you can't really add up a bunch of these high-dimensional vectors in a sensible way.
E.g. in this calculator "man - king + princess = woman", which doesn't make much sense. "airplane - engine", which has a potential sensible answer of "glider", instead "= Czechoslovakia". Go figure.
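A toy illustration of that explanation, using made-up 4-dimensional vectors (not taken from any real model). When `man` and `woman` are close, `woman - man` is a short vector, so `king - man + woman` lands almost on top of `king`; ranked by cosine similarity, `king` beats `queen` unless the input words are excluded from the results.

```python
import math

# Made-up vectors for illustration only.
king = [0.90, 0.30, 0.80, 0.10]
queen = [0.85, -0.35, 0.80, 0.10]
man = [0.10, 0.20, 0.90, 0.05]
woman = [0.10, -0.10, 0.90, 0.00]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


offset = [w - m for w, m in zip(woman, man)]    # woman - man: a short vector
target = [k + o for k, o in zip(king, offset)]  # king - man + woman

print(f"|woman - man| = {norm(offset):.3f}, |king| = {norm(king):.3f}")
print(f"cos(target, king)  = {cosine(target, king):.3f}")
print(f"cos(target, queen) = {cosine(target, queen):.3f}")
```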
Well when it works out it is quite satisfying
India - Asia + Europe = Italy
Japan - Asia + Europe = Netherlands
China - Asia + Europe = Soviet-Union
Russia - Asia + Europe = European Russia
calculation + machine = computer
Interesting:
That means Bush = Ukraine+Putin-Europe-Lenin-purge.
However, the site gives Bush -4%, second best option (best is -2%, "fleet ballistic missile submarine", not sure what negative numbers mean).
democracy - vote = progressivism
I'll have to meditate on that.
I think it's worth keeping in mind that word2vec was specifically trained on semantic similarity. Most embedding APIs don't really give a lick about the semantic space
And, worse, most latent spaces are decidedly non-linear. And so arithmetic loses a lot of its meaning. (IIRC word2vec mostly avoided nonlinearity except for the loss function). Yes, the distance metric sort-of survives, but addition/multiplication are meaningless.
(This is also the reason choosing your embedding model is a hard-to-reverse technical decision - you can't just transform existing embeddings into a different latent space. A change means "reembed all")
I think it's slightly uncommon for the vectors to "line up" just right, but here are a few I tried:
actor - man + woman = actress
garden + person = gardener
rat - sewer + tree = squirrel
toe - leg + arm = digit
Also, as I just learned the other day, the result was never equal, just close to "queen" in the vector space.
And queen isn't even the closest.
I mean they are floating point vectors so
> is it actually just very cherry picked?
100%
Hmm, well I got
if that helps.
First off, this interface is very nice and a pleasure to use, congrats!
Are you using word2vec for these, or embeddings from another model?
I also wanted to add some flavor since it looks like many folks in this thread haven't seen something like this - it's been known since 2013 that we can do this (but it's great to remind folks especially with all the "modern" interest in NLP).
It's also known (in some circles!) that a lot of these vector arithmetic things need some tricks to really shine. For example, excluding the words already present in the query[1]. Others in this thread seem surprised at some of the biases present - there's also a long history of work on that [2,3].
[1] https://blog.esciencecenter.nl/king-man-woman-king-9a7fd2935...
[2] https://arxiv.org/abs/1905.09866
[3] https://arxiv.org/abs/1903.03862
Thank you! I actually had a hard time finding prior work on this, so I appreciate the references.
The dictionary is based on https://wordnet.princeton.edu/, no word2vec. It's just a plain lookup among precomputed embeddings (with mxbai-embed-large). And yes, I'm excluding words that are present in the query because.
It would be interesting to see how other models perform. I tried one (forgot the name) that was focused on coding, and it didn't perform nearly as well (in terms of human joy from the results).
(Question for anyone) how could I go about replicating this with Gemini Embedding? Generate and store an embedding for every word in the dictionary?
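One possible answer, sketched under assumptions: `embed_batch` below is a hypothetical placeholder (it returns seeded pseudo-random vectors so the code runs offline), and you would swap in a real call to whichever embedding API you use; this sketch does not show Gemini's actual client API. The idea is to embed every dictionary word once, store unit-length vectors, and answer queries with a dot product.

```python
import json
import math
import random


def embed_batch(words):
    """HYPOTHETICAL placeholder for a real embedding API call.

    Returns deterministic pseudo-random 8-d vectors so the sketch runs offline.
    """
    vectors = []
    for word in words:
        rng = random.Random(word)  # seeded per word so results are reproducible
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(8)])
    return vectors


def normalize(vec):
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec]


def build_index(words, batch_size=100):
    """Embed every word once, in batches, and store unit-length vectors."""
    index = {}
    for start in range(0, len(words), batch_size):
        batch = words[start:start + batch_size]
        for word, vec in zip(batch, embed_batch(batch)):
            index[word] = normalize(vec)
    return index


def save_index(index, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)


def load_index(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def nearest(index, query, k=5, exclude=()):
    """With unit vectors, cosine similarity reduces to a dot product."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), word)
        for word, vec in index.items()
        if word not in exclude
    ]
    return sorted(scored, reverse=True)[:k]
```

For example, build the index once with `index = build_index(nouns)` and `save_index(index, "nouns.json")` (both `nouns` and the file name are hypothetical), then at query time call `load_index` and `nearest(index, query_vec, exclude=input_words)`. For tens of thousands of words, a NumPy matrix or a vector database would be much faster than Python lists; JSON is used here only to stay dependency-free.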
Neat! Reminds me of infinite craft
https://neal.fun/infinite-craft/
I went to look at infinite craft.
It provides a panel filled with slowly moving dots. Right of the panel, there are objects labeled "water", "fire", "wind", and "earth" that you can instantiate on the panel and drag around. As you drag them, the background dots, if nearby, will grow lines connecting to them. These lines are not persistent.
And that's it. Nothing ever happens, there are no interactions except for the lines that appear while you're holding the mouse down, and while there is notionally a help window listing the controls, the only controls are "select item", "delete item", and "duplicate item". There is also an "about" panel, which contains no information.
In the panel, you can drag one of the items (eg. Water) onto another one (eg. Earth), and it will create a new word (eg. Plant). It uses AI, so it goes very deep
Some of these make more sense than others (and bookshop is hilarious even if it's only the best answer by a small margin; no shade to bookshop owners).
I don't want to dump too many but I found
pretty funny and very hard to understand. All the other options are hyperspecific grasslike plants like meadow salsify.
My philosophical take on it is that natural language has many many more dimensions than we could hope to represent. Whenever you do dimension reduction you lose information.
dog - fur = Aegean civilization
This is super neat.
I built a game[0] along similar lines, inspired by infinite craft[1].
The idea is that you combine (or subtract) “elements” until you find the goal element.
I’ve had a lot of fun with it, but it often hits the same generated element. Maybe I should update it to use the second (third, etc.) choice, similar to your tool.
[0] https://alchemy.magicloops.app/
[1] https://neal.fun/infinite-craft/
I don't get it but I'm not sure I'm supposed to.
is pretty good IMO, it is a nice blend of the concepts in an intuitive manner. I don’t really get
But
Is kind of interesting; one definition of narcotic is
> a drug (such as opium or morphine) that in moderate doses dulls the senses, relieves pain, and induces profound sleep but in excessive doses causes stupor, coma, or convulsions
https://www.merriam-webster.com/dictionary/narcotic
So we can see some element of losing time in that type of drug. I guess? Maybe I’m anthropomorphizing a bit.
Does the system you’re querying ‘get it’? From the answers it doesn’t seem to understand these words or their relations. Once in a while it’ll hit on something that seems to make sense.
Here's a challenge: find something to subtract from "hammer" which does not result in a word that has "gun" as a substring. I've been unsuccessful so far.
The word "gun" itself seems to work. Package this as a game and you've got a pretty fun game on your hands :)
Doh why didn't I think of that
Gun related stuff works: bullet, holster, barrel
Other stuff that works: key, door, lock, smooth
Some words that result in "flintlock": violence, anger, swing, hit, impact
Well that's easy, subtract "gun" :P
hammer - keyboard = hammerhead
Makes no sense, admittedly!
- dulcimer and - zither are both firmly in .*gun.* territory, it seems..
Bullet
hammer - red = lock
hammer + man = adult male body (75%)
Close, that's addition
if I'm allowed only 1 something, I can't find anything either, if I'm allowed a few somethings, "hammer - wine - beer - red - child" will get you there. Guessing given that a gun has a hammer and is also a tool, it's too heavily linked in the small dataset.
As you might expect from a system with knowledge of word relations but without understanding or a model of the world, this generates gibberish which occasionally sounds interesting.
This might be helpful: I haven't implemented it in the UI, but from the API response you can see what the word definitions are, both for the input and the output. If the output has homographs, likeliness is split per definition, but the UI only shows the best one.
Also, if it gets buried in comments, proper nouns need to be capitalized (Paris-France+Germany).
I am planning on patching up the UI based on your feedback.
These are pretty good results. I messed around with a dumber and more naive version of this a few years ago[1], and it wasn't easy to get sensible output most of the time.
[1]: https://github.com/GrantMoyer/word_alignment
I've always wondered if there's a way to find which vectors are most important in a model like this. The gender vector man-woman or woman-man is the one always used in examples, since English has many gendered terms, but I wonder if it's possible to generate these pairs from the data. Maybe list all differences of pairs of vectors and see if there are any clusters. I imagine some grammatical features would show up, like the plurality vector people-person or the past tense vector walked-walk, but maybe there would be some that are surprisingly common but don't map cleanly to an obvious concept.
Or maybe they would all be completely inscrutable and man-woman would be like the 50th strongest result.
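A rough sketch of the first step of that idea, using made-up 6-dimensional vectors: compute the difference vector for every pair of words, then rank pairs of differences by absolute cosine similarity, so offsets that point the same way (like `king - queen` and `man - woman`) float to the top. This only ranks pairs of offsets; over a real vocabulary you would probably cluster the offsets instead (for example with k-means). `VECTORS` and `aligned_offsets` are hypothetical.

```python
import itertools
import math

# Made-up vectors; dims: royalty, maleness, past tense, walking, jumping, personhood.
VECTORS = {
    "man":    [0, 1, 0, 0, 0, 1],
    "woman":  [0, -1, 0, 0, 0, 1],
    "king":   [1, 1, 0, 0, 0, 1],
    "queen":  [1, -1, 0, 0, 0, 1],
    "walk":   [0, 0, 0, 1, 0, 0],
    "walked": [0, 0, 1, 1, 0, 0],
    "jump":   [0, 0, 0, 0, 1, 0],
    "jumped": [0, 0, 1, 0, 1, 0],
}


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def aligned_offsets(vectors, top=4):
    """Rank pairs of difference vectors (a - b vs c - d) by how parallel they are."""
    words = sorted(vectors)
    offsets = {
        (a, b): [x - y for x, y in zip(vectors[a], vectors[b])]
        for a, b in itertools.combinations(words, 2)
    }
    scored = []
    for (p, dp), (q, dq) in itertools.combinations(offsets.items(), 2):
        if set(p) & set(q):
            continue  # offsets sharing a word are trivially related; skip them
        # abs(): pairs are stored alphabetically, so a parallel offset may point the other way.
        scored.append((abs(cosine(dp, dq)), p, q))
    scored.sort(reverse=True)
    return scored[:top]


for score, p, q in aligned_offsets(VECTORS):
    print(f"{score:.3f}  {p[0]} - {p[1]}  ~  {q[0]} - {q[1]}")
```

In these toy vectors the gender, royalty, tense and walk/jump offsets are built to be parallel, so they are what surfaces; with real embeddings the interesting part is seeing which offsets emerge unprompted.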
Not what it's meant for, I guess, but it's not very strong at chemistry ;-)
It also has some other interesting outputs:
Reminds me of the very annoying word game https://contexto.me/en/
This is super fun. Offering the ranked matches makes it significantly more engaging than just showing the final result.
Interesting: parent + male = female (83%)
I can't personally find the connection here; I was expecting father or something.
Though dad is in the list with lower confidence (77%).
High-dimensional vectors are always hard to explain. This is an example.
There was a site like this a few years ago (before all the LLM stuff kicked off) that had this and other NLP functionality. Styling was grey and basic. That’s all I remember.
I’ve been unable to find it since. Does anyone know which site I’m thinking of?
I'm not sure this is old enough, but could you be referencing https://news.ycombinator.com/item?id=39205020?
Thanks, no it wasn't that, it was a basic HTML form.
A few favorites:
wine - beer = grape juice
beer - wine = bowling
astrology - astronomy + mathematics = arithmancy
What about starting with the result and finding set of words that when summed together give that result?
That could be seen as trying to find the true "meaning" of a word.
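One way to sketch that search, assuming made-up vectors and a greedy heuristic in the spirit of matching pursuit (not something the tool provides): repeatedly add the vocabulary word that most increases the cosine similarity between the running sum and the target, and stop when nothing helps. Greedy search can miss better combinations and only adds words; allowing subtraction would mean also trying negated vectors. `decompose` and `VECTORS` are hypothetical.

```python
import math

# Made-up vectors; dims: royalty, maleness, personhood, fruitiness.
VECTORS = {
    "queen": [1, -1, 1, 0],
    "crown": [1, 0, 0, 0],
    "woman": [0, -1, 1, 0],
    "man":   [0, 1, 1, 0],
    "apple": [0, 0, 0, 1],
}


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def decompose(target_word, vectors, max_terms=4, min_gain=1e-9):
    """Greedily add the word that most increases cosine(running sum, target)."""
    target = vectors[target_word]
    chosen, total, best = [], [0.0] * len(target), -1.0
    candidates = [w for w in vectors if w != target_word]
    for _ in range(max_terms):
        trials = []
        for word in candidates:
            if word in chosen:
                continue  # use each word at most once
            trial = [t + x for t, x in zip(total, vectors[word])]
            trials.append((cosine(trial, target), word))
        if not trials:
            break
        score, word = max(trials)
        if score - best < min_gain:
            break  # no remaining word improves the match
        chosen.append(word)
        total = [t + x for t, x in zip(total, vectors[word])]
        best = score
    return chosen, best


print(decompose("queen", VECTORS))
```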
artificial intelligence - bullsh*t = computer science (34%)
This. I'm tired of so many "it's over, shocking, game changer, it's so over, we're so back" announcements that turn out to be just gpt-wrappers or resume-builder projects.
Papers that actually say something meaningful often go unnoticed, but as soon as you say something generic like "language models can do this", it gets featured in "AI influencer" posts.
I've tried to get to "garage", but failed at a few attempts, ChatGPT's ideas also seemed reasonable, but failed. Any takers? :)
"car + house + door" worked for me (interestingly "car + home + door" did not)
Thanks, nice :) House sounds more general, I guess.
I've had some fun finding this:
goshawk-cocaine = gyrfalcon, which is funny if you know anything about goshawks and gyrfalcons
(Goshawks are very intense, gyrs tend to be leisurely in flight.)
cool but not enough data to be useful yet I guess. Most of mine either didn't have the words or were a few % off the answer, vehicle - road + ocean gave me hydrosphere, but the other options below were boat, ship, etc. Klimt almost made it from Mozart - music + painting. doctor - hospital + school = teacher, nailed it.
Getting to cornbread elegantly has been challenging.
shows how bad embeddings are in a practical way
Huh, that's strange. I wanted to check whether your embeddings have biases, but I cannot use the word "white" at all, so I cannot get an answer to "man - white + black = ?".
But if I assume the biased answer and rearrange the operands, I get "man - criminal + black = white", which clearly shows how biased your embeddings are!
Funny thing: fixing biases, and the ways people circumvent the fixes (while keeping good UX), might be a much more challenging task :)
I'm getting Navratilova instead of queen. And I can't get other words to work; I get red circles or no answer at all.
From another comment (https://news.ycombinator.com/item?id=43988861): "King" (with a capital K) was a world No. 1 women's tennis player.
dog - cat = paleolith
paleolith + cat = Paleolithic Age
paleolith + dog = Paleolithic Age
paleolith - cat = neolith
paleolith - dog = hand ax
cat - dog = meow
Wonder if some of the math is off or I am not using this properly
I figure the mathematically highest value must differ from the semantically most accurate one relatively frequently. (Because Car - Wheel = Touring Car doesn't make a lot of sense to me.)
man - intelligence = woman (36%)
woman + intelligence = man (77%)
Oof.
It's interesting that I find loops. For example
car + stupid = idiot, car + idiot = stupid
Really?!
I probably should have prefaced this with "try at your own risk, results don't reflect the author's opinions"
I'm sure it would be trivial to get it to say something incredibly racist, so that's probably a worthwhile disclaimer to put on the website
I think subtraction is broken. None of what I tried made any sense. Water - oxygen = gin and tonic.
Telling that Jewess, feminist, and spinster were near matches as well.
woman+penis=newswoman (businesswoman is second)
man+vagina=woman (ok that is boring)
Man - brain = Irish sea
Case matters, obviously! Try "man" with a lower-case "m"!
What does it mean when it surrounds a word in red? Is this signalling an error?
Try lowercasing; my phone tried to capitalize and it was a problem.
Seems to be a word not in its dictionary. Seems to not have any country or language names.
Edit: these must be capitalized to be recognized.
Yes, word in red = word not found, which is mostly the case when you try plurals or non-nouns (for now).
This is neat!
I think you need to disable auto-capitalisation because on mobile the first word becomes uppercase and triggers a validation error.
"man-intelligence=woman" is a particularly interesting result.
wine - alcohol = grape juice (32%)
Accurate.
Oh you have all the damn words. Even the Ricky Gervais ones.
mathematics - Santa Claus = applied mathematics
hacker - code = professional golf
for founders:
love + time = commitment
boredom + curiosity = exploration
vision + execution = innovation
resilience - fear = courage
ambition + humility = leadership
failure + reflection = learning
knowledge + application = wisdom
feedback + openness = improvement
experience - ego = mastery
idea + validation = product-market fit
uncle + aunt = great-uncle (91%)
great idea, but I find the results unamusing
Your aunt's uncle is your great-uncle. It's more correct than your intuition.
I asked ChatGPT (after posting my comment) and this is the response. "Uncle + Aunt = Great-Uncle is incorrect. A great-uncle is the brother of your grandparent."
I tried:
-red
and:
red-red-red
But it did not work and did not get any response. Maybe I am stupid but should this not work?
fluid + liquid = solid (85%) -- didn't expect that
blue + red = yellow (87%) -- rgb, neat
black + {red,blue,yellow,green} = white 83% -- weird
> blue + red = yellow (87%) -- rgb, neat
Blue + red is magenta. Yellow would be red + green.
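The correction can be checked with additive (light) mixing, where the red, green and blue channels simply add and saturate at 255. `mix_additive` is a hypothetical helper for illustration; subtractive (paint) mixing follows different rules.

```python
def mix_additive(*colors):
    """Additive (light) mixing: add each RGB channel, saturating at 255."""
    return tuple(min(255, sum(color[i] for color in colors)) for i in range(3))


RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)

print(mix_additive(BLUE, RED))         # (255, 0, 255), magenta
print(mix_additive(RED, GREEN))        # (255, 255, 0), yellow
print(mix_additive(RED, GREEN, BLUE))  # (255, 255, 255), white
```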
None of these results make much sense to me.
king - man + woman = queen
queen - woman + man = drone
The second makes sense, I think, if you are a bee.
So, are you a bee keeper then?
Car - Wheel(s) doesn't really have results I'd guess at (boat, sled, etc.). Just specific four wheeled vehicles.
doesn’t do anything on my iphone
London-England+France=Maupassant
King-man+woman=Navratilova, who is apparently a Czech tennis player. Apparently, it's very case-sensitive. Cool idea!
"King" (capital) probably was interpreted as https://en.wikipedia.org/wiki/Billie_Jean_King , that's why a tennis player showed up.
when I first tried it, king was referring to the instrument and I was getting a result king-man+woman=flute ... :-D
Heh. This is fun:
Navratilova - woman + man = Lendl
Just use an LLM API to generate results; it will be far better and more accurate than a weird home-cooked algorithm.
man - courage = husband
Woman + president = man
male + age = female
female + age = male
Just inverting the canonical example fails: queen - woman + man = drone
This kind of makes sense for bees.
doctor - man + woman = medical practitioner
Good to understand this bias before blindly applying these models (Yes- doctor is gender neutral - even women can be doctors!!)
Fwiw, doctor - woman + man = medical practitioner too
rice + fish = fish meat
rice + fish + raw = meat
hahaha... I JUST WANT SUSHI!
it doesn't know the word human
twelve-ten+five=
six (84%)
Close enough I suppose
potato + microwave = potato tree
man + woman = adult female body
three + two = four (90%)
Haha, yes, this was my first thought too. It seems it’s quite bad at actual math!
dog - fur = Aegean civilization (22%)
huh
horse+man
78% male horse 72% horseman
noodle+tomato=pasta
this is pretty fun
Surely the correct answer would be `pasta-in-tomato-sauce`? Pasta exists outside of tomato sauce.
dog+woman = man
That's weird.
Now I'm wondering if this could be helpful in doing the NY Times Connections puzzle.
The app produces nonsense ... such as quantum - superposition = quantum theory !!!
garden + sin = gardening
hmm...
colorless+green+ideas doesn't produce anything of interest, which is disappointing.
well green is not a creative color, so that's to be expected
carbon + oxygen = nitrogen
LOL
Can someone explain to me what the fuck this is supposed to be!?
Semantic subtraction within an embedding representation of text ("meaning").
cheeseburger-giraffe+space-kidney-monkey = cheesecake