Comment by zkmon

6 hours ago

Ok, let's say "cash is king". What is queen?

The arithmetic is: king = royalty + male, while queen = royalty + female.

But then this reduces all these words to arithmetic values without meaning. Even if the words "royalty" and "male" can in turn be expressed as sums or differences of other words, and so on, they are all just numbers, with no meaning at all.

The representation might not need to explicitly encode "meaning", if it does so implicitly by preserving essential properties of how things relate to each other.

For instance, a CAD object has no notion of what an airplane wing or a car wheel is, but it can represent them in a way that captures, in numerical simulations, how a wing relates to a fuselage. This is because it doesn't mangle the geometry the user wanted to represent ("what it means", in a geometric sense), although it does make it differ in certain ways that are "meaningless" (e.g. spurious small features, errors under tolerances), much like this representation might do with words.

Back to words, how do you define meaning anyway? I believe I was taught what words "mean" by having objects pointed to as a word was repeated: "cat", says the parent as they point to a cat; "bird", as they point to a bird. Isn't this also equality/correspondence by relative frequency?

The meaning is not in the numbers themselves, but in how they each relate to one another.

Also, those are not mere numbers here, but vectors. Dimensionality and orthogonality are key to defining complex relationships.
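
To make that concrete, here is a toy sketch with hand-built 3-dimensional vectors (purely illustrative; real embeddings have hundreds of unlabeled dimensions), showing how the analogy arithmetic falls out of the geometry:

    import numpy as np

    # Hypothetical axes [royalty, gender, other] -- labeled only for illustration
    words = {
        "king":  np.array([0.9,  0.8, 0.1]),
        "queen": np.array([0.9, -0.8, 0.1]),
        "man":   np.array([0.1,  0.8, 0.3]),
        "woman": np.array([0.1, -0.8, 0.3]),
    }

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # king - man + woman lands closest to queen
    target = words["king"] - words["man"] + words["woman"]
    print(max(words, key=lambda w: cosine(words[w], target)))  # queen

Because "royalty" and "gender" sit on separate axes, subtracting man and adding woman flips only the gender component and leaves royalty intact.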

I really think you should actually read the article. None of what you are saying has to do with the content of it, and it will explain how you can do arithmetic with these words.

  • From the guidelines https://news.ycombinator.com/newsguidelines.html

    > Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that".

    Besides which, this is totally a valid question based on the article. (The temptation to ask if you read it is almost overwhelming!) It talks about how to do arithmetic but not what the result of that will necessarily be, so I don't see that any part of it answers the question of "cash is king" + "female" - "male".

It doesn’t work because that’s just wrong. The semantic meaning of “king” is much more than simply “royalty” and “male”. And it will be different for different people based on their experiences and familiarity with English and world history as well.

Then there’s the phonetic angle in addition to the semantic one. Why isn’t cash emperor? Because “cash is king” is alliterative.

Then there’s the orthographic angle: it’s a lot easier to write “king” than “emperor”.

cash - king + queen = cashing, and cash - male + female = cashing (in qwen3-embedding:0.6b). Make of that what you will.
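
For anyone who wants to reproduce that, here is a rough sketch assuming the ollama Python client with the model pulled locally (the embeddings() call shape and the candidate list are my assumptions; a real reproduction would rank the model's entire vocabulary, which is presumably how "cashing" surfaced):

    import numpy as np
    import ollama  # assumes a local ollama server with qwen3-embedding:0.6b pulled

    def embed(text):
        # assumption: the client's embeddings() call; adjust to your setup
        return np.array(ollama.embeddings(model="qwen3-embedding:0.6b",
                                          prompt=text)["embedding"])

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    target = embed("cash") - embed("king") + embed("queen")

    # Hypothetical candidates; swap in whatever words you want to test
    candidates = ["credit", "money", "cashing", "wealth", "gold", "queen"]
    for w in sorted(candidates, key=lambda w: -cosine(embed(w), target)):
        print(w, round(cosine(embed(w), target), 3))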

This is my problem with people claiming that LLMs "understand". What we usually call "meaning" is intricately related to an encyclopedic knowledge of the world around us. How does this kind of knowledge not get into the same kind of loop you've just described? It is ultimately founded on our direct experience of the world, on sense data. It is ultimately embodied knowledge.

  • Vector spaces and bag of words models are not specifically related to LLMs, so I think that's irrelevant to this topic. It's not about "knowledge", just the ability to represent words in such a way that similarities between them take on useful computational characteristics.

    • Well, pretty much all of the LLMs are based on the decoder-only version of the Transformer architecture (in fact it’s the T in GPT).

      And in the Transformer architecture you’re working with embeddings, which are exactly what this article is about: the vector representation of words.
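
      You can inspect that layer directly. A minimal sketch with Hugging Face transformers, using GPT-2 only because it's a small, public decoder-only model:

        from transformers import AutoModel, AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("gpt2")
        model = AutoModel.from_pretrained("gpt2")

        # The learned token-embedding table: one vector per vocabulary entry
        table = model.get_input_embeddings()
        print(table.weight.shape)  # torch.Size([50257, 768])

        # " king" (the leading space matters for GPT-2's BPE) -> (1, num_tokens, 768)
        ids = tokenizer(" king", return_tensors="pt")["input_ids"]
        print(table(ids).shape)

      Everything downstream in the model operates on those vectors, so the starting point is exactly the kind of word vector the article describes.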

  • s/embodied/embedded/, and this is how LLMs understand.

    As others already mentioned, the secret is that the arithmetic is done on vectors in a high-dimensional space. The meaning of concepts is in how they relate to each other, and high-dimensional spaces end up being a surprisingly good representation.
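
    One quick illustration (mine, not from the article) of why high dimensions help: independent random directions are nearly orthogonal, so a space like this has room for an enormous number of roughly independent concepts:

        import numpy as np

        rng = np.random.default_rng(0)
        d = 768  # a typical embedding width
        a, b = rng.standard_normal(d), rng.standard_normal(d)

        # Cosine similarity of two random 768-d vectors is close to 0
        print(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))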

  • And what are we if not a bunch of interconnected atoms? Smash a person to paste and you will not have any deep meaning in them, no life, nor sublime substance making them different from the dust they were made of. What is special in humans? Aren't we just an especially complex hydrocarbon mass that receives external stimuli and remaps them to a physical output? What makes you think that there is something more inside?

    • There’s nothing special about having an embodiment. A robot has an embodiment of sorts. An LLM meanwhile is a brain in a vat.

      And there’s nothing special about my 21x23-foot lawn. Can you emulate it? To what fidelity? How much should the map correspond to the territory? The same square footage, the same elevations down to the millimeter?

      You’re not saying anything that counters the point that was made, just mentioning the stuff that people and animals are made of, with the assumed strawman argument (not made) that there is some non-physical essence at play. There isn’t.

      Put a camera and some feet on an LLM and maybe it has an embodiment. As long as it has only digital input, it does not, in the sense being discussed here.

    • What I am talking about concerns how human language relates to meaning. I'm not sure what this has to do with humans being "special". Saying that humans are "just an especially complex hydrocarbon mass that receives external stimuli and remaps them to a physical output" misses the point that what data we have available to us is qualitatively different from that of today's best natural language generation software.


I asked deepseek, and after many options, the recommended one was:

Cash flow, "because you need the ongoing stream, not just a pile of cash, to reign successfully".

If I had to guess, cash - king + queen = credit (or money or something?). You are just asking the same thing as cash - man + woman, or "What is the feminine version of cash?" because queen - king ~= woman - man.

I say credit, because it is not as physical and direct as cash, so perhaps it is perceptually more feminine?

But I will have to check the next time I work with word2vec.
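
If anyone wants to check before I get to it, here is a quick sketch with gensim and the pretrained Google News vectors (a sizable download; what actually comes out on top is anyone's guess):

    import gensim.downloader as api

    # Pretrained word2vec vectors (300 dimensions, Google News corpus)
    model = api.load("word2vec-google-news-300")

    # cash - king + queen: most_similar adds positives and subtracts negatives
    print(model.most_similar(positive=["cash", "queen"], negative=["king"], topn=5))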

This is the kind of question that made us doubt that AI would ever happen, because meaning is quite remote from its expression.

There were always these contextual meanings that differ widely.

Like "carrots are orange" is a fact that's generally okay, but is not true at all, carrots come in a very wide range of colors.

But LLMs crushed right through these problems. And vector embeddings are a big part of why it worked.

So yeah, somewhere in those vectors is something that says that when cash is king, "king" has no relationship to monarchy.

"queen" is going to be a funny vector: mostly royalty, slightly chess, a bit gay, a bit rock and roll, a bit bee. Finally:

Queen + One = King