Comment by n2d4
2 days ago
You are being unnecessarily cynical. These are all subjective. I thought "datum" and "datasets" were quite clever, and while I would've chosen "man" for "king - crown" myself, I actually find "ruler" a better solution after seeing it. But each to their own.
The rant about network architecture misses my point, which is that an LLM does not just do a linear transformation and a similarity search. Sure, in the most abstract sense it still just computes an output embedding from two input embeddings, but only in a very distant, pedantic way. (Actually, to be VERY pedantic, even that isn't true, because ChatGPT's tokenizer embeds tokens, not words. The input and output of the model are more than just the semantic embeddings of words; two different but semantically equivalent words may produce different outputs from a transformer LLM, but not from a word-semantics model.)
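To be concrete about what I do mean by embedding math — roughly a linear operation on word vectors followed by a similarity search — here's a toy sketch. The 4-dimensional vectors are made up purely for illustration, not taken from any real model:

```python
# Toy sketch of word-vector arithmetic plus a cosine-similarity search,
# i.e. what a plain word-semantics model does for "king - crown".
# The vectors below are invented for illustration only.
import numpy as np

vocab = {
    "king":  np.array([0.9, 0.8, 0.1, 0.7]),
    "crown": np.array([0.8, 0.1, 0.0, 0.9]),
    "ruler": np.array([0.2, 0.9, 0.1, 0.0]),
    "man":   np.array([0.1, 0.3, 0.9, 0.0]),
    "queen": np.array([0.9, 0.7, 0.0, 0.6]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Subtract the vectors, then return whichever remaining word is closest
# to the result under these toy numbers.
query = vocab["king"] - vocab["crown"]
answer = max((w for w in vocab if w not in ("king", "crown")),
             key=lambda w: cosine(query, vocab[w]))
print(answer)
```

A transformer LLM does far more than this between its input and its output, which was the point I was making.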
I just thought it was cool that ChatGPT is so good at it.
I'm an engineer and researcher; it's my job to find problems so that they can be resolved. I'd say that's different from being cynical, which tends to be dismissive. I understand how my comment can come off that way, though that wasn't my intention, so I'm clarifying.
You're right that there's subjectivity, but not infinitely so. There's a bound to it, and that bound is required both for language to work and for us to be able to build these models. I did agree that the data one was tricky, so I'm not really going to argue there; I was just pointing out a critical detail, given that the models learn through pattern matching rather than from a dictionary. That's why I made the comment about humans. As for "king - crown" giving "ruler", I gave my explanation; would you care to share yours? I'd like to understand your point of view so I can improve my interpretation of the results, because frankly I don't understand it. What semantic relationship is being changed, if not the attribute of rulership?
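To spell out the decomposition I have in mind (shorthand for my reading, not vectors measured from any actual model):

    king ≈ man + rulership        (the crown standing in for the rulership part)
    king - crown ≈ man

which is why "ruler" as the answer doesn't make sense to me.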
The architecture part was a miscommunication. I hope you understand how I misunderstood you when you said "this doesn't do embedding math like OP!" It's clear I'm not alone in that, either.
To be pedantic, people generally refer to the tokenization and the embedding lookup together simply as "embedding". It's the common terminology. This is because with BPE you're performing these steps together, and because the term fits its longer-standing usage in math.
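As a rough sketch of the two steps that get lumped together — using OpenAI's tiktoken tokenizer for the BPE part, with a random matrix standing in for a model's learned embedding table:

```python
# BPE tokenization to integer ids, then a lookup into an embedding matrix:
# the two steps people usually refer to together as "embedding".
# The matrix below is random, not taken from any real model.
import numpy as np
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # a GPT-style BPE encoding
ids = enc.encode("king - crown")             # text -> subword token ids

d_model = 16                                 # toy embedding width
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(enc.n_vocab, d_model))

vectors = embedding_matrix[ids]              # one row per token, not per word
print(ids)
print(vectors.shape)                         # (number of tokens, 16)
```

Note that the lookup is per token rather than per word, so a single word can map to several rows.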
I was just trying to help you understand a different viewpoint.