
Comment by calibas

14 days ago

The fact that we have figured out how to translate language into something a computer can "understand" should thrill linguists. Taking a word (token) and abstracting its "meaning" as a 1,000-dimension vector seems like something that should revolutionize the field of linguistics. A whole new tool for analyzing and understanding the underlying patterns of all language!

And there's a fact here that's very hard to dispute: this method works. I can give a computer instructions and it "understands" them in a way that wasn't possible before LLMs. The main debate now is over the semantics of words like "understanding" and whether or not an LLM is conscious in the same way as a human being (it isn't).

Restricted to linguistics, an LLM's supposed lack of understanding should be a non sequitur. If the question is whether LLMs have formed a coherent ability to parse human languages, the answer is obviously yes. In fact, not just human languages: as multimodality shows, the same transformer architecture seems to work well for modeling and generating anything with inherent structure.

I'm surprised that he doesn't mention "universal grammar" once in that essay. Maybe it so happens that humans do have some innate "universal grammar" wired in by instinct, but it's clearly not _necessary_ for parsing. You don't need to set up explicit language rules or a generative structure; given enough data, the model learns to produce it. I wonder if anyone has gone back and tried to see whether you can extract explicit generative rules from the learned representation, though.

Since the "universal grammar" hypothesis isn't really falsifiable, at best you can hope for some generalized equivalent that's isomorphic to the platonic representation hypothesis: the claim that all human language is aligned in some shared latent representation, and that our brains have been optimized to work in that subspace. That's at least a testable assumption, which you could probe by trying to reverse-engineer the geometry of the space LLMs have learned.

  • Can LLMs actually parse human languages? Or can they react to stimuli with a trained behavioral response? Dogs can learn to sit when you say "sit", and learn to roll over when you say "roll over". But the dog doesn't parse human language; it reacts to stimuli with a trained behavioral response.

    (I'm not that familiar with LLM/ML, but it seems like a trained behavioral response rather than intelligent parsing. I believe this is part of why it hallucinates? It doesn't understand concepts, it just spits out words - perhaps a parrot is a better metaphor?)

    • Dogs out of all creatures probably actually do have some parsing going on for human language. They learn it like we do, picking up context from the environment, actions, tone, facial expressions, body language, etc.

      You can say 'what's that?' in many different ways, and a clever dog will react differently to each, even if it's the first time it's heard you say it that way. Say it in a scared tone and it'll react differently again, while still knowing what you're asking.

      They even do the cute head tilt when they're struggling to understand something.

      I think people vastly underestimate the power of wetware and think animals and us are separated by a chasm, but I think it's a relatively small leap.

      We base so much of our understanding of other creatures' intelligence on their ability to communicate with us or express things in the ways we do. If elephants judged humans on their ability to communicate in infrasound and speak their names (yes, they have names for each other), they wouldn't think too highly of us.

      Sidenote but the latest I've heard is that elephants like us because they think we are cute.

    • It can have a sensible conversation with you, follow your instructions, do math and physics, and write code that performs the task you described in English. Some models can create pictures and videos matching the description you gave them, or write descriptions of a video you give them.

      In 2023, Microsoft released a paper saying GPT4 could do things like tell you how to stack a random collection of unrelated variously-shaped objects so they don't fall over. Things have come a long way since then.

      Try out one of the advanced models, and see whether you think it understands concepts.

    • Animals definitely parse human language, some to a significant extent.

      Like an airplane taking off, things that seem like “emergent behavior” and hard lines of human vs animal behavior are really matters of degree that, like the airplane, we don’t notice until it actually takes flight… then we think there is a clean line between flying and not flying, but there isn’t. The airplane is gradually becoming weightless until it breaks contact with the ground, and animals use and understand language, but we only notice when it seems human.

    • I'm not sure it's even up for debate that they parse human language; in fact they do it better than most people do. Parsing is just breaking up language into its component ideas and fitting those ideas to one another to achieve meaning. We can meaningfully argue over whether they are capable of novel reasoning (probably not) or whether they can apply knowledge from one domain to another (sometimes!), but parsing seems undeniable.

    • Language and intelligence are mostly orthogonal to each other, and the development of linguistic skills appeared very late in human evolutionary terms.

      Babies, and in particular Deaf babies, understand and communicate a significant amount of information without parsing sentences. Dogs don't parse human speech; they associate an emotion with the particular sound and body language repeatedly exhibited to them.

    • You can train LLMs on the output of very complex CFGs, and they successfully learn the grammar and hierarchy needed to complete any novel prefix (a rough sketch of that setup is below). This is a task much more recursive and difficult than human language, so there's no reason to believe that LLMs aren't able to parse human languages in the formal sense as well.

      And of course, empirically, LLMs do generate valid English sentences. They may not necessarily be _correct_ sentences in a propositional truth-value sense (as seen with so-called "hallucinations"), but they are semantically "well-formed", in contrast to Chomsky's famous example of the failure of probabilistic grammar models, "Colorless green ideas sleep furiously."

      I'm not a linguist, but I don't think linguistics has ever cared about the truth value of a sentence; that's more the realm of logic.
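
      For concreteness, here's a rough sketch of that CFG setup (a minimal illustration assuming the nltk package; the toy grammar, depth, and prefix/continuation pairing are invented for the example, and the actual experiments use much deeper, more recursive grammars):

```python
import random
import nltk
from nltk.parse.generate import generate

# A deliberately tiny context-free grammar, just to show the shape of the data.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | Det N PP
VP -> V NP | V NP PP
PP -> P NP
Det -> 'the' | 'a'
N -> 'dog' | 'park' | 'ball'
V -> 'chased' | 'found'
P -> 'in' | 'near'
""")

# Enumerate sentences licensed by the grammar (bounded depth keeps it finite).
corpus = [" ".join(words) for words in generate(grammar, depth=6)]

# Turn each sentence into (prefix, next token) pairs -- which is the entire
# training signal a language model gets.
pairs = []
for sentence in corpus:
    tokens = sentence.split()
    for i in range(1, len(tokens)):
        pairs.append((" ".join(tokens[:i]), tokens[i]))

print(len(corpus), "sentences,", len(pairs), "prefix/continuation pairs")
print(random.choice(pairs))
```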

    • The terms are too unclear here. Can you define what it means to "be able to parse human language"? I'm sure contemporary chatbots score higher on typical reading comprehension tests than most humans do. You're certainly correct that LLMs "only" react to stimuli with a trained response, but I suppose anything that isn't conscious necessarily fits that description.

  • > If the question is whether LLMs have formed a coherent ability to parse human languages, the answer is obviously yes.

    No, not "obviously". They work well for languages like English or Chinese, where word order determines grammar.

    They work less well where context is more important. (e.g. Grammatical gender consistency.)

  • Alternatively, what Chomsky was thinking about with his universal grammar idea is something implicitly present in both our minds and an LLM, i.e. "it's the wiring, stupid".

    I'm not sure there's much evidence for this one way or another at this point.

    • > is something implicitly present in both our minds and an LLM

      The LLM doesn't start with any real structure besides the network of ops, though. If there is any induced structure, it's learnable from the data. And given enough data, the base network is sufficient to learn the "grammar" of not just human language but of more complex CFGs, and of things you wouldn't traditionally consider "languages" (e.g. audio, images). In a sort of chicken-and-egg scenario, the morass of data gives rise to the structures needed to parse and generate that data.

  • By "parse" I usually mean that I get out some sort of AST I can walk and manipulate. LLMs do no such thing. There is no parsing going on.
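
    To make the distinction concrete, here's a minimal sketch of what "parse" traditionally means, using nothing but Python's built-in ast module (the example statement is arbitrary):

```python
import ast

# Parsing in the traditional sense: source text goes in, a structured tree
# comes out, and that tree can be walked and manipulated programmatically.
tree = ast.parse("total = price * quantity + tax")

# Collect every variable name that appears anywhere in the tree.
names = sorted({node.id for node in ast.walk(tree) if isinstance(node, ast.Name)})
print(names)           # ['price', 'quantity', 'tax', 'total']
print(ast.dump(tree))  # the explicit, inspectable tree structure

# An LLM exposes no such intermediate structure: token ids in, token ids out,
# with any "tree" at best implicit in its internal activations.
```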

Unfortunately you've undermined your point by making sweeping claims about something that is the literal hardest known problem in philosophy (consciousness).

I'm not actually comfortable saying that LLMs aren't conscious. I think there's a decent chance they could be in a very alien way.

I realize that this is a very weird and potentially scary claim for people to parse but you must understand how weird and scary consciousness is.

  • Note that I didn't say they aren't conscious, I said they aren't conscious "in the same way as a human being". I left open the possibility they could be conscious "in a very alien way".

> whether or not an LLM is conscious in the same way as a human being

The problem is... that there is a whole range of "smart" activities humans do without being conscious of them.

- Walking, riding a bike, or typing on a keyboard happen fluidly without conscious planning of each muscle movement.

- You can finish someone's sentence or detect that a sentence is grammatically wrong, often without being able to explain the rule.

- When you enter a room, your brain rapidly identifies faces, furniture, and objects without you consciously thinking, “That is a table,” or “That is John.”

  • Indeed, the "rider and elephant" issue.

    During Covid I gave a lecture on Python over Zoom in a non-English language. It was a beginner's topic about dictionary methods. I was attempting to multi-task and had other unrelated work open on a second computer.

    Midway through the lecture I noticed to my horror that I had switched to English without the audience noticing.

    Going back through the recording, I noticed the switch was fluid and my delivery was reasonable. What I talked about was just as good as something presented by an LLM these days.

    So this brings up the question: why aren't we p-zombies all the time instead of 99% of the time?

    Are there any tasks that absolutely demand human consciousness as we know it?

    Presumably long-term planning is something for which active human consciousness is needed.

    Perhaps there is some need for consciousness when one is in the "conscious mastery" phase of acquiring a skill.

    This goes for any skill such as riding a bicycle/playing chess/programming at a high level.

    Once one reaches the "unconscious mastery" stage, the rider can concentrate on the higher meta-game.

Why would that thrill linguists? I'm not saying it hasn't/wouldn't/shouldn't, but I don't see why this technology would have the dramatic impact you imagine.

Is/was the same true for ASCII/Smalltalk/binary? They are all another way to translate language into something the computer "understands".

Perhaps the fact that it hasn't would lead some to question the validity of their claims. When a scientist makes a claim about how something works, it's expected that they prove it.

If the technology is as you say, show us.

  • > Is/was the same true for ASCII/Smalltalk/binary? They are all another way to translate language into something the computer "understands".

    That's converting characters into a digital representation. "A" is represented as 01000001. The tokenization process for an LLM is similar, but it's only the first step.

    An LLM isn't just mapping a word to a number: you're taking the entire sentence, considering the position of the words, and converting it into vectors within a 1,000+ dimensional space. Machine learning has encoded some "meaning" within these dimensions that goes far beyond something like an ASCII string (rough sketch below).

    And the proof here is that the method actually works; that's why we have LLMs.
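
    To make that concrete, here's a toy numpy sketch of the difference (the vocabulary, vector width, and random values are invented for illustration; real models learn these tables and then reshape each vector with attention over the whole sentence):

```python
import numpy as np

# ASCII: one character maps to one fixed code point. No notion of meaning.
print(ord("A"))                  # 65
print(format(ord("A"), "08b"))   # 01000001

# Toy embedding table: every token id maps to a high-dimensional vector.
rng = np.random.default_rng(0)
vocab = {"the": 0, "bank": 1, "river": 2, "money": 3}
d_model = 1024                   # illustrative width
embeddings = rng.normal(size=(len(vocab), d_model))

# A sinusoidal positional signal is added so word order is part of the vector.
def positional_encoding(pos, d=d_model):
    i = np.arange(d)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

token_vec = embeddings[vocab["bank"]] + positional_encoding(pos=3)
print(token_vec.shape)           # (1024,) -- one point in a 1024-dimensional space

# The attention layers then move that point around based on the rest of the
# sentence, which is how "river bank" and "money bank" end up in different
# regions of the space.
```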

Word embeddings (that 1000-dimension vector you mention) are not new. No comment on the rest of your comment, but that aspect of LLMs is "old" tech - word2vec was published 11 years ago.
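
For anyone who hasn't played with them, here's a minimal word2vec demo (a sketch assuming the gensim package; the corpus is a toy, so the nearest neighbours are only suggestive -- the original models were trained on billions of words):

```python
from gensim.models import Word2Vec

# A tiny toy corpus; word2vec's published results came from billions of tokens.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chases", "the", "ball"],
    ["the", "cat", "chases", "the", "mouse"],
]

# Train a small skip-gram model; each word gets a 100-dimensional vector.
model = Word2Vec(sentences, vector_size=100, window=2, min_count=1, sg=1, epochs=200, seed=1)

print(model.wv["king"].shape)                  # (100,)
print(model.wv.most_similar("king", topn=3))   # nearest words in the learned space
```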

  > The fact that we have figured out how to translate language into something a computer can "understand" should thrill linguists. 

I think they are really excited by this. There seems to be no shortage of linguists using these machines.

But I think it is important to distinguish the ability to understand language from the ability to translate it. Enough so that you yourself put quotes around "understanding". This can often be a challenge for translators: not knowing how to properly translate something because of underlying context.

Our communication runs far deeper than the words we speak or write on a page. That depth is much of what linguistics is about. (Or at least that's what they've told me, since I'm not a linguist.) This seems to be the distinction Chomsky is trying to make.

  > The main debate now is over the semantics of words like "understanding" and whether or not an LLM is conscious in the same way as a human being (it isn't).

Exactly. Here, I'm on the side of Chomsky and I don't think there's much of a debate to be had. We have a long history of being able to make accurate predictions while misunderstanding the underlying causal structure.

My background is physics, and I moved into CS (degrees in both), working on ML. I see my peers at the top like Hinton[0] and Sutskever[1] making absurd claims. I call them absurd because it is a mistake we've made over and over in the field of physics[2,3], one of those lessons you learn again and again because it is so easy to make the mistake.

Hinton and Sutskever say that this is a feature, not a bug. Yet we know it is not enough to fit the data. Fitting the data allows you to make accurate, testable predictions, but it is not enough to model the underlying causal structure. Science has a long history of accurate predictions made with incorrect models. Not just in the way of the Relativity of Wrong[4], but more directly: did we forget that the Geocentric Model could still be used to make good predictions?

Copernicus did not just face resistance from religious authorities, but also from academics. The same is true for Galileo, Boltzmann, Einstein and many more. People didn't reject their claims because they were unreasonable. They rejected the claims because there were good reasons to. Just... not enough to make them right.

[0] https://www.reddit.com/r/singularity/comments/1dhlvzh/geoffr...

[1] https://www.youtube.com/watch?v=Yf1o0TQzry8&t=449s

[2] https://www.youtube.com/watch?v=hV41QEKiMlM

[3] Think about what Fermi said in order to understand the relevance of this link: https://en.wikipedia.org/wiki/The_Unreasonable_Effectiveness...

[4] https://hermiene.net/essays-trans/relativity_of_wrong.html

"The fact that we have figured out how to translate language into something a computer can "understand" should thrill linguists."

No, there is no understanding at all. Please don't confuse codifying with understanding or translation. LLMs don't understand their input; they simply act on it based on the way they were trained on it.

"And there's a fact here that's very hard to dispute, this method works. I can give a computer instructions and it "understands" them "

No, it really does not understand those instructions. It is at best what used to be called an "idiot savant". Mind you, people used to describe others like that - who is the idiot?

Ask your favoured LLM to write a programme in a less-used language - ooh, let's try VMware's PowerCLI (it's PowerShell, so quite popular) - and get it to do something useful. It won't, because it can't, but it will still spit out something. PowerCLI isn't much represented on Stack Overflow and co, but it is PS-based, so the LLMs will hallucinate madder than a hippie on a new super weed.

  • I think the overarching theme I glean from LLM critics is some kind of visceral emotional reaction, disgust even, at the idea of them, leading to all these proxy arguments and side quests meant to denigrate them without honestly engaging with what they are or why people are interacting with them.

    So what if they don't "understand", by your very specific definition of the word? The person you're replying to is talking about the fact that they can say something to their computer in casual human language and it will produce a useful response, where previously that was not true. Whether that fits your suspiciously specific definition of "understanding" does not matter a bit.

    So what if they are over-confident in areas outside of their training data? Provide more training data, improve the models, reduce the hallucination. It isn't an issue with the concept; it's an issue with the execution. Yes, you'll never be able to reduce it to 0%, but so what? Humans hallucinate too. What are we aiming for? Omniscience?