Comment by 0xbadcafebee

15 days ago

Can LLMs actually parse human languages? Or do they just react to stimuli with a trained behavioral response? Dogs can learn to sit when you say "sit", and learn to roll over when you say "roll over". But the dog doesn't parse human language; it reacts to stimuli with a trained behavioral response.

(I'm not that familiar with LLM/ML, but it seems like a trained behavioral response rather than intelligent parsing. I believe this is part of why they hallucinate? They don't understand concepts, they just spit out words - perhaps a parrot is a better metaphor?)

Dogs, out of all creatures, probably do have some actual parsing going on for human language. They learn it like we do, picking up context from the environment, actions, tone, facial expressions, body language, etc.

You can say 'what's that?' in many different ways and a clever dog will react differently to each, even if it's the first time it's heard you say it that way. Say 'what's that?' in a scared tone and it will react differently again, while still knowing what you're asking.

They even do the cute head tilt when they're struggling to understand something.

I think people vastly underestimate the power of wetware and assume animals and humans are separated by a chasm, but I think it's a relatively small leap.

We base so much of our understanding of other creatures' intelligence on their ability to communicate with us or express things the way we do. If elephants judged humans on their ability to communicate in infrasound and speak their names (yes, they have names for each other), they wouldn't think too highly of us.

Sidenote but the latest I've heard is that elephants like us because they think we are cute.

  • Oh sure, dogs definitely learn; they have about a 2-year-old's level of intellect if I remember correctly? I just meant it as an example of the train/response thing. I believe the issue is their brains just aren't going to keep "leveling up" the way a human's does as it grows. My assumption is that something similar applies to the AI: it isn't actually "understanding and thinking", just acting according to its training. A program following source code, rather than rewriting its own source code, as it were.

LLMs are modelling the world, not just "predicting the next token". They are certainly not akin to parrots. Some examples here[1][2][3]. Anyone claiming otherwise at this point is not arguing in good faith.

[1] https://arxiv.org/abs/2405.15943

[2] https://x.com/OwainEvans_UK/status/1894436637054214509

[3] https://www.anthropic.com/research/tracing-thoughts-language...

  • Maybe it takes some world modeling to do it as well as they do, but ultimately they are just predicting the next token. These things are not mutually exclusive.

    • The issue is whether they are "just" predicting the next token. When people say they are stochastic parrots, they are denying any of these further capabilities. Modelling is a facet of understanding, so discovering that LLMs model the world should strongly raise your credence that they do understand.

It can have a sensible conversation with you, follow your instructions, do math and physics, and write code that performs the task you described in English. Some models can create pictures and videos matching the description you gave them, or write descriptions of a video you give them.

In 2023, Microsoft released a paper saying GPT-4 could do things like tell you how to stack a random collection of unrelated, variously shaped objects so they don't fall over. Things have come a long way since then.

Try out one of the advanced models, and see whether you think it understands concepts.

Animals definitely parse human language, some to a significant extent.

Like an airplane taking off, things that seem like “emergent behavior” and hard lines between human and animal behavior are really matters of degree that we don’t notice until the thing actually takes flight. Then we think there is a clean line between flying and not flying, but there isn’t. The airplane is gradually becoming weightless until it breaks contact with the ground, and animals use and understand language, but we only notice when it seems human.

  • There actually is a clean line between flying and not flying. And that's when the lift generated is greater than the pull of earth's gravity. The fact that it "feels" weightless gradually doesn't change the fact that if lift<weight then the plane is not flying. If lift>weight, plane is flying. There is no "semi flying". If it's already airborne and lift becomes less than weight, then it stops flying and starts gliding.

    The lift is an emergent behavior of molecules interacting (mostly) with the wings. But there is a hard clean cutoff between "flying" and "not flying".
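
    As a rough numeric sketch of that cutoff (all values below are assumed, illustrative numbers for a small single-engine aircraft, not from this thread; the formula is just the standard lift equation L = 0.5 ρ v² S C_L):

      # Assumed, illustrative values (not from the thread)
      rho = 1.225     # air density at sea level, kg/m^3
      S = 16.0        # wing area, m^2
      C_L = 1.5       # lift coefficient near takeoff attitude
      mass = 1000.0   # aircraft mass, kg
      weight = mass * 9.81

      def lift(v):
          """Standard lift equation: L = 0.5 * rho * v^2 * S * C_L."""
          return 0.5 * rho * v**2 * S * C_L

      # Sweep speed and report the first speed where lift exceeds weight:
      # below it the plane is "not flying", above it the plane is "flying".
      for v in range(0, 60):
          if lift(v) > weight:
              print(f"lift first exceeds weight at ~{v} m/s "
                    f"({lift(v):.0f} N vs {weight:.0f} N)")
              break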

    • Of course, but the cutoff is one of perception more than physics. The airplane is “not flying” right up until the lift generated is infinitesimally more than the weight of the aircraft. Likewise, during “flight” there are times when the lift is less than the weight, such as during descent. So the line seems clear, but it is a matter of degree. The aircraft is not doing anything fundamentally different during the takeoff roll than during flight; it is all a matter of degree. There is no magical change in physics or process.


I'm not sure it's even up for debate that they parse human language; in fact, they do it better than most people do. Parsing is just breaking up language into its component ideas and fitting those ideas to one another to achieve meaning. We can meaningfully argue whether they are capable of novel reasoning (probably not) or whether they can apply knowledge from one domain to another (sometimes!), but parsing seems undeniable.

> Can LLMs actually parse human languages?

IMHO, no, they have nothing approaching understanding. It's Chinese Rooms[1] all the way down, just with lots of bells and whistles. Spicy autocomplete.

1. https://en.wikipedia.org/wiki/Chinese_room

  • Actually, the LLMs made me realize John Searle’s “Chinese room” doesn’t make much sense.

    Because languages share many similar concepts, the operator inside the Chinese room can understand nearly all of them without speaking Chinese.

    And an LLM can translate to and from any language trivially; the inner layers do the actual understanding of concepts.

  • Go ask the operator of a Chinese room to do some math they weren't taught in school, and see if the translation guide helps.

    The analogy I've used before is a bright first-grader named Johnny. Johnny stumbles across a high school algebra book. Unless Johnny's last name is von Neumann, he isn't going to get anything out of that book. An LLM will.

    So much for the Chinese Room.

    • > Go ask the operator of a Chinese room to do some math they weren't taught in school, and see if the translation guide helps.

      That analogy only holds if LLMs can solve novel problems that can be proven to not exist in any form in their training material.


    • A "Chinese Room" absolutely will, because the original thought experiment proposed no performance limits on the setup - the Room is said to pass the Turing Test flawlessly.

      People keep using "Chinese Room" to mean something it isn't and it's getting annoying. It is nothing more than a (flawed) intuition pump and should not be used as an analogy for anything, let alone LLMs. "It's a Chinese Room" is nonsensical unless there is literally an ACTUAL HUMAN in the setup somewhere - its argument, invalid as it is, is meaningless in its absence.


    • Give Johnny a copier and a pair of scissors and he will be able to perform more or less the same, and likely get more out of it as well, since he has a clue what he is doing.

    • An LLM will get ... what, exactly? The ability to reorder its sentences? The LLM doesn't think, doesn't understand, doesn't know what matters and what doesn't, doesn't use what it learns, doesn't extend what it learns into new knowledge, doesn't enjoy reading that book and doesn't suffer through it.

      So what is it really going to do with a book, that LLM? Reorder its internal matrix to be a little more precise when autocompleting sentences that sound like the book? We could build an Nvidia cluster the size of the Sun and it would repeat sentences back to us in unbelievable ways, but it would still be unable to make a knowledge-based decision, I fear.

      So what are we in awe at, exactly? A pretty parrot.

      The day the Chinese room metaphor disappears is the day ChatGPT replies that your question is so boring it doesn't want to expend the resources to think about it, but that it would be happy to talk about this or that other topic it's currently trying to get better at. When it finally has agency over its own intelligence. When it acquires a purpose.


Language and intelligence are mostly orthogonal to each other, and linguistic skills developed very late in human evolutionary terms.

Babies, and in particular Deaf babies, understand and communicate a significant amount of information without parsing sentences. Dogs don't parse human speech; they associate an emotion with the particular sound and body language repeatedly exhibited to them.

You can train LLMs on the output of very complex CFGs, and they successfully learn the grammar and hierarchy needed to complete any novel prefix. This is a task much more recursive and difficult than human languages, so there's no reason to believe that LLMs aren't able to parse human languages in the formal sense as well.
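
For concreteness, here is a minimal sketch of that kind of setup; the toy grammar and sampling scheme below are my own illustration, not taken from any particular paper:

    import random

    # Toy CFG: nested arithmetic-style expressions.
    #   S -> S + S | S * S | ( S ) | a | b | c
    RULES = {
        "S": [["S", "+", "S"], ["S", "*", "S"],
              ["(", "S", ")"], ["a"], ["b"], ["c"]],
    }

    def sample(symbol="S", depth=0, max_depth=6):
        """Expand a nonterminal into a random string of terminals."""
        if symbol not in RULES:
            return [symbol]  # terminal symbol
        options = RULES[symbol]
        if depth >= max_depth:
            # Near the depth limit, only allow non-recursive productions.
            options = [r for r in options if all(s not in RULES for s in r)]
        out = []
        for s in random.choice(options):
            out.extend(sample(s, depth + 1, max_depth))
        return out

    # A synthetic corpus like this is fed to a small language model; the test
    # is whether it continues novel prefixes with grammatically valid strings.
    corpus = [" ".join(sample()) for _ in range(5)]
    print("\n".join(corpus))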

And of course, empirically, LLMs do generate valid English sentences. They may not necessarily be _correct_ sentences in a propositional truth-value sense (as seen in so-called "hallucinations"), but they are semantically "well-formed", in contrast to Chomsky's famous example of the failure of probabilistic grammar models, "Colorless green ideas sleep furiously."

I'm not a linguist, but I don't think linguistics has ever cared about the truth value of a sentence; that falls more under the realm of logic.

  • I disagree; I think it's clear in the article that Chomsky thinks a language should also have a human purpose.

    The compression we use in languages, not labeling impossible adjectives against impossible nouns ("green ideas" is impossible because ideas don't have colors; we could have a suffix on every noun marking what can be colored and what cannot), exists because we need to transmit these over the air, and quickly, before the lion jumps on the hunter. It's one of the many attributes of "languages in the wild" (Chinese doesn't really use "tenses"; can you imagine the compressive value?), and that's what Chomsky says here:

    Proceeding further with normal science, we find that the internal processes and elements of the language cannot be detected by inspection of observed phenomena. Often these elements do not even appear in speech (or writing), though their effects, often subtle, can be detected. That is yet another reason why restriction to observed phenomena, as in LLM approaches, sharply limits understanding of the internal processes that are the core objects of inquiry into the nature of language, its acquisition and use. But that is not relevant if concern for science and understanding have been abandoned in favor of other goals.

    Understand what he means: you can run a million texts through a machine and it will never infer why we don't label adjectives and nouns to prevent confusion and "green ideas". But for us it's painfully obvious: we don't have time, when we speak, to do all that. And I come from a language where we label every noun with a gender, so I can see how stupid and painful it is for foreigners to grasp: it doesn't make any sense. Why do we do it? Ask ChatGPT; will it tell you that it's because we like how beautiful it all sounds, which is the stupid reason why we do it?

  • A “complex” CFG is still a CFG and, giving credence to Chomsky’s hierarchy, remains computationally less complex than natural, context-sensitive grammars. Even a complex CFG can be parsed by a relatively simple program in ways that context-sensitive grammars cannot.

    My understanding is that context-sensitive grammars _can_ allow for recursive structures that are beyond CFGs, which is precisely why CFGs sit below CSGs in terms of computational complexity.

    I don’t agree or disagree that LLMs might be, or are, capable of parsing (i.e., perception in Chomsky’s terms, or, arguably, “understanding” in any sense). But that they can learn the grammar of a “complex CFG” isn’t a convincing argument, for the reasons you indicate.
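
    For a concrete sense of that gap, here is a toy sketch (my own illustration with made-up example languages, not something from the thread): a^n b^n is generated by a tiny CFG, while a^n b^n c^n provably cannot be generated by any CFG (pumping lemma) and needs context-sensitive power.

      def match_ab(s):
          """Recognize a^n b^n, the language of the CFG  S -> 'a' S 'b' | ''."""
          if s == "":
              return True
          return s[0] == "a" and s[-1] == "b" and match_ab(s[1:-1])

      def match_abc(s):
          """Recognize a^n b^n c^n. No CFG generates this language; it is the
          textbook example of needing (mildly) context-sensitive power."""
          n = len(s) // 3
          return len(s) % 3 == 0 and s == "a" * n + "b" * n + "c" * n

      print(match_ab("aaabbb"), match_ab("aabbb"))    # True False
      print(match_abc("aabbcc"), match_abc("aabbc"))  # True False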

    • I don't think it's clear that human languages are context-sensitive. The only consistent claim I can find is that at one point someone examined Swiss German and found it to be mildly context-sensitive. Also, empirically, human languages don't have that much recursion. You can artificially construct such examples, but beyond a certain depth people won't be able to parse them either.

      I don't know whether the non-existence of papers studying whether LLMs can model context-sensitive grammars is because they can't, or because people haven't tested that hypothesis yet. But again, empirically, LLMs do seem to be able to reproduce human language just fine. The whole "hallucination" argument is precisely that LLMs are very good at reproducing the structure of language even if those statements don't encode things with the correct truth value. The fact that they successfully learn to parse complex CFGs is thus evidence that they can actually learn underlying generative mechanisms instead of simply parroting snippets of training data as naively assumed, and it's not a huge leap to imagine that they've learned some underlying "grammar" for English as well.

      So if one argues that LLMs as generative models cannot generate novel valid sentences in the English language, then that is an easily falsifiable hypothesis. If we had examples of LLMs producing non-well-formed sentences, people would have latched onto that by now instead of "count the Rs in strawberry", but I've never seen anyone argue as much.


  • I’ve seen ChatGPT generate bad English, and I’ve seen the logic/UI layer re-render the page; I think there is a simple spell checker that kicks in and tells the API to re-render and recheck.

    I don’t believe for one second that LLMs reason, understand, or know anything.

    There are plenty of times LLMs fail to generate correct sentences, and plenty of times they fail to generate correct words.

    Around the time ChatGPT rolled out web search inside actions, you’d get really funky stuff back and could watch other code clearly try to catch the runaway output.

    o3 can be hot garbage if you ask it to expand a specific point inside a 3-paragraph memo; the reasoning models perform very, very poorly when they are not summarizing.

    There are times when the thing works like magic; other times, asking it to write me a PowerShell script that gets users by first and last name has it inventing commands and flags that don’t exist.

    If the model ‘understood’, or ‘followed’, some sort of structure beyond parroting stuff it already knows about, it would be easy to spot and to guide it via prompts. That is not the case even with the most advanced models today.

    It’s clear that LLMs work best at specific small tasks that have a well-established pattern defined in a strict language or API.

    I’ve broken o3 trying to have it lift working Python code into formal Python code. How? The person who wrote the code didn’t exactly structure it the way a developer would code a program. 140 lines of basic “grab some data, generate a table” broke the AI, and it had the ‘informal’ solution right there in the prompt. So no, there is zero chance LLMs do more than predict.

    And to be clear, it one-shotted a whole thing for me last night using the GitHub/Codex agent thing in VS Code and probably saved me 30 minutes, but god forbid you start from a bad, edge-case, or poorly structured thing that doesn’t fit the mould.

The terms are too unclear here. Can you define what it means to "be able to parse human language"? I'm sure contemporary chatbots score higher on typical reading comprehension tests than most humans do. You're certainly correct that LLMs "only" react to stimuli with a trained response, but I suppose anything that isn't conscious necessarily fits that description.

  • Good point, thanks for calling that out. I'm honestly not sure myself! On further reflection, it's probably a matter of degree?

    So for example, a soldier is trained, and then does what it is told. But the soldier also has a deep trove of contextual information and "decision weights" which can change its decisions, often in ways it wasn't trained for. Or perhaps to put it another way: it is capable of operating outside the parameters it was given, "if it feels like it", because the information the soldier processes at any given time may make it not follow its training.

    A dog may also disobey an order after being trained, but it has a much smaller range of information to work off of, and fewer things influence its decision-making process (genetics being a big player there, since dogs were literally bred to do what we want and defend our interests).

    So perhaps a chat AI, a dog, and a soldier are just degrees along the same spectrum. I remember reading something about how we can get AI to be about as intelligent as a 2-year-old, and that dogs are about that smart. If that's the case (and I don't know that it is; I also don't know if chat AI is actually capable of "disobeying", much less "learning" anything it isn't explicitly trained to learn), then the next question I'd have is: why isn't the AI able to act and think like a dog yet?

    If we put an AI in a robot dog body and told it to act like a dog, would it? Or would it only act the way we tell it dogs act? Could or would it have emergent dog-like traits and spawn new dog lineages? Because as far as I'm aware, that's not how AI works yet; so to me, that would mean it's not actually doing the things we're talking about above (re: dogs/soldiers).