
Comment by krackers

14 days ago

You can train LLMs on the output of very complex CFGs, and they successfully learn the grammar and hierarchy needed to complete any novel prefix. That is a task far more recursive and difficult than human language, so there's no reason to believe LLMs can't parse human languages in the formal sense as well.
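For a rough picture of what that kind of setup looks like (a toy sketch of my own, not code from any particular paper), here is how the synthetic training data can be generated and what the prefix-completion test is probing:

```python
# Toy sketch (illustrative only, not any paper's actual code): build a
# training corpus by sampling strings from a small recursive CFG.  A model
# trained on such strings is then probed with novel prefixes and judged on
# whether its completions respect the grammar, e.g. closing brackets in the
# right order, which requires tracking the hierarchy rather than n-grams.
import random

# S -> ( S ) S | [ S ] S | empty  -- nested, recursive structure
GRAMMAR = {"S": [["(", "S", ")", "S"], ["[", "S", "]", "S"], []]}

def sample(symbol="S", depth=0, max_depth=8):
    """Randomly derive a terminal string from `symbol`."""
    if symbol not in GRAMMAR:              # terminal: emit as-is
        return [symbol]
    rules = GRAMMAR[symbol]
    if depth >= max_depth:                 # force the empty rule to terminate
        rules = [[]]
    out = []
    for sym in random.choice(rules):
        out.extend(sample(sym, depth + 1, max_depth))
    return out

corpus = [" ".join(sample()) for _ in range(10_000)]
print(corpus[:3])   # e.g. ['( [ ] ) ( )', '[ ( ) ]', '']
```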

And of course, empirically, LLMs do generate valid English sentences. They may not necessarily be _correct_ sentences in a propositional truth-value sense (as seen in the so-called "hallucinations"), but they are semantically "well-formed", in contrast to Chomsky's famous example of the failure of probabilistic grammar models, "Colorless green ideas sleep furiously."

I'm not a linguist, but I don't think linguistics has ever cared about the truth value of a sentence; that's more the realm of logic.

I disagree; I think it's clear in the article that Chomsky thinks a language should also serve a human purpose.

The compression we use in languages, where we don't mark impossible adjectives against impossible nouns ("green ideas" is impossible because ideas don't have colors; we could put a suffix on every noun to mark what can be colored and what cannot), exists because we need to transmit speech over the air, and quickly, before the lion jumps on the hunter. It's one of the many attributes of "languages in the wild" (Chinese doesn't really use tenses; can you imagine the compressive value?), and that's what Chomsky says here:

> Proceeding further with normal science, we find that the internal processes and elements of the language cannot be detected by inspection of observed phenomena. Often these elements do not even appear in speech (or writing), though their effects, often subtle, can be detected. That is yet another reason why restriction to observed phenomena, as in LLM approaches, sharply limits understanding of the internal processes that are the core objects of inquiry into the nature of language, its acquisition and use. But that is not relevant if concern for science and understanding have been abandoned in favor of other goals.

Understand what he means: you can run a million texts through a machine, and it will never infer why we don't label adjectives and nouns to prevent confusion and "green ideas". But for us it's painfully obvious: we don't have time, when we speak, to do all that. And I come from a language where we label every noun with a gender; I can see how stupid and painful it is for foreigners to grasp: it doesn't make any sense. Why do we do it? Ask ChatGPT: will it tell you that it's because we like how beautiful it all sounds, which is the stupid reason why we do it?

A “complex” CFG is still a CFG and, giving credence to Chomsky’s hierarchy, remains computationally less complex than natural, context-sensitive grammars. Even a complex CFG can be parsed by a relatively simple program in ways that context-sensitive grammars cannot.

My understanding is that context-sensitive grammars _can_ allow recursive structures that are beyond CFGs, which is precisely why CFGs sit below CSGs in terms of computational complexity.
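To make that contrast concrete with the standard textbook example (an illustration of mine, nothing specific to LLMs): a^n b^n is context-free and yields to a one-function recursive recognizer, while a^n b^n c^n is the classic language that no CFG generates.

```python
# Standard textbook contrast (illustration only).  a^n b^n is generated by
# the CFG  S -> a S b | empty  and a trivial recursive recognizer handles it;
# a^n b^n c^n is the classic language that no CFG generates and needs (at
# least mildly) context-sensitive power.

def is_anbn(s: str) -> bool:
    """Recognize a^n b^n, mirroring the CFG rule S -> a S b | empty."""
    if s == "":
        return True
    return s.startswith("a") and s.endswith("b") and is_anbn(s[1:-1])

def is_anbncn(s: str) -> bool:
    """a^n b^n c^n: easy to check by counting, but outside the CFG class."""
    n = len(s) // 3
    return len(s) == 3 * n and s == "a" * n + "b" * n + "c" * n

assert is_anbn("aaabbb") and not is_anbn("aabbb")
assert is_anbncn("aabbcc") and not is_anbncn("aabbcb")
```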

I don’t agree or disagree that LLMs might be, or are, capable of parsing (i.e., perception in Chomsky’s terms, or, arguably, “understanding” in any sense). But the fact that they can learn the grammar of a “complex CFG” isn’t a convincing argument, for the reasons indicated above.

  • I don't think it's clear that human languages are context sensitive. The only consistent claim I can find is that at one point someone examined Swiss German and found that it's weakly context-sensitive. Also, empirically, human languages don't have that much recursion. You can artificially construct such examples, but beyond a certain depth people won't be able to parse them either.

    I don't know whether the non-existence of papers studying whether LLMs can model context-sensitive grammars is because they can't, or because people haven't tested that hypothesis yet. But again, empirically, LLMs do seem to be able to reproduce human language just fine. The whole "hallucination" argument is precisely that LLMs are very good at reproducing the structure of language even when the statements don't have the correct truth value. The fact that they successfully learn to parse complex CFGs is thus evidence that they can actually learn underlying generative mechanisms instead of simply parroting snippets of training data, as naively assumed, and it's not a huge leap to imagine that they've learned some underlying "grammar" for English as well.

    So if one argues that LLMs, as generative models, cannot generate novel valid sentences in English, that is an easily falsifiable hypothesis. If we had examples of LLMs producing non-well-formed sentences, people would have latched onto that by now instead of "count the Rs in strawberry", but I've never seen anyone argue as much.

    • It’s uncontroversial now that the class of string languages roughly corresponding to “human languages” is mildly context sensitive in a particular sense. This debate was hashed out in the 80s and 90s.

      I don’t think formal language classes have much to tell us about the capabilities of LLMs in any case.

      > Also, empirically, human languages don't have that much recursion. You can artificially construct such examples, but beyond a certain depth people won't be able to parse them either.

      If you limit recursion depth then everything is regular, so the Chomsky hierarchy is of little application.
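      To spell that out with a toy illustration (mine, not the commenter’s): cap the nesting depth of balanced brackets at some k and the language becomes finite-state; it can even be written as an ordinary regex.

      ```python
      # Sketch: bounded recursion depth collapses a context-free pattern into a
      # regular one.  Balanced parentheses with unbounded nesting are context-free
      # and provably not regular; capped at depth k, a plain regex recognizes them.
      import re

      def depth_bounded_parens(k: int) -> str:
          """Regex for balanced '()' strings with nesting depth at most k."""
          pattern = ""                              # depth 0: only the empty string
          for _ in range(k):
              pattern = f"(?:\\({pattern}\\))*"     # wrap and repeat one level deeper
          return pattern

      rx = re.compile("^" + depth_bounded_parens(3) + "$")
      assert rx.match("()(())")        # depth 2: accepted
      assert not rx.match("(((())))")  # depth 4: rejected by the depth-3 recognizer
      ```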

I’ve seen ChatGPT generate bad English, and I’ve seen the logic/UI layer re-render the page; I think there is a simple spell checker that kicks in and tells the API to re-render and recheck.

I don’t believe for one second that LLMs reason, understand, or know anything.

There are plenty of times LLMs fail to generate correct sentences, and plenty of times they fail to generate correct words.

Around the time ChatGPT rolled out web search inside actions, you’d get really funky stuff back and could watch other code clearly trying to catch the runaway output.

o3 can be hot garbage if you ask it to expand a specific point inside a three-paragraph memo; the reasoning models perform very, very poorly when they are not summarizing.

There are times when the thing works like magic; other times, asking it to write a PowerShell script that gets users by first and last name has it inventing commands and flags that don’t exist.

If the model ‘understood’ or ‘followed’ some sort of structure, beyond parroting stuff it already knows about, it would be easy to spot that and guide it via prompts. That is not the case even with the most advanced models today.

It’s clear that LLMs work best at specific small tasks that have a well-established pattern defined in a strict language or API.

I’ve broken o3 trying to have it lift working Python code into formal Python code. How? The person who wrote the code didn’t code it the way a developer would structure a program. 140 lines of basic grab-some-data, generate-a-table code broke the AI, and it had the ‘informal’ solution right there in the prompt. So no, there is zero chance LLMs do more than predict.

And to be clear, it one-shot a whole thing for me last night using the GitHub/Codex/agent thing in VS Code, and probably saved me 30 minutes, but god forbid you start from a bad, edge-case, or poorly structured thing that doesn’t fit the mould.