Comment by krackers
15 days ago
I don't think it's clear that human languages are context-sensitive. The only consistent claim I can find is that at one point someone examined Swiss German and found that it's weakly context-sensitive. Also, empirically, human languages don't have that much recursion. You can artificially construct such examples, but beyond a certain depth people won't be able to parse them either.
I don't know whether the absence of papers studying whether LLMs can model context-sensitive grammars is because they can't, or because nobody has tested that hypothesis yet. But again, empirically LLMs do seem able to reproduce human language just fine. The whole "hallucination" argument is precisely that LLMs are very good at reproducing the structure of language even when the resulting statements don't have the correct truth value. The fact that they successfully learn to parse complex CFGs is thus evidence that they can learn underlying generative mechanisms instead of simply parroting snippets of training data, as is naively assumed, and it's not a huge leap to imagine that they've learned some underlying "grammar" for English as well.
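For anyone unfamiliar with what "parsing a CFG" means mechanically, here is a minimal sketch: the classic CYK membership algorithm run on a toy grammar. The grammar (non-empty balanced parentheses, written in Chomsky normal form) and the names `cyk`, `BINARY`, `TERMINAL` are hypothetical illustrations, not the grammars used in any LLM study.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form for non-empty balanced parentheses:
#   S -> S S | L A | L R ;  A -> S R ;  L -> '(' ;  R -> ')'
BINARY = {
    ("S", "S"): {"S"},
    ("L", "A"): {"S"},
    ("L", "R"): {"S"},
    ("S", "R"): {"A"},
}
TERMINAL = {"(": {"L"}, ")": {"R"}}


def cyk(tokens, start="S"):
    """Return True if `tokens` is derivable from `start` (CYK, O(n^3) table fill)."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] = set of nonterminals that derive tokens[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][0] = set(TERMINAL.get(tok, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                # Combine every (left, right) pair allowed by a binary rule
                for pair in product(left, right):
                    table[i][length - 1] |= BINARY.get(pair, set())
    return start in table[0][n - 1]
```

For example, `cyk("(())()")` is accepted while `cyk("(()")` is rejected; the table of sub-span derivations is exactly the kind of structure a model would have to track implicitly.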
So if one argues that LLMs, as generative models, cannot generate novel valid sentences in English, that is an easily falsifiable hypothesis. If we had examples of LLMs producing ill-formed sentences, people would have latched onto them by now instead of "count the Rs in strawberry", but I've never seen anyone make that argument.
It’s uncontroversial now that the class of string languages roughly corresponding to “human languages” is mildly context sensitive in a particular sense. This debate was hashed out in the 80s and 90s.
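A sketch of the kind of pattern behind the "mildly context sensitive" classification, assuming the usual textbook abstraction: the string language {a^m b^n c^m d^n : m, n ≥ 1}, which models cross-serial dependencies like those in Swiss German. It is not context-free, yet it sits inside the mildly context-sensitive class. The function name `cross_serial` is a hypothetical label for this illustration.

```python
import re

# Hypothetical illustration: {a^m b^n c^m d^n : m, n >= 1}, a standard abstraction
# of cross-serial dependencies. Not context-free, but mildly context-sensitive.
PATTERN = re.compile(r"(a+)(b+)(c+)(d+)")


def cross_serial(s: str) -> bool:
    m = PATTERN.fullmatch(s)
    if m is None:
        return False
    a, b, c, d = (len(g) for g in m.groups())
    # The a's pair with the c's and the b's pair with the d's, so the
    # dependencies cross rather than nest (nesting alone would stay context-free).
    return a == c and b == d
```

Here `cross_serial("aabccd")` holds (two a/c pairs, one b/d pair), while `cross_serial("aabcd")` does not.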
I don’t think formal language classes have much to tell us about the capabilities of LLMs in any case.
>Also, empirically, human languages don't have that much recursion. You can artificially construct such examples, but beyond a certain depth people won't be able to parse them either.
If you limit recursion depth, then everything is regular, so the Chomsky hierarchy is of little use.
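To see why a depth bound collapses everything to regular, here is a minimal sketch assuming balanced parentheses as the stand-in for nested structure: once depth is capped at k, the "stack" is just a counter from 0 to k, so a finite automaton with k + 2 states (including a dead state) suffices. The names `bounded_dyck_dfa` and `accepts` are hypothetical.

```python
def bounded_dyck_dfa(k: int):
    """Transition table of a DFA accepting balanced parentheses of depth <= k.

    States 0..k record the current nesting depth; "dead" is a rejecting sink.
    A finite number of states means the language is regular.
    """
    delta = {}
    for depth in range(k + 1):
        delta[(depth, "(")] = depth + 1 if depth < k else "dead"
        delta[(depth, ")")] = depth - 1 if depth > 0 else "dead"
    delta[("dead", "(")] = delta[("dead", ")")] = "dead"
    return delta


def accepts(delta, s: str) -> bool:
    state = 0
    for ch in s:
        # Any symbol outside the alphabet also sends us to the dead state
        state = delta.get((state, ch), "dead")
    return state == 0
```

With `d = bounded_dyck_dfa(2)`, the string `"(())()"` is accepted but `"((()))"` is rejected because it exceeds depth 2. Only the unbounded version of the language needs a stack, which is where the higher levels of the hierarchy come in.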