Comment by foobarqux

6 months ago

> these inductive biases are aligned with human language in important ways.

They aren’t, which is the entire point of this conversation, and simply asserting otherwise isn’t an argument.

> It seems that GPT language models do favor real language over the perturbed ones, and this shows that they have a simplicity bias which aligns with human language. This is remarkable, considering that the GPT architecture doesn't look like what one would expect based on existing linguistic theory.

This is a nonsensical argument: consider if you had studied a made-up language that required you to factor numbers, or to do something else inherently computationally expensive. LLMs would show a simplicity bias “just like humans”, but it’s obvious this doesn’t tell you anything, and specifically it doesn’t tell you that LLMs are like humans in any useful sense.
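
To make the hypothetical concrete, here is a minimal sketch of such a made-up language. This is my own toy construction, not anything from Moro or from the paper; the marker-placement rule is invented purely for illustration.

```python
# Toy "factoring language" (invented for illustration): the position of
# a marker token in each sentence is the smallest prime factor of the
# sentence length, so recovering the rule requires factoring, which is
# inherently computationally expensive as sentences grow.

def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime factor of n (for n >= 2)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

def encode(words: list[str], marker: str = "NEG") -> list[str]:
    """Insert the marker at an index derived from factoring len(words)."""
    k = smallest_prime_factor(len(words)) if len(words) >= 2 else 0
    return words[:k] + [marker] + words[k:]

# Both humans and LLMs would struggle with this language; the shared
# difficulty tells you nothing useful about either.
print(encode("the dog does not chase cats".split()))
# -> ['the', 'dog', 'NEG', 'does', 'not', 'chase', 'cats']
```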

> There's no universal a priori sense in which Moro's linear counting languages are "simple" but our deterministically shuffled languages aren't.

You are missing the point, which is that humans cannot easily learn Moro languages while LLMs can. Therefore LLMs are different from humans in a fundamental way. This difference is so fundamental that you need to give strong, specific, explicit justification for why LLMs are useful in explaining humans. The only reason I used the word “simple” was to argue that LLMs would be able to learn such languages easily (without even having to run an experiment); the same conclusion would hold if LLMs learned a non-simple language that humans couldn’t.
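
For concreteness, the kind of linear-counting rule Moro tested can be written in a few lines. This is my paraphrase of the general shape of those rules (e.g., negation after a fixed word position), not his exact stimuli.

```python
# Hedged paraphrase of a Moro-style "impossible" rule: the negation
# marker always follows the third word, counted linearly, ignoring
# hierarchical structure entirely. (Moro's actual stimuli differ in
# detail; this is only the general shape.)

def negate_by_count(words: list[str], neg: str = "NEG") -> list[str]:
    """Insert the marker after a fixed linear position (here, word 3)."""
    k = min(3, len(words))
    return words[:k] + [neg] + words[k:]

# The marker sits at a constant offset, independent of any grammatical
# analysis: trivial for a sequence model, yet hard for humans on
# Moro's results.
print(negate_by_count("the boy eats the apple".split()))
# -> ['the', 'boy', 'eats', 'NEG', 'the', 'apple']
```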

Again, it doesn’t matter if you find all the ways that humans and LLMs are the same (for example, that they both struggle with shuffled sentences or with a language that involves factoring numbers); what matters is that there exists a fundamental difference between them, exemplified by the Moro languages.

> But it is still worthwhile to probe what those inductive biases are and to compare them with what humans do.

Why? There is no reason to believe you will learn anything from it. This is a bizarre, abstract argument that doing something is useful because you might learn something from it; you can say that about anything you do. There is a video on YouTube in which Chomsky engages with someone making similar arguments about chess computers. Chomsky said that there was no self-evident reason why studying chess-playing computers would tell you anything about humans. He was correct: we never did learn anything significant about humans from chess computers.

> As a comparison, context-free grammars turned out to be an imperfect model of syntax, but the field of syntax benefited a lot from exploring them and their limits.

There is a difference between pursuing a reasonable line of inquiry that later fails and pursuing one that you know, or ought to know, is flawed from the start. If someone had pointed out the problems with CFGs at the outset, it would have been foolish to pursue them, just as it is foolish to ignore the Moro problem now.

> There are nontrivial challenges in scaling those languages up to the point that you can have a realistic training set

I can’t imagine what those challenges are. I don’t remember the details, but I believe Moro made systematic, simple grammar changes; your Hop perturbation is in the same vein.
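
If I understand the paper’s Hop perturbation correctly (roughly: detach a verb’s inflection marker and re-insert it a fixed number of tokens later), it is exactly this kind of systematic, simple edit. A hedged sketch, with the hop distance and marker token invented for illustration:

```python
# Hedged sketch of a Hop-style perturbation (my paraphrase; the paper's
# exact rule may differ): detach a verb's inflection marker and place
# it a fixed number of tokens further to the right.

HOP_DISTANCE = 4  # assumed constant, for illustration only

def hop(tokens: list[str], marker: str = "##s") -> list[str]:
    """Move the first occurrence of `marker` HOP_DISTANCE tokens right."""
    if marker not in tokens:
        return tokens
    i = tokens.index(marker)
    rest = tokens[:i] + tokens[i + 1:]
    j = min(i + HOP_DISTANCE, len(rest))
    return rest[:j] + [marker] + rest[j:]

print(hop("the dog chase ##s the cats on the mat".split()))
# -> ['the', 'dog', 'chase', 'the', 'cats', 'on', 'the', '##s', 'mat']
```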

> where the control condition is a real language

Why does the control need to be a real language? Moro did not use a real-language control on humans. (Edit: is it because you want to use pre-trained models?)

> More generally our goal was to get formal linguists more interested in defining the impossible vs. possible language distinction more carefully

Again you’ve invented an abstract problem to study that has no bearing on the problem Chomsky described. Moro showed that humans struggle with certain synthetic grammar constructions; Chomsky noted that LLMs lack this important feature. You are now taking this concrete observation about humans and turning it into an abstract field of study, “impossible languages”.

> It's not as simple as hierarchical vs. linear

There are different aspects of language, but there is a characteristic feature missing from LLMs that makes them unsuitable as models of human language. It doesn’t make sense for a linguist to care about LLMs unless you justify why one would learn anything about the human language faculty from them despite that fundamental difference.

> I wouldn't read much into the magnitude of the difference between NoHop and Hop, because the Hop transformation only affects a small number of sentences, and the perplexity metric is an average over sentences

Even if this were true, we are back to “no evidence” rather than “evidence against”. But it is very unlikely that Moro languages are any more difficult for LLMs to learn because, as I said earlier, they are computationally very simple, simpler than hierarchical languages.
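
For what it’s worth, the arithmetic behind the quoted averaging claim is easy to spell out; the numbers below are invented purely for illustration.

```python
# Toy illustration of the averaging point in the quote: if a
# perturbation touches only 5 of 100 sentences, even a 4x perplexity
# hit on those sentences shifts the average only mildly.

def mean_ppl(sentence_ppls: list[float]) -> float:
    """Arithmetic mean over per-sentence perplexities (other averaging
    schemes exist; this one suffices to show the dilution effect)."""
    return sum(sentence_ppls) / len(sentence_ppls)

baseline = [20.0] * 100                 # 100 sentences at ppl 20
perturbed = [20.0] * 95 + [80.0] * 5    # only 5 sentences get 4x harder

print(mean_ppl(baseline))   # 20.0
print(mean_ppl(perturbed))  # 23.0, a small shift in the average
```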