Comment by foobarqux
15 days ago
I don't really understand your question but if a deep neural network predicts the weather we don't have any problem accepting that the deep neural network is not an explanatory model of the weather (the weather is not a neural net). The same is true of predicting language tokens.
Apologies, I don't know enough to articulate my question, which is probably nonsensical anyway.
LLMs (like GPT) and grammars (like Backus–Naur Form) are two different kinds of generative (production) systems, right?
You've been (heroically) explaining Chomsky's criticism of LLMs to other noobs: grammars (theoretically) explain how humans do language, which is very different from how ChatGPT (a stochastic parrot) does language. Right?
Since GPT mimics human language so convincingly, I've been wondering if there's any overlap of these two generative systems.
Especially once the (tokenized) training data for GPTs is word-based instead of just snippets of characters.
Because I notice grammars everywhere and GPT is still magic to me. Maybe I'd benefit if I could understand GPTs in terms of grammars.
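To make the contrast I'm asking about concrete, here's a toy sketch of the two kinds of generative systems as I understand them (just my own illustration, nothing like how GPT actually works internally): a hand-written grammar that produces sentences by expanding rules, next to a statistical model that produces sentences by sampling the next token from counts.

    import random

    # Tiny BNF-style grammar: S -> NP VP, NP -> Det N, VP -> V NP
    grammar = {
        "S":   [["NP", "VP"]],
        "NP":  [["Det", "N"]],
        "VP":  [["V", "NP"]],
        "Det": [["the"], ["a"]],
        "N":   [["dog"], ["cat"]],
        "V":   [["chased"], ["saw"]],
    }

    def generate(symbol="S"):
        # Expand a nonterminal by recursively applying grammar rules;
        # anything not in the grammar is a terminal word.
        if symbol not in grammar:
            return [symbol]
        expansion = random.choice(grammar[symbol])
        return [word for part in expansion for word in generate(part)]

    # Toy "LLM": a bigram model that just counts which word follows which.
    corpus = "the dog chased a cat . a cat saw the dog .".split()
    followers = {}
    for prev, nxt in zip(corpus, corpus[1:]):
        followers.setdefault(prev, []).append(nxt)

    def predict_next(prev):
        # Sample the next token in proportion to how often it followed prev.
        return random.choice(followers[prev])

    print("grammar sample:", " ".join(generate()))

    sentence = ["the"]
    while sentence[-1] != "." and len(sentence) < 10:
        sentence.append(predict_next(sentence[-1]))
    print("bigram sample: ", " ".join(sentence))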
> Since GPT mimics human language so convincingly, I've been wondering if there's any overlap of these two generative systems.
It's not really relevant whether there is overlap; I'm sure you could list a bunch of ways they are similar. What's important is (1) whether they are different in fundamental ways and (2) whether LLMs explain anything about the human language faculty.
For (1), the most important difference is that human languages appear to have certain constraints (roughly, that language has a parse-tree/hierarchical structure), and (from the experiments of Moro) humans seem unable to learn arguably simpler structures that are not hierarchical. LLMs, on the other hand, can be trained on those simpler structures. That shows that the acquisition process is not the same, which is not surprising, since neural networks work on arbitrary statistical data and don't have strong inductive biases.
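To give a concrete feel for what "arguably simpler structures" means here, a toy sketch of my own (not Moro's actual materials): one rule stated over the sentence's structure, one stated over mere linear position. The second kind is what humans reportedly fail to acquire, while a sequence model can fit it without trouble.

    sentence = ["the", "dog", "that", "barks", "chases", "the", "cat"]

    def negate_hierarchical(words):
        # Structure-dependent rule: negate the verb of the main clause
        # ("chases"), wherever it happens to sit in the string. A real
        # implementation would need a parser; the index lookup stands in
        # for that here.
        i = words.index("chases")
        return words[:i] + ["doesn't", "chase"] + words[i + 1:]

    def negate_linear(words):
        # "Impossible language" rule: always insert the negation after
        # the third word, regardless of structure.
        return words[:3] + ["not"] + words[3:]

    print(" ".join(negate_hierarchical(sentence)))
    # -> the dog that barks doesn't chase the cat
    print(" ".join(negate_linear(sentence)))
    # -> the dog that not barks chases the cat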
For (2), even if it turned out that LLMs couldn't learn the same languages, that wouldn't explain anything. For example, you could hard-code the training to fail if it detects an "impossible language"; then what? You've managed to create an accurate predictor, but you don't have any understanding of how or why it works. This is easier to see with non-cognitive systems like the weather or gravity: if you create a deep neural network that accurately predicts gravity, that is not the same as coming up with the general theory of relativity (which could in fact be a worse predictor, for example at quantum scales). Everyone argues the ridiculous point that because LLMs are good predictors, gaining understanding of the human language faculty is useless, a stance that wouldn't be accepted in the study of gravity or in any other field.
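Here's that hard-coding move spelled out (every name below is a hypothetical stand-in, not any real training API), just to make plain that the constraint is asserted rather than explained:

    def looks_impossible(corpus):
        # Stand-in for a hand-coded linguistic judgment, e.g. "negation
        # always appears at a fixed linear position".
        return all(words[3:4] == ["not"] for words in corpus)

    def train_model(corpus):
        # Stand-in for the usual opaque statistical fit.
        return {"weights": "..."}

    def train_constrained(corpus):
        # The combined system now refuses impossible languages, matching
        # the human limitation, but the "why" lives entirely in the
        # hand-written check, which explains nothing.
        if looks_impossible(corpus):
            raise ValueError("refusing to learn an impossible language")
        return train_model(corpus)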
> is not an explanatory model of the weather (the weather is not a neural net)
I don't follow. Aren't those entirely separate things? The most accurate models of anything necessarily account for the underlying mechanisms. Perhaps I don't understand what you mean by "explanatory"?
Specifically in the case of a deep neural network, we would generally suppose that it had learned to model the underlying reality. In effect it is learning the rules of a sufficiently accurate simulation.
> The most accurate models of anything necessarily account for the underlying mechanisms
But they don't necessarily convey understanding to humans. Prediction is not explanation.
There is a difference between Einstein's General Theory of Relativity and a deep neural network that predicts gravity. The latter is virtually useless for understanding gravity (even if it makes better predictions).
> Specifically in the case of a deep neural network, we would generally suppose that it had learned to model the underlying reality. In effect it is learning the rules of a sufficiently accurate simulation.
No, they just fit surface statistics, not underlying reality. Many physics phenomena were predicted using theories before they were observed; they would not have been in the training data even though they were part of the underlying reality.
> No, they just fit surface statistics, not underlying reality.
I would dispute this claim. I would argue that as models become more accurate they necessarily more closely resemble the underlying phenomena which they seek to model. In other words, I would claim that as a model more closely matches those "surface statistics" it necessarily more closely resembles the underlying mechanisms that gave rise to them. I will admit that's just my intuition though - I don't have any means of rigorously proving such a claim.
I have yet to see an example where a more accurate model was conceptually simpler than the simplest known model at some lower level of accuracy. From an information-theoretic angle I think it's similar to compression (something that ML also happens to be almost unbelievably good at). Related to this, I've seen it argued somewhere (I don't immediately recall where, though) that learning (in both the ML and the human sense) amounts to constructing a world model via compression, and that rings true to me.
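To gesture at that compression angle concretely, a toy example of my own (hand-rolled add-one smoothing, nothing rigorous): a model that captures more of the data's structure needs fewer bits per token to encode held-out text, and cross-entropy in bits is exactly the code length an ideal coder using that model would achieve.

    import math
    from collections import Counter, defaultdict

    text = ("the dog chased the cat and the cat chased the dog and "
            "the dog saw the cat").split()
    train, test = text[:-6], text[-6:]

    # Unigram model: how often each word occurs overall.
    unigram = Counter(train)
    total = sum(unigram.values())
    vocab = set(text)

    # Bigram model: how often each word follows the previous one.
    bigram = defaultdict(Counter)
    for prev, nxt in zip(train, train[1:]):
        bigram[prev][nxt] += 1

    def unigram_bits(words):
        # Add-one smoothed unigram code length in bits.
        return sum(-math.log2((unigram[w] + 1) / (total + len(vocab)))
                   for w in words)

    def bigram_bits(words, prev):
        # Add-one smoothed bigram code length in bits.
        bits = 0.0
        for w in words:
            c = bigram[prev]
            bits += -math.log2((c[w] + 1) / (sum(c.values()) + len(vocab)))
            prev = w
        return bits

    # The bigram model "knows" more structure, so it needs fewer bits
    # per token on the held-out words.
    print("unigram bits/token:", round(unigram_bits(test) / len(test), 2))
    print("bigram  bits/token:", round(bigram_bits(test, train[-1]) / len(test), 2))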
> Many physics phenomena were predicted using theories before they were observed
Sure, but what leads to those theories? They are invariably the result of attempting to more accurately model the things which we can observe. During the process of refining our existing models we predict new things that we've never seen and those predictions are then used to test the validity of the newly proposed models.