Comment by mjburgess
2 years ago
The 'world model' question seems "not even understood" by those in the field who provide these answers to it -- and use terms like "concepts" (see the linked paper on sentiment where the NN has apparently discovered a sentiment "concept").
Consider the world to contain causal properties which bring about regularities in text, eg., Alice likes chocolate so Alice says, "I like chocolate". Alice's liking, ie., her capacity for preference, desire, taste, aesthetic judgement, etc., is the cause of "like".
Now these causal properties bring about significant regularities in text, so "like" occurring early in a paragraph comes to be extremely predictive of other text tokens occurring (eg., b-e-s-t, etc.)
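A toy sketch of what "predictive" means here, mechanically -- the three-sentence corpus and the numbers are made up, and this is just conditional counting, not what a trained NN literally does:

    from collections import Counter

    # Made-up miniature corpus; a real LM sees billions of tokens,
    # but the statistical regularity is the same kind of thing.
    corpus = [
        "i like chocolate it is the best",
        "i like this film it is the best",
        "i hate mondays they are the worst",
    ]

    with_like, without_like = Counter(), Counter()
    for sentence in corpus:
        tokens = sentence.split()
        bucket = with_like if "like" in tokens[:3] else without_like
        bucket.update(tokens)

    # How much does an early "like" raise the frequency of "best" later on?
    print(with_like["best"] / sum(with_like.values()))        # ~0.13
    print(without_like["best"] / sum(without_like.values()))  # 0.0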
No one in this debate doubts, whatsoever, that NNs contain "subnetworks" which divide the problem up into detecting these token correlations. This is easily observable in CNNs, where it is trivial to demonstrate subnetworks "activating" on, say, an eye-shape.
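For what it's worth, the "observe a subnetwork activating" part is mechanically simple. A hedged sketch in PyTorch, using an untrained toy CNN and a random input purely to show the inspection machinery (seeing an actual eye-detector would of course require a trained network and real images):

    import torch
    import torch.nn as nn

    # Tiny untrained CNN, just to show how intermediate activations are read out.
    model = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
    )

    activations = {}
    def save(name):
        def hook(module, inputs, output):
            activations[name] = output.detach()
        return hook

    model[2].register_forward_hook(save("conv2"))

    model(torch.rand(1, 3, 32, 32))   # stand-in for a real photo

    # Mean activation per channel: a crude "which subnetwork fired?" readout.
    per_channel = activations["conv2"].mean(dim=(0, 2, 3))
    print(per_channel.argmax().item(), per_channel.max().item())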
The issue is that when a competent language user judges someone's sentiment, or the implied sentiment the speaker of some text would have, they are not using a model of how some subset of terms ("like", etc.) comes to be predictive of others.
They're using the fact that they know the relevant causal properties (liking, preference, desire, etc.) and how these cause certain linguistic phrases. It is for this reason that a competent language user can trivially detect irony ("of course I like going to the dentist!" -- here, since we know how unlikely it is to desire this, we know this phrase is unlikely to express such a preference, etc.).
To say that NNs, or any ML system, are sensitive to these mere correlations is not to say that these correlations are not formed by tracking the symptoms of real causes (eg., desire). Rather, it is to say they do not track desire.
This seems obvious, since the mechanism to train them is just sensitive to patterns in tokens. These patterns are not their causes, and are not models of their causes. They're only predictive of them under highly constrained circumstances.
Astrological signs are predictive of birth dates, but they aren't models of being born -- nor of time, or anything else.
No one here doubts that NNs are sensitive to patterns in text caused by causal properties -- the issue is that they aren't models of these properties; they are models of (some of) their effects as encoded in text.
To be fair, the term "world model" does not presume scientific understanding, factfulness, or causality.
In an ideal AI model this would be the aim though.
Then it isn't a model of the world.
If the term, "effect model" were used there would be zero debate. Of course NNs model the effects of sentiment.
The debate exists because AI hype artists don't merely claim to model effects in constrained domains.
> Astrological signs are predictive of birth dates, but they aren't models of being born -- nor of time, or anything else.
Also, eating ice cream and getting bitten by a shark do have some mutual predictive association.
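A quick synthetic illustration of that (the numbers and the "temperature drives both" assumption are mine): the two end up correlated through a common cause, yet neither causes the other.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic confounder: warm days drive both ice cream sales and
    # the number of swimmers (and hence shark bites).
    temperature = rng.normal(20, 8, size=5000)
    ice_cream_sales = 5.0 * temperature + rng.normal(0, 10, size=5000)
    shark_bites = 0.1 * temperature + rng.normal(0, 1, size=5000)

    # Clearly correlated (~0.6 with these made-up coefficients)...
    print(np.corrcoef(ice_cream_sales, shark_bites)[0, 1])
    # ...but by construction, banning ice cream would change nothing about
    # shark bites: sales were never a cause, only a co-effect of temperature.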
I think the chess-GPT experiment can be interesting, not because the machine can predict every causal connection, but because of how many causal connections it can extract from the training data by itself. By putting a human in the loop, many more causal connections would be revealed, but the human is lazy. Or expensive. Or expensive because he is lazy.
In addition, correlation can be a hint of causation. If a human researches it further, it may turn out to be mere correlation and nothing substantial, but sometimes it may actually be a causal effect. So there is value in that.
As for the overall sentiment: an NN's world model is indeed very different from a human world model.
I'm curious as to what practical difference you think this distinction makes? (not being sarcastic, I just don't see it)
If you understand the cause of a regularity, you will predict it in all relevant circumstances. If you're just creating a model of its effects in one domain, you can only predict it in that domain -- with all other factors held constant.
This makes (merely) predictive models extremely fragile, as we often see.
One worry about this fragility is safety: no one doubts that, say, city route planning from 1bn+ images is done via a "pixel-correlation (world) model" of pedestrian behaviour. The issue is that it isn't a model of pedestrian behaviour.
So it is only effective insofar as the effects of pedestrian behaviour, as captured in the images, in these environments, etc. remain constant.
If you understand pedestrians, ie., people, then you can imagine their behaviour in arbitrary environments.
Another way of putting it is: correlative models of effects aren't sufficient for imagining novel circumstances. They encode only the effects of causes in those circumstances.
Whereas if you had a real world model, you could trivially simulate arbitrary circumstances.
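To make that fragility concrete, a small synthetic sketch (entirely my own toy setup, not anyone's actual planner): a rule that leans on a correlation which holds in the training domain looks great there and degrades as soon as the correlation shifts.

    import numpy as np

    rng = np.random.default_rng(1)

    def make_data(n, alignment):
        # True cause: a pedestrian who intends to cross steps into the road.
        intends_to_cross = rng.random(n) < 0.5
        # Correlated-but-non-causal cue: being near a marked crossing.
        # `alignment` controls how tightly the cue tracks the true intention.
        near_crossing = np.where(rng.random(n) < alignment,
                                 intends_to_cross, rng.random(n) < 0.5)
        return near_crossing, intends_to_cross

    def rule(near_crossing):
        # "Predict a road entry whenever someone is near a crossing."
        return near_crossing

    # Training city: cue and cause line up 95% of the time -- looks excellent.
    cue, truth = make_data(10_000, alignment=0.95)
    print("train accuracy:", (rule(cue) == truth).mean())   # ~0.97

    # Deployment city: same rule, weaker correlation -- much worse.
    cue, truth = make_data(10_000, alignment=0.55)
    print("test accuracy:", (rule(cue) == truth).mean())    # ~0.78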
There's a _lot_ of evidence that LLMs _do_ generalize, though.