
Comment by empath-nirvana

2 years ago

I'm curious as to what practical difference you think this distinction makes? (not being sarcastic, I just don't see it)

If you understand the cause of a regularity, you can predict it in all relevant circumstances. If you're just building a model of its effects in one domain, you can only predict it in that domain, with all other factors held constant.

This makes (merely) predictive models extremely fragile, as we often see.

One worry about this fragility is safety: no one doubts that, say, city route planning from 1bn+ images is done via a "pixel-correlation (world) model" of pedestrian behaviour. The issue is that it isn't a model of pedestrian behaviour.

So it is only effective insofar as the effects of pedestrian behaviour, as captured in those images and environments, remain constant.

If you understood pedestrians, i.e. people, then you could imagine their behaviour in arbitrary environments.

Another way of putting it is: correlative models of effects aren't sufficient for imagining novel circumstances; they encode only the effects that causes happened to have in the circumstances already observed.

Whereas if you had a real world model, you could trivially simulate arbitrary circumstances.
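
To make the fragility concrete, here is a toy sketch (the braking-distance setup, numbers, and function names are all invented for illustration): a model fit only to the effects observed in one regime keeps working inside that regime, but is silently wrong once an unmodelled causal factor changes, while a model of the cause itself can simply be evaluated in the new circumstance.

```python
# Toy illustration (invented example): correlational fit vs. causal model.
import numpy as np

def braking_distance(speed, friction, g=9.81):
    """The 'true' mechanism: d = v^2 / (2 * mu * g)."""
    return speed**2 / (2 * friction * g)

rng = np.random.default_rng(0)

# Training regime: dry roads only, so friction is an invisible constant (0.9).
speeds = rng.uniform(5, 30, 200)
distances = braking_distance(speeds, friction=0.9)

# Purely correlational model: distance as a function of speed alone.
correlational = np.poly1d(np.polyfit(speeds, distances, deg=2))

v = 20.0
print(correlational(v), braking_distance(v, friction=0.9))  # ~22.7 vs ~22.7: fine in-regime
print(correlational(v), braking_distance(v, friction=0.4))  # ~22.7 vs ~51.0: wet road, silently wrong

# The causal model handles the novel circumstance trivially: friction is an
# input, so "imagining" a wet road is just another function call.
```

The point of the sketch: friction never varies in the training data, so no amount of fitting can recover it, and the failure only shows up when the world changes.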

  • There's a _lot_ of evidence that LLMs _do_ generalize, though.

    • There are many notions of "prediction" and "generalisation" -- the relevant ones here, the ones that actually apply to NNs, are extremely limited. That's the problem with all this deceptive language -- it invites people to think NNs predict in the sense of simulate, and generalise in the sense of "apply across different effect domains".

      NNs cannot apply a 'concept' across different 'effect' domains, because they have only one effect domain: the training data. They are just models of how the effect shows itself in that data.

      This is why they do not have world models: they are not generalising data by building an effect-neutral model of something; they're just modelling its effects.

      Compare having a model of 3D space with having a model of the shadows cast by a fixed set of 3D objects. NNs generalise in the sense that they can still predict shadows similar to those in their training set. They cannot predict the 3D structure, and with sufficiently novel objects they fail catastrophically.
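
      A minimal sketch of that shadows analogy (entirely a toy construction of my own: 2D shapes and projection widths stand in for 3D objects and their shadows): a least-squares map from observed shadows to an unseen shadow works for new shapes from the same family, but breaks on a genuinely novel shape, whereas projecting the actual geometry works for anything.

      ```python
      # Toy illustration: shadow-only model vs. a model of the geometry itself.
      import numpy as np

      def shadow_width(points, angle_deg):
          """'World model': project the actual geometry onto a direction."""
          t = np.radians(angle_deg)
          proj = points @ np.array([np.cos(t), np.sin(t)])
          return proj.max() - proj.min()

      def rectangle(w, h):
          return np.array([[0, 0], [w, 0], [0, h], [w, h]], dtype=float)

      # Training data: shadows of axis-aligned rectangles only.
      rng = np.random.default_rng(0)
      X, y = [], []
      for _ in range(500):
          w, h = rng.uniform(1, 10, size=2)
          pts = rectangle(w, h)
          X.append([shadow_width(pts, 0), shadow_width(pts, 90)])  # shadows we observe
          y.append(shadow_width(pts, 45))                          # shadow to predict
      X, y = np.asarray(X), np.asarray(y)

      # Correlational model: least-squares map from observed shadows to the 45-degree shadow.
      coef, *_ = np.linalg.lstsq(X, y, rcond=None)

      # A new rectangle (shadows similar to the training set): near-perfect prediction.
      r = rectangle(3, 7)
      x_r = [shadow_width(r, 0), shadow_width(r, 90)]
      print(np.dot(coef, x_r), shadow_width(r, 45))      # ~7.07 vs ~7.07

      # A sufficiently novel object (a disc): the shadow-to-shadow rule is badly wrong,
      # while projecting the real geometry is still essentially exact.
      a = np.linspace(0, 2 * np.pi, 400)
      disc = 2.5 * np.column_stack([np.cos(a), np.sin(a)])  # diameter-5 disc
      x_d = [shadow_width(disc, 0), shadow_width(disc, 90)]
      print(np.dot(coef, x_d), shadow_width(disc, 45))   # ~7.07 vs ~5.0
      ```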