
Comment by natch

14 hours ago

He is one of these people who think that humans have a direct experience of reality not mediated by, as Alan Kay put it, three pounds of oatmeal. So he thinks a language model cannot be a world model, despite our own contact with reality being mediated through a myriad of filters and fun-house-mirror distortions. Our vision transposes left and right and delivers images to our nerves upside down, for gawd’s sake. He imagines none of that is the case, and that if only he can build computers more like us, then they will be in direct contact with the world, and then he can (he thinks) make a model that is better at understanding the world.

Isn't this idea demonstrably false due to the existence of various sensory disorders too?

I have a disorder characterised by the brain failing to filter out its own sensory noise; my vision is full of analogue-TV-like distortion and other artefacts. Sometimes when it's bad I can see my brain constructing an image in real time rather than the perception happening instantaneously, particularly when I'm out walking. A deer becomes a bundle of sticks becomes a muddy pile of rocks (what it actually is), for example, over the space of seconds. This, to me, is pretty strong evidence that we do not experience reality directly, and instead construct our perceptions predictively from whatever is to hand.

  • The default philosophical position for human biology and psychology is known as Representational Realism. That is, reality as we know it is mediated by changes and transformations applied to sensory (and other) input data in a complex process, and what reaches us is altered enough to be something "different enough" from what is actually real.

    Direct Realism is the idea that reality is directly available to us, and that any intermediate transformations made by our brains are not enough to change the dial.

    Direct Realism has long been refuted. There are a number of classic counterexamples, e.g. the hot and cold buckets, the straw that appears bent in a glass of water, rainbows and other epiphenomena, etc.

  • Pleased to meet someone else who suffers from "visual snow". I'm fortunate in that, like my tinnitus, I'm only acutely aware of it when I'm reminded of it or, less frequently, when it's more pronounced.

    You're quite correct that our "reality" is in part constructed. The Flashed Face Distortion Effect [0][1] (wherein faces in the peripheral vision appear distorted due to the brain filling in the missing information with what was there previously) is just one example.

    [0] https://en.wikipedia.org/wiki/Flashed_face_distortion_effect [1] https://www.nature.com/articles/s41598-018-37991-9

    • Ah, that's interesting; mine is omnipresent and occasionally bad enough that I have to take days off work because I can't read my own code. It's like there's a baseline of it that occasionally flares up at random. Were you born with visual snow, or did you acquire it later in life? I developed it as a teenager, and it worsened significantly after a fever when I was a fresher.

      Also, out of interest, do you get comorbid headaches with yours?


Whatever idea Yann has of JEPA and its supposed superiority compared to LLMs, he doesn't seem to have done a good job of "selling it" without resorting to strawmanning LLMs. From what little I gathered (which may be wrong), his objection to LLMs is something like: the "predict the next token" inductive bias is too weak for models to meaningfully learn models of things like physics, sufficient to properly predict motion and do well on physical reasoning tasks.
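For what it's worth, the "predict the next token" objective under discussion is just a cross-entropy loss over shifted token sequences. A minimal sketch, assuming a PyTorch-style model that maps token ids to vocabulary logits (the names here are hypothetical, not anyone's actual code):

    import torch
    import torch.nn.functional as F

    def next_token_loss(model, tokens):
        # tokens: (batch, seq) integer ids; the model sees positions 0..n-2
        # and is trained to predict positions 1..n-1 (shift by one).
        inputs, targets = tokens[:, :-1], tokens[:, 1:]
        logits = model(inputs)                    # (batch, seq-1, vocab)
        return F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),  # flatten positions
            targets.reshape(-1),                  # flatten target ids
        )

Whether that objective is too weak an inductive bias to learn physics is exactly the point in dispute.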

The fact that a not-so-direct experience of reality produces "good enough" results (e.g. human intelligence) doesn't mean that a more direct experience of reality won't produce much better results, and it clearly doesn't mean it can't produce those better results in AI.

Your whole reasoning is neither here nor there, and attacks a straw man: YLC for sure knows that human experience of reality is heavily modified and distorted.

But he also knows, and I'd bet he's very right on this, that we don't "sip reality through a narrow straw of tokens/words", and that we don't learn just from written-down and approved notes, and only under very specific and expensive circumstances (training runs).

Anything closer to a more direct world model (LLMs are, of course, world models at a very indirect level) has a very high likelihood of yielding lots of benefits.

And LLMs are trained on humans trying to describe all of this through text. The point is not whether humans have a true experience of reality; it's that human writings are a poor descriptor of reality anyway, and so LLMs cannot be a stepping stone.

The world model of a language model is a ... language model. Imagine the mind of a blind, limbless person, locked in a cell their whole life, never having experienced anything different, who just listens all day to a piped-in feed of randomized snippets of Wikipedia, 4chan and math olympiad problems.

The mental model this person has of this feed of words is what an LLM at best has (though the human's model is likely much richer, since they have a brain, not just a transformer). No real-world experience or grounding, therefore no real-world model. The only model they have is of the world they have experience with: a world of words.

> humans have a direct experience of reality not mediated by, as Alan Kay put it, three pounds of oatmeal

Is he advocating for philosophical idealism of the mind, or does he have an alternate physicalist theory?

  • I don't think he actually understands the distinctions between direct realism, idealism, and representational realism at all.

That way he may get a very good lizard. Getting Einstein, though, takes layers of abstraction.

My thinking is that such world models should be integrated with an LLM in the way the lower levels of perception are integrated with higher brain function.