Comment by vFunct

6 days ago

I'm surprised that's not how it's already done. I'd figure some of the inner layers in LLMs were already "world models" and that it's the outer layers that differentiated models between text vs. images/robotics/other modes...

That's what the propaganda says, but whenever we explain it isn't true, an army arrives to repeat ad copy from their favourite tech guru.

All statistical models of the kind in use are interpolations through historical data -- there's no magic. So when you interpolate through historical texts, your model is of historical text.
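
To make that concrete, here's a toy sketch (plain curve-fitting, nothing to do with LLM internals; the degree, ranges, and noise level are arbitrary choices): a polynomial fit tracks a noisy sine closely inside the range of its data and is nonsense a short distance outside it.

    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.linspace(0, 2 * np.pi, 200)               # the "historical" inputs
    y_train = np.sin(x_train) + rng.normal(0, 0.05, 200)   # noisy observations

    coeffs = np.polyfit(x_train, y_train, deg=9)           # the fit = an interpolation of the data

    print(np.polyval(coeffs, np.pi / 3))   # inside the data: close to sin(pi/3) ~ 0.87
    print(np.polyval(coeffs, 3 * np.pi))   # outside the data: wildly wrong (sin(3*pi) = 0)

The fit is excellent wherever the data was, and tells you nothing about the function that generated it the moment you leave that region.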

Text is not a measure of the world. To say "the sky is blue" is not even reliably associated with the blueness of the sky, let alone with the fact that the sky isn't blue (there is no sky, and the atmosphere isn't blue).

These models appear to "capture more" only because, when you interpret the text, you attribute meaning/understanding to it as the cause of its generation -- but that wasn't the cause, so this is necessarily an illusion. There is no model of the world in a model of historical text -- there is a model of the world in your head which you associate with the text, and that association is exploited when you use LLMs to do more than mere syntax transformation.

LLMs excel most at "fuzzy retrieval" and things like coding -- the latter is principally a matter of syntax, and the former a matter of recollection. As soon as you require the prompt-completion to maintain "semantic integrity" under constraints that are neither syntactic nor retrievable, it falls apart.

  • I feel like you are ignoring or dismissing the word "interpolating", although a better word would likely be generalization. I'd make the claim that it's very hard to generalize without some form of world model. It's clear to me that transformers do have some form of world model, although not the same as what is being presented in V-JEPA.

    One other nitpick is that you confine this to "historical data", although models are also trained on other classes of data, such as simulated and synthetically generated data.

    • I didn't say generalisation, because there isn't any. Inductive learning does not generalise, it interpolates -- if the region of your future prediction (here, prompt completion) lies on or close to the interpolated region, then the system is useful.

      Generalisation is the opposite process: hypothesising a universal and finding counter-examples to constrain that universal generalisation. E.g., "all fire burns" is hypothesised by a competent animal upon encountering fire once.

      Inductive "learners" take the opposite approach: fire burns in "all these cases", and if you have a case similar to those, then fire will burn you.

      They can look the same within the region of interpolation, but look very different when you leave it: all of these systems fall over quickly when more than a handful of semantic constraints are imposed. That number is a measure of the distance from the boundary of the interpolated region (e.g., consider this interpretation of Apple's latest paper on reasoning in LLMs: the "environment complexity" is nothing other than a measure of interpolation-dissimilarity).

      Early modern philosophers of science were very confused by this, but it's in Aristotle plain as day, and it has also been extremely well established since the 80s, when the development of formal computational stats necessitated making it clear: interpolation is not generalisation. The former does not get you robustness to irrelevant permutation (i.e., generalisation); it does not permit considering counterfactual scenarios (i.e., generalisation); it does not give you a semantics/theory of the data-generating process (i.e., generalisation, i.e. a world model).

      Interpolation is a model of the data. Generalisation requires a model of the data-generating process; the former does not give you the latter, though it can appear to under strong experimental assumptions of known causal models (see the toy sketch at the end of this comment).

      Here, LLMs model the structure of language-as-symbolic-ordering; that structure, "in the interpolated region", expresses reasoning, but it isn't a model of reasoning. It's a model of reasoning as captured in historical cases of it.
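
      A toy sketch of that last distinction (arbitrary numbers; plain curve-fitting vs. the physical formula): a quadratic fit to projectile data reproduces the data perfectly, but it has no parameter for gravity, so it cannot answer the counterfactual "same throw on the Moon", whereas the data-generating model can.

          import numpy as np

          g_earth, v0 = 9.81, 20.0
          t = np.linspace(0, 2, 50)
          heights = v0 * t - 0.5 * g_earth * t**2      # the "historical" data

          fit = np.polyfit(t, heights, deg=2)          # a model of the data: perfect in-sample
          print(np.polyval(fit, 1.5))                  # replays Earth-like behaviour

          # Counterfactual: the same throw under Moon gravity. The fit has no
          # gravity term to vary; the data-generating model answers directly.
          g_moon = 1.62
          print(v0 * 1.5 - 0.5 * g_moon * 1.5**2)

      The fitted model can only replay the regime it was fit in; only the model of the data-generating process supports the counterfactual.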


  • > an army arrives to repeat ad copy from their favourite tech guru

    This is painfully accurate.

    The conversations go like this:

    Me: “guys, I know what I’m talking about, I wrote my first neural network 30 years ago in middle school, this tech is cool but it isn’t magic and it isn’t good enough to do the thing you want without getting us sued or worse.”

    Them: “Bro, I read a tweet that we are on the other side of the singularity. We have six months to make money before everything blows up.”