
Comment by jeremyjh

6 days ago

Aren’t there papers showing that there is some kind of world model emerging? Like representations of an Othello board that we would recognize were found and manipulated successfully in a small model.

There are two follow up papers showing the representations are "entangled", a euphemism for statistical garbage, but I can't be bothered at the moment to find them.

However, the whole issue of Othello is a non sequitur, which indicates that the people involved here don't really seem to understand the issue, or what a world model is.

A "world model" is a model of a data generating process which isn't reducible-to or constituted by its measures. Ie., we are concerned for the case where there's a measurement space (eg., that of the height of mercury in a thermometer) and a target property space (eg., that of the temperature of the coffee). So that there is gap between the data-as-measure and its causes. In language this gap is massive: the cause of my saying, "I'm hungry" may have nothing to do with my hunger, even if it often does. For "scientific measuring devices", these are constructed to minimize this gap as much as possible.

In any case, with board games and other mathematical objects, there is no gap. The data is the game. The "board state" is an abstract object constituted by all possible board states. The game "is made out of" its realisations.

However, the world isn't made out of language, nor coffee made out of thermometers. So a model of the data isn't a model of its generating process.

So whether an interpolation of board states "fully characterises", in some way, an abstract mathematical object ("the game") is so irrelevant to the question that it betrays a fundamental lack of understanding of even what's at issue.

No one is arguing that a structured interpolative model (i.e., one given an inductive bias by an NN architecture) doesn't express properties of the underlying domain in its structure. The question is what happens to this model of the data when you have the same data-generating process, but you aren't in the interpolated region.

This problem cannot arise, in the limit of large data, for abstract games, by their nature: e.g., a model classifying an input X into legal/illegal board states just is the game.

Another way of phrasing this: ML/AI textbooks often begin by assuming there's a function you're approximating. But in the vast majority of cases where NNs are used, there is no such function -- there is no function tokens -> meanings (e.g., "I am hungry" is ambiguous).

But in the abstract math case there is a function: {boards} -> Legal|Illegal is a function; there are no ambiguous boards.

So: of the infinite number of approximations f* to f_game, any is valid in the limit len(X) -> inf. Of the infinite number of approximations f*_lang to f_language, all are invalid (each in its own way).
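
To make the contrast concrete, here's a minimal sketch (mine, not from the thread; the game, the simplified legality rule, and the "meanings" set are illustrative assumptions). A toy legality predicate over tic-tac-toe boards is a total, unambiguous function of its input, whereas the same utterance admits several targets, so tokens -> meanings isn't a function at all:

    # Illustrative sketch: for an abstract game, "legal board" is a total, unambiguous
    # function of the input, so an approximator that matches it on all boards simply
    # *is* the game's legality predicate.
    from itertools import product

    def legal_tictactoe(board: str) -> bool:
        """board: a 9-char string over {'X', 'O', '.'}, read row by row.
        Simplified rule: checks only the move-count invariant, ignoring finished games."""
        if len(board) != 9 or set(board) - {'X', 'O', '.'}:
            return False
        x, o = board.count('X'), board.count('O')
        return o <= x <= o + 1  # X moves first: X never trails and leads by at most one

    # Every possible input has exactly one correct label -- no ambiguity anywhere.
    all_boards = (''.join(cells) for cells in product('XO.', repeat=9))
    assert all(isinstance(legal_tictactoe(b), bool) for b in all_boards)

    # By contrast, the same utterance maps to many "meanings" depending on unobserved
    # context, so tokens -> meanings is a relation, not a function.
    utterance_targets = {
        "I'm hungry": {"speaker is hungry", "speaker wants to leave", "speaker is being polite"},
    }
    assert len(utterance_targets["I'm hungry"]) > 1  # one input, several admissible targets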

  • > A "world model" is a model of a data generating process which isn't reducible-to or constituted by its measures. > However the world isnt made out of language, nor coffee made out of thermometers. So a model of the data isnt a mdoel of its generating process.

    So is V-JEPA 2 actually generating a world model, as you've defined it here? It's still just sampling data -- visual data, tactile feedback, etc. are all reducible to quantized data. It seems like you could build useful models that seem to generalize without that. For example, a model could learn to stop dropping things without ever developing a theory of gravity.

    Probably I'm still misunderstanding too much for this to be useful, but what I've read from you in this thread is way more useful to my understanding than what I've seen before.

    • I'll have to read the JEPA article in more detail before commenting specifically on whether "world model" is appropriate. However procedural-action models have, in my view, a special place in the area of modelling the world.

      While they may not be world models under my definition above, they are something like world-model-generating models. They work like our sensory-motor system, which itself builds "procedural proxy models" of the world -- and these become world models when they are cognised (conceptualised, made abstract, made available to the imagination, etc.).

      Contrast a very simple animal which can move a leaf around vs. a more complex one (e.g., a mouse) which can imagine the leaf in various orientations. It's that capacity, esp. of mammals (and birds, etc.), to reify their sensory-motor "world-model-generating" capacity, e.g., in imagination, which allows them to form world models in their heads. We require something like imagination in order to be able to hypothesise a general model, form a hypothetical action, and try that action out.

      I'm less concerned about making this distinction clear for casual observers in the case of robotics, because imv competent acting in the world can lead to building world models, whereas most other forms cannot.

      What these robots require, to have world models in my view, would be firstly these sensory-motor models and then a reliable way of 1) acquiring new SM models live (i.e., learning motor techniques); and 2) reporting on what they have learned in a reasoning/cognitive context.

      Robotics is just at stage 0 here: the very basics of making a sensory-motor connection.

    • Sorry to go off on what may seem to be a tangent (equivocating only because I struggle to get the point across succinctly).

      This too could form the basis of a productive skepticism towards the usefulness of coding agents, unlike what has caught attention here. (Referring specifically to the post by tptacek)

      For example, we could look at feedback from the Lisp community (beyond anecdata) on the usefulness of LLMs. Since Lisp is what one might call "syntax-lite", a lack of true generalization ability ("no possible world model for an unavoidably idiosyncratic DSL-friendly metalanguage") could show up as a lack of ability not just to generate code, but even to fix it.

      Beyond that, there's the issue of how much the purported world-shattering usefulness of proof assistants based on, say, Lean4 must depend on interpolating, say, mathlib.

      In short, please link the papers :)

      > There are two follow up papers showing the representations are "entangled", a euphemism for statistical garbage, but I can't be bothered at the moment to find them.