Comment by jeremyjh

6 days ago

> A "world model" is a model of a data generating process which isn't reducible-to or constituted by its measures.
>
> However the world isn't made out of language, nor coffee made out of thermometers. So a model of the data isn't a model of its generating process.

So is V-JEPA 2 actually generating a world model, as you've defined it here? It's still just sampling data: visual data, tactile feedback, etc. are all reducible to quantized data. It seems like you could build useful models that generalize without that. For example, a model could learn to stop dropping things without ever developing a theory of gravity.

Probably I'm still misunderstanding too much for this to be useful, but what I've read from you in this thread is way more useful to my understanding than what I've seen before.

I'll have to read the JEPA article in more detail before commenting specifically on whether "world model" is appropriate. However, procedural-action models have, in my view, a special place in the area of modelling the world.

While they may not be world models under my definition above, they are something like world-model-generating models. They work like our sensory-motor system, which itself builds "procedural proxy models" of the world -- and these become world models when they are cognised (conceptualised, made abstract, made available to the imagination, etc.).

Contrast a very simple animal which can move a leaf around vs. a more complex one (e.g., a mouse) which can imagine the leaf in various orientations. It's that capacity, especially of mammals (and birds, etc.), to reify their sensory-motor "world-model-generating" capacity -- e.g., in imagination -- which allows them to form world models in their heads. We require something like imagination in order to hypothesise a general model, form a hypothetical action, and try that action out.

I'm less concerned about making this distinction clear for casual observers in the case of robotics because, in my view, competent acting in the world can lead to building world models, whereas most other approaches cannot.

What these robots require, to have world models in my view, would be firstly these sensory-motor models and then a reliable way of 1) acquiring new SM models live (i.e., learning motor techniques); and 2) reporting on what they have learned in a reasoning/cognitive context.

Robotics is just at stage 0 here: the very basics of making a sensory-motor connection.

Sorry to go off on what may seem to be a tangent (equivocating only because I struggle to get the point across succinctly).

This too could form the basis of a productive skepticism towards the usefulness of coding agents, unlike the one that has caught attention here (referring specifically to the post by tptacek).

For example, we could look at feedback from the Lisp community (beyond anecdata) on the usefulness of LLMs. Since Lisp is what one might call "syntax-lite", a lack of true generalization ability ("no possible world model for an unavoidably idiosyncratic DSL-friendly metalanguage") could show up as an inability not just to generate code, but even to fix it.

Beyond that, there's the question of how much the purported world-shattering usefulness of proof assistants based on, say, Lean4 must depend on interpolating, say, mathlib.

In short, please link the papers :)

> There are two follow-up papers showing the representations are "entangled", a euphemism for statistical garbage, but I can't be bothered at the moment to find them.