
Comment by phailhaus

20 hours ago

They don't, though. They're hallucinated videos. They're feeding models tons and tons of 2D videos and hoping they figure out physics from them, instead of just using a game engine and having the LLM write something up that works 100% of the time.

On the flip side, the emergent properties that come from some of these wouldn’t be replicable by an engine. A moss-covered rock realistically shedding moss as it rolls down a hill. Condensation aggregating into beads and rivulets on glass. An ant walking on a pitcher plant, and being able to follow it inside and see the bugs that drowned in the plant's previous meal. You’re missing the forest for the trees.

  • And then the rivulets disappear or change completely because you looked away. This is a dead end because, computationally, there is absolutely no way for the model to keep track of everything it has decided. Everything is kept "in its head" rather than persisted. So what you get is a dream world, useless for training world models. It's great for prototyping, terrible for anything more durable.

    • A dream world where the overwhelming majority of things are consistent is better than something low-detail and always consistent.