Comment by benlivengood

1 month ago

Agreed; everyone complained that LLMs have no world model, so here we go. The next logical step is to backfill the weights with encoded real-world video at some reasonable frame rate to ground the imagination, then branch the inference on possible interventions (actions) in the near future of the simulation, throw the results into a goal evaluator, and send the winning action-predictions to motors. Getting the timing right will probably require a bit more work than literally gluing them together, but probably not much more.
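A rough sketch of that loop, for concreteness. It is in the spirit of model-predictive control: imagine a few steps ahead for each candidate action sequence, score the imagined futures, execute only the first action of the winner, then replan. Everything here is a toy assumption rather than a real system: `toy_world_model` stands in for the video-grounded model, the state is a single number, `goal_score` is a made-up evaluator, and `send_to_motors` just records the chosen action.

```python
import itertools
from typing import Callable, List, Sequence

State = float
Action = float

def toy_world_model(state: State, action: Action) -> State:
    """Hypothetical stand-in for the learned world model: predict the next state."""
    return state + action

def goal_score(state: State, goal: State) -> float:
    """Hypothetical goal evaluator: higher is better, 0.0 means the goal is reached."""
    return -abs(goal - state)

def plan_action(
    state: State,
    goal: State,
    world_model: Callable[[State, Action], State] = toy_world_model,
    actions: Sequence[Action] = (-1.0, 0.0, 1.0),
    horizon: int = 3,
) -> Action:
    """Branch on every action sequence up to `horizon` and return the first
    action of the sequence whose imagined trajectory scores best."""
    best_score = float("-inf")
    best_first = actions[0]
    for seq in itertools.product(actions, repeat=horizon):
        imagined = state
        total = 0.0
        for action in seq:
            imagined = world_model(imagined, action)  # roll the simulation forward
            total += goal_score(imagined, goal)       # score every imagined step
        if total > best_score:
            best_score, best_first = total, seq[0]
    return best_first

def send_to_motors(action: Action, log: List[Action]) -> None:
    """Hypothetical actuator interface: here it only records the command."""
    log.append(action)

def run_loop(start: State, goal: State, steps: int) -> State:
    """Plan, act, observe, repeat. The 'real world' is assumed to behave
    exactly like the toy model, which a physical robot would not."""
    state = start
    motor_log: List[Action] = []
    for _ in range(steps):
        action = plan_action(state, goal)
        send_to_motors(action, motor_log)
        state = toy_world_model(state, action)  # pretend this is the observed outcome
    return state
```

Calling `run_loop(0.0, 5.0, steps=8)` walks the toy state to the goal and holds it there. Exhaustive branching only works because the action set and horizon are tiny; a real system would have to sample or prune branches, and the timing problem mentioned above (planning has to finish before the world moves on) is not modeled at all.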

patapong 1 month ago

This is the most convincing take on what might actually get us to AGI that I've heard so far :)