Comment by polishdude20
5 days ago
Ok I was under the impression (due to the cameras) that it's doing something with machine learning or can do a novel movement. This is just recording movements and playing them back.
5 days ago
If you bridge recorded trajectories with an LVLM, then the cameras are the necessary visual input for the model to decide which sub-tasks need to be performed to accomplish a long-horizon task. Each sub-task corresponds to a pre-recorded ("blind") trajectory, which is simply replayed.
If you go beyond pre-recorded "blind" trajectories into more robust task policies (which you would have to train from many demonstrations), then the cameras also become necessary to execute the sub-task itself.
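To make the first case concrete, here is a rough sketch of how I picture it. Everything in it is made up for illustration: the `Robot` class, the `TRAJECTORIES` table, and `stub_planner` (a placeholder where a real LVLM call would go) are assumptions, not anything from the actual project. The point is only that the camera image feeds the planner, while the replay itself never looks at the camera.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Pose = Tuple[float, ...]  # one joint-space waypoint (hypothetical format)


@dataclass
class Robot:
    """Hypothetical stand-in for a real arm; it only logs commanded poses."""
    log: List[Pose] = field(default_factory=list)

    def move_to(self, pose: Pose) -> None:
        self.log.append(pose)


# Hypothetical library of pre-recorded ("blind") trajectories, keyed by sub-task.
TRAJECTORIES: Dict[str, List[Pose]] = {
    "pick_cup": [(0.0, 0.1), (0.2, 0.3)],
    "place_cup": [(0.2, 0.3), (0.5, 0.0)],
}


def replay(robot: Robot, name: str) -> None:
    """Open-loop replay: no camera feedback is used during execution."""
    for pose in TRAJECTORIES[name]:
        robot.move_to(pose)


def stub_planner(image: dict, goal: str) -> List[str]:
    """Placeholder for an LVLM call (assumption): maps an image and a goal
    to an ordered list of sub-task names. Here it is a hard-coded rule."""
    if image.get("cup_visible"):
        return ["pick_cup", "place_cup"]
    return []


def run(robot: Robot, image: dict, goal: str,
        planner: Callable[[dict, str], List[str]] = stub_planner) -> List[str]:
    """The camera image is only consumed by the planner, never by replay()."""
    plan = planner(image, goal)
    for name in plan:
        if name not in TRAJECTORIES:
            raise KeyError(f"planner chose an unknown sub-task: {name}")
        replay(robot, name)
    return plan
```

In the second case you would swap `replay` for a learned policy that takes a fresh camera frame at every control step, which is exactly why the cameras stop being optional there.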