
Comment by colinator

6 hours ago

Yeah, the LLMs control the robot by writing code. I suppose they could also issue direct joint sequences, but they're already so good at writing code that I figured I might as well use that. If they 'wanted' to, they could still write code containing an explicit joint sequence that they calculate in-context, though that seems harder.

So they can go 'slow': take a camera image, control the robot, repeat. Or they can write code that runs in a loop closer to the robot. Either works. I thought the latter was more impressive, and that's what you see in the hand-tracking example.
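
To make the two modes concrete, here is a rough sketch, not the actual project code. Everything in it is a hypothetical stand-in: `FakeRobot` is a one-joint toy whose "camera image" is just the error to a target, and `fake_llm_step` / `fake_llm_write_controller` are placeholders for model calls. The point is only the shape of the loop: one model call per step in the slow mode, versus one model call that produces a controller which then loops on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeRobot:
    """Hypothetical stand-in for a real robot: one joint plus a 'camera'."""
    joint: float = 0.0
    target: float = 1.0  # where the tracked hand is (assumed, for illustration)
    log: List[float] = field(default_factory=list)

    def camera_image(self) -> float:
        # A real camera returns pixels; here the "image" is just the error to the target.
        return self.target - self.joint

    def move(self, delta: float) -> None:
        self.joint += delta
        self.log.append(self.joint)


def fake_llm_step(image: float) -> float:
    """Placeholder for one model call: look at an image, return a joint delta."""
    return 0.5 * image


def fake_llm_write_controller() -> Callable[..., None]:
    """Placeholder for one model call that returns code to run near the robot."""
    def controller(robot: FakeRobot, steps: int = 20) -> None:
        # This loop runs without any further model calls.
        for _ in range(steps):
            robot.move(0.5 * robot.camera_image())
    return controller


def slow_mode(robot: FakeRobot, steps: int = 5) -> int:
    """Image -> model -> move, repeated. Returns the number of model calls."""
    calls = 0
    for _ in range(steps):
        image = robot.camera_image()
        delta = fake_llm_step(image)
        calls += 1
        robot.move(delta)
    return calls


def fast_mode(robot: FakeRobot) -> int:
    """One model call writes the controller; the controller does the looping."""
    controller = fake_llm_write_controller()
    controller(robot)
    return 1


if __name__ == "__main__":
    slow, fast = FakeRobot(), FakeRobot()
    print("slow mode model calls:", slow_mode(slow), "joint:", slow.joint)
    print("fast mode model calls:", fast_mode(fast), "joint:", fast.joint)
```

In the sketch the slow mode pays for a model round trip on every step, while the fast mode pays once and then iterates at whatever rate the robot-side loop allows.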