Comment by peepeepoopoo135
6 months ago
Interesting project! Sorry if I'm out of the loop, but how exactly does the MCP server hand off visual data to an external LLM service to formulate the robot control actions? It's an interesting concept, but I'm having a hard time wrapping my head around how it works, because I thought MCP was text-oriented.
If I'm not mistaken, the idea is to use MCP to let a user-facing LLM make tool calls into a VLA model, with the actions the user prescribes. He mentions using the LeRobot library in another comment. Roughly, something like the sketch below.
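A minimal sketch, assuming the official MCP Python SDK's FastMCP server and a LeRobot-style policy. The tool name `act` and the commented-out `load_policy` / `select_action` calls are hypothetical placeholders, not the project's actual API; the point is just that the image crosses the MCP boundary as a base64 string inside an ordinary tool call:

```python
# Hypothetical sketch: an MCP tool that feeds a camera frame and a text
# instruction to a VLA policy and returns a joint-space action.
import base64
import io

import numpy as np
from PIL import Image as PILImage
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("robot-control")

# policy = load_policy(...)  # hypothetical: a pretrained LeRobot-style policy


@mcp.tool()
def act(instruction: str, image_b64: str) -> list[float]:
    """Run one policy step on a camera frame plus an instruction and
    return the predicted joint-space action."""
    # Decode the base64-encoded JPEG/PNG the client sent as plain text.
    frame = PILImage.open(io.BytesIO(base64.b64decode(image_b64)))
    obs = np.asarray(frame)
    # action = policy.select_action({"image": obs, "task": instruction})
    action = [0.0] * 6  # stub so the sketch runs without a real policy
    return action


if __name__ == "__main__":
    mcp.run()
```

So the user-facing LLM never sees raw pixels; it just invokes a tool, and the VLA inference happens server-side.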
+1 to this. Curious how the MCP server handles base64-encoded image data, and where the encoding and decoding actually happen.
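For what it's worth, MCP itself isn't strictly text-only: the spec allows image content blocks of the shape `{"type": "image", "data": <base64>, "mimeType": ...}`, so binary image data rides inside the JSON messages as base64 text. A minimal sketch of the round trip; the content-block shape follows the MCP spec, the file name is illustrative:

```python
# Encode raw image bytes into an MCP-style image content block,
# then recover them on the other side.
import base64

# Server side: read raw bytes and wrap them in an image content block.
with open("frame.jpg", "rb") as f:
    raw = f.read()

block = {
    "type": "image",
    "data": base64.b64encode(raw).decode("ascii"),  # JSON-safe base64 text
    "mimeType": "image/jpeg",
}

# Client side: decode the block back to the original bytes.
decoded = base64.b64decode(block["data"])
assert decoded == raw
```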