Comment by peepeepoopoo135
6 months ago
Interesting project! Sorry if I'm out of the loop, but how exactly does the MCP server hand off visual data to an external LLM service to formulate the robot control actions? It's an interesting concept, but I'm having a hard time wrapping my head around how it works, because I thought MCP was text-oriented.
If I'm not mistaken, the idea is to use MCP to let a user-facing LLM make tool calls into a VLA model, with the actions the user prescribes. He mentions using the LeRobot library in another comment. Roughly, something like the sketch below.
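A minimal sketch, assuming the official MCP Python SDK's FastMCP server and a LeRobot-style policy. The tool name `act` and the commented-out `load_policy` / `select_action` calls are hypothetical placeholders, not the project's actual API; the point is just that the image crosses the MCP boundary as a base64 string inside an ordinary tool call:

```python
# Hypothetical sketch: an MCP tool that feeds a camera frame and a text
# instruction to a VLA policy and returns a joint-space action.
import base64
import io

import numpy as np
from PIL import Image as PILImage
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("robot-control")

# policy = load_policy(...)  # hypothetical: a pretrained LeRobot-style policy


@mcp.tool()
def act(instruction: str, image_b64: str) -> list[float]:
    """Run one policy step on a camera frame plus an instruction and
    return the predicted joint-space action."""
    # Decode the base64-encoded JPEG/PNG the client sent as plain text.
    frame = PILImage.open(io.BytesIO(base64.b64decode(image_b64)))
    obs = np.asarray(frame)
    # action = policy.select_action({"image": obs, "task": instruction})
    action = [0.0] * 6  # stub so the sketch runs without a real policy
    return action


if __name__ == "__main__":
    mcp.run()
```

So the user-facing LLM never sees raw pixels; it just invokes a tool, and the VLA inference happens server-side.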
+1 to this. Curious how the MCP server handles base64-encoded image data, and where the encoding and decoding actually happen.
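For what it's worth, MCP itself isn't strictly text-only: the spec allows image content blocks of the shape `{"type": "image", "data": <base64>, "mimeType": ...}`, so binary image data rides inside the JSON messages as base64 text. A minimal sketch of the round trip; the content-block shape follows the MCP spec, the file name is illustrative:

```python
# Encode raw image bytes into an MCP-style image content block,
# then recover them on the other side.
import base64

# Server side: read raw bytes and wrap them in an image content block.
with open("frame.jpg", "rb") as f:
    raw = f.read()

block = {
    "type": "image",
    "data": base64.b64encode(raw).decode("ascii"),  # JSON-safe base64 text
    "mimeType": "image/jpeg",
}

# Client side: decode the block back to the original bytes.
decoded = base64.b64decode(block["data"])
assert decoded == raw
```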