
Comment by cush

4 days ago

> html could probably be already used for this purpose

You’re right, and it already is. Tools like Playwright MCP can parse a webpage, use it, and get things done with the existing markup today.

> BUT html is often littered with elements used for visual structure rather than semantics.

This actually doesn’t make much of a difference to a tool like Playwright, because it works from a snapshot of the accessibility tree, which only reflects semantic markup and ignores presentation.
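
To make the idea concrete, here is a toy sketch of how presentational wrappers drop out when you keep only nodes that have a role. This is not Playwright's implementation: the `DomNode` and `A11yNode` shapes, the `IMPLICIT_ROLES` table, and the `snapshot` function are all made up for illustration, and a real accessibility tree handles far more cases.

```typescript
// Toy model only: a DOM-like node and the pruned, semantic-only tree built from it.
interface DomNode {
  tag: string;
  role?: string; // explicit ARIA role, if any
  text?: string;
  children?: DomNode[];
}

interface A11yNode {
  role: string;
  name: string;
  children: A11yNode[];
}

// Hypothetical, deliberately tiny mapping from tags to implicit roles.
const IMPLICIT_ROLES: Record<string, string> = {
  button: "button",
  a: "link",
  h1: "heading",
  input: "textbox",
  nav: "navigation",
};

function snapshot(node: DomNode): A11yNode[] {
  const children = (node.children ?? []).flatMap(snapshot);
  const role: string | undefined = node.role ?? IMPLICIT_ROLES[node.tag];
  if (role === undefined) {
    // Presentational wrapper (div, span, ...): drop the node itself,
    // but keep any semantic descendants by hoisting them up.
    return children;
  }
  return [{ role, name: node.text ?? "", children }];
}

// Markup "littered" with layout-only wrappers around two meaningful elements.
const page: DomNode = {
  tag: "div",
  children: [
    { tag: "div", children: [{ tag: "span", children: [{ tag: "h1", text: "Menu" }] }] },
    { tag: "div", children: [{ tag: "button", text: "Order now" }] },
  ],
};

const tree = snapshot(page);
console.log(JSON.stringify(tree, null, 2));
```

Five wrapper elements go in, and only the heading and the button come out, which is roughly why the extra layout markup doesn't get in the agent's way.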

> In such a world agents wouldn’t really need a dedicated protocol

They still do though, because they work better when given specific tools. WebMCP could provide tools that aren’t available on the page itself. Say an agent hits the dominoes.com landing page. The page could provide an order_pizza tool for the agent to call, saving a bunch of navigation, clicks, scrolling and whatnot. The agent calls order_pizza with “Two large pepperoni pizzas for John at <address>”, and the whole process is done.
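
A rough sketch of what a page-provided order_pizza tool might look like. None of this is the actual WebMCP API: the `Tool`, `OrderPizzaInput`, and `OrderResult` shapes are hypothetical, and I'm assuming structured input rather than the free-text call above. The address stays a placeholder.

```typescript
// Hypothetical shapes for illustration; not taken from any WebMCP spec.
interface OrderPizzaInput {
  size: "small" | "medium" | "large";
  topping: string;
  quantity: number;
  customerName: string;
  address: string;
}

interface OrderResult {
  ok: boolean;
  summary: string;
}

interface Tool<I, O> {
  name: string;
  description: string;
  execute(input: I): O;
}

const orderPizza: Tool<OrderPizzaInput, OrderResult> = {
  name: "order_pizza",
  description: "Place a pizza order without navigating the site UI.",
  execute(input) {
    if (!Number.isInteger(input.quantity) || input.quantity < 1) {
      return { ok: false, summary: "quantity must be a positive integer" };
    }
    // A real page would call its own ordering backend here.
    // This sketch only echoes the order back.
    const { quantity, size, topping, customerName, address } = input;
    return {
      ok: true,
      summary: `${quantity} x ${size} ${topping} for ${customerName} at ${address}`,
    };
  },
};

// One tool call replaces the whole click-through flow.
const result = orderPizza.execute({
  size: "large",
  topping: "pepperoni",
  quantity: 2,
  customerName: "John",
  address: "<address>", // placeholder, as in the comment above
});
console.log(result.summary);
```

The point is that the agent fills in one typed payload instead of finding the menu, picking a size, clicking "add to cart" twice, and walking through checkout.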