Comment by znpy
5 days ago
I think that REST and HTML could probably be already used for this purpose BUT HTML is often littered with elements used for visual structure rather than semantics.
In an ideal world HTML documents would be very simple, everything visual would be done via CSS, and JavaScript would be completely optional.
In such a world agents wouldn’t really need a dedicated protocol (and websites would be much faster to load and render, besides being much lighter on CPU and battery).
> html could probably be already used for this purpose
You’re right, and it already is. Tools like Playwright MCP can parse a webpage and get things done with existing markup today.
> BUT html is often littered with elements used for visual structure rather than semantics.
This actually doesn’t make much of a difference to a tool like Playwright MCP, because it works from a snapshot of the accessibility tree, which reflects only the semantic markup and ignores anything purely presentational.
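To make that concrete, here is a toy sketch of the idea in Python. It is not how Playwright builds its snapshot; the `SEMANTIC_ROLES` table and the `semantic_outline` helper are made up for illustration, and a real accessibility tree covers far more roles, names and states. It only shows that wrapper divs and spans drop out while the meaningful elements survive.

```python
from html.parser import HTMLParser

# Hypothetical, heavily reduced tag-to-role table (an assumption for this
# sketch; the real HTML-to-accessibility mapping is much larger).
SEMANTIC_ROLES = {"a": "link", "button": "button", "h1": "heading"}


class SemanticOutline(HTMLParser):
    """Collects (role, text) pairs for semantic elements only."""

    def __init__(self):
        super().__init__()
        self.nodes = []   # [role, text] entries in document order
        self._open = []   # stack of (tag, index into self.nodes)

    def handle_starttag(self, tag, attrs):
        if tag in SEMANTIC_ROLES:
            self.nodes.append([SEMANTIC_ROLES[tag], ""])
            self._open.append((tag, len(self.nodes) - 1))

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        # Text outside any semantic element is dropped in this toy version.
        if text and self._open:
            node = self.nodes[self._open[-1][1]]
            node[1] = (node[1] + " " + text).strip()


def semantic_outline(html: str) -> list[tuple[str, str]]:
    parser = SemanticOutline()
    parser.feed(html)
    parser.close()
    return [(role, text) for role, text in parser.nodes]


# Presentational wrappers around two meaningful elements.
sample = (
    '<div class="row"><div class="col"><span class="fancy">'
    "<button>Order now</button></span></div>"
    '<div class="wrapper"><a href="/menu">Menu</a></div></div>'
)
print(semantic_outline(sample))
```

The nested `div` and `span` wrappers never show up in the result, which is roughly why layout-heavy markup is less of a problem for this kind of tool than it looks.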
> In such a world agents wouldn’t really need a dedicated protocol
They still do though, because they work better when given specific tools. WebMCP could provide tools that aren’t available on the page itself. Say an agent hits the dominoes.com landing page. The page could provide an order_pizza tool that the agent could call directly, saving a bunch of navigation, clicks, scrolling and whatnot. It calls the order_pizza tool with “Two large pepperoni pizzas for John at <address>”, and the whole process is done.
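A rough sketch of what that could look like from the agent's side, in Python. This is not the WebMCP API; the `Tool` class, the `TOOLS` registry, `call_tool` and every parameter name of `order_pizza` are assumptions made up for the example, and the handler just returns a canned result instead of talking to a real backend.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict              # JSON-Schema-style description of the arguments
    handler: Callable[..., dict]


def order_pizza(size: str, topping: str, quantity: int,
                customer: str, address: str) -> dict:
    # Stand-in for the site's ordering backend (hypothetical).
    return {
        "status": "placed",
        "summary": f"{quantity} {size} {topping} for {customer}",
        "address": address,
    }


# Hypothetical registry of tools the page exposes to the agent.
TOOLS = {
    "order_pizza": Tool(
        name="order_pizza",
        description="Place a pizza order in a single call.",
        parameters={
            "type": "object",
            "properties": {
                "size": {"type": "string"},
                "topping": {"type": "string"},
                "quantity": {"type": "integer"},
                "customer": {"type": "string"},
                "address": {"type": "string"},
            },
            "required": ["size", "topping", "quantity", "customer", "address"],
        },
        handler=order_pizza,
    )
}


def call_tool(name: str, arguments: dict) -> dict:
    """Check the required arguments, then dispatch to the tool's handler."""
    tool = TOOLS[name]
    missing = [key for key in tool.parameters["required"] if key not in arguments]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return tool.handler(**arguments)


# One call instead of a sequence of navigation, clicks and form fills.
result = call_tool("order_pizza", {
    "size": "large",
    "topping": "pepperoni",
    "quantity": 2,
    "customer": "John",
    "address": "<address>",
})
print(result["summary"])
```

The point is only the shape: the agent gets a named tool with a declared argument list and makes one structured call, rather than inferring the ordering flow from the page.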