Comment by foota

5 days ago

No, I don't think you're thinking about this right. It's more like Hacker News would expose an MCP server when you visit it, presenting an alternative, parallel interface to the page rather than "click button" tools.

You're both right. The page can expose MCP tools via a form element, which is as simple as adding an attribute to an existing form and aligns completely with existing semantic HTML, e.g. submitting an HN comment. The page can also define additional tools in JavaScript that aren't tied to forms, e.g. YouTube could provide a transcript tool defined in JS that fetches the video's transcript.

https://developer.chrome.com/blog/webmcp-epp
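To make the two kinds of tools concrete, here is a minimal sketch. Everything in it is hypothetical: `registerTool`, `callTool`, the `PageTool` shape, and the in-memory data are stand-ins made up for illustration, not the actual WebMCP API, which may look quite different.

```typescript
// Hypothetical shapes for illustration only; this is not the real WebMCP API.
type ToolArgs = Record<string, string>;

interface PageTool {
  name: string;
  description: string;
  run: (args: ToolArgs) => string;
}

// Minimal in-memory registry standing in for whatever the browser exposes to an agent.
const pageTools: Record<string, PageTool> = {};

function registerTool(tool: PageTool): void {
  pageTools[tool.name] = tool;
}

function callTool(name: string, args: ToolArgs): string {
  const tool = pageTools[name];
  if (!tool) {
    throw new Error("unknown tool: " + name);
  }
  return tool.run(args);
}

// Form-style tool: mirrors what an existing comment form already does.
const postedComments: string[] = [];
registerTool({
  name: "submit_comment",
  description: "Post a comment on the current thread",
  run: (args) => {
    postedComments.push(args.text);
    return "posted comment #" + postedComments.length;
  },
});

// Script-defined tool with no matching form on the page.
// The transcript store is fake data; a real page would fetch it.
const transcripts: Record<string, string> = { abc123: "Hello and welcome." };
registerTool({
  name: "get_transcript",
  description: "Return the transcript of a video",
  run: (args) => transcripts[args.videoId] ?? "no transcript available",
});

console.log(callTool("submit_comment", { text: "Nice write-up." }));
console.log(callTool("get_transcript", { videoId: "abc123" }));
```

The point is only that both kinds of tool end up looking the same to the agent: a name, a description, and a callable, regardless of whether a form backs it.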

  • I think REST and HTML could probably already be used for this purpose, BUT HTML is often littered with elements used for visual structure rather than semantics.

    In an ideal world, HTML documents would be very simple, with everything visual done via CSS and JavaScript completely optional.

    In such a world, agents wouldn’t really need a dedicated protocol (and websites would load and render much faster, besides being much lighter on CPU and battery).

    • > HTML could probably already be used for this purpose

      You’re right, and it already is. Tools like Playwright MCP can easily parse a webpage and get things done with the existing markup today.

      > BUT HTML is often littered with elements used for visual structure rather than semantics.

      This actually doesn’t make much of a difference to a tool like Playwright, because it uses a snapshot of the accessibility tree, which only looks at semantic markup and ignores presentation.

      > In such a world agents wouldn’t really need a dedicated protocol

      They still do though, because they work better when given specific tools. WebMCP could provide tools that aren't available on the page itself. Say an agent hits the dominoes.com landing page. The page could provide an order_pizza tool for the agent to call, saving a bunch of navigation, clicks, scrolling and whatnot. The agent calls order_pizza with “Two large pepperoni pizzas for John at <address>”, and the whole process is done.
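As a rough sketch of what such an order_pizza tool might look like behind the scenes: the field names, the validation rules, and the example address below are all assumptions made up for illustration, not any real site's API or the WebMCP API.

```typescript
// Hypothetical tool shape for illustration; field names are assumptions, not a real API.
interface PizzaOrder {
  quantity: number;
  size: "small" | "medium" | "large";
  topping: string;
  customerName: string;
  address: string;
}

interface OrderResult {
  ok: boolean;
  message: string;
}

// One call stands in for the navigate/click/scroll sequence an agent would otherwise perform.
function orderPizza(order: PizzaOrder): OrderResult {
  // Reject zero, negative, and fractional quantities.
  if (order.quantity < 1 || Math.floor(order.quantity) !== order.quantity) {
    return { ok: false, message: "quantity must be a positive whole number" };
  }
  if (order.address.trim() === "") {
    return { ok: false, message: "address is required" };
  }
  const summary =
    order.quantity + " " + order.size + " " + order.topping +
    " pizza(s) for " + order.customerName;
  return { ok: true, message: "order placed: " + summary };
}

// The agent turns the user's request into structured arguments (placeholder address).
const result = orderPizza({
  quantity: 2,
  size: "large",
  topping: "pepperoni",
  customerName: "John",
  address: "1 Example Street",
});
console.log(result.message); // order placed: 2 large pepperoni pizza(s) for John
```

Structured arguments also give the page a place to validate the request and return a clear error, which is harder to do reliably when the agent is driving the UI click by click.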