Comment by jauntywundrkind
13 hours ago
WebMCP is mediated by the browser/page & has the full context of the user's active page/session available to it.
Websites that do offer real APIs usually keep them fairly separate from the web interface. So there's a big usability gap, where what you do via the API doesn't show up clearly on the web. And if the user is just hitting API endpoints unofficially, that can create even worse unexpected split-brain problems!
WebMCP offers something new: programmatic control endpoints that work well with what the user is actually seeing. A carefully crafted API can offer that, but this seamless interoperation of browsing and WebMCP programmatic control is a novel, very low-impedance tie-together that I find greatly promising for users, in a way that APIs never were.
And the starting point is far, far less technical, which again reduces the impedance mismatch that is so daunting about APIs.
> The whole point of an agent, though, is to overcome obstacles to accomplish tasks on your behalf. And since an agent is a computer program, the most efficient way to accomplish tasks using computer services is through APIs. Websites are first and foremost human interfaces, not computer interfaces.
> Having an agent use a browser to accomplish tasks on the principal’s behalf is a backstop. It’s for when service providers refuse to implement APIs—and they frequently refuse to do this on purpose. And I expect they will continue to make it as difficult as possible for agents to automate website-based extraction for the same reason they don’t provide APIs. If you thought Captcha solving was a nuisance already, expect it to get worse.
I think that is an incredibly foolish perspective. It is rooted in old, slipshod biases, shows no respect for users and their agency, and rests on weak technical arguments that define away any possibility of APIs being anything but better.
> the most efficient way to accomplish tasks using computer services is through APIs
You don't state efficient at what, so I'll start with your best case: energy efficiency, the least amount of computing done. Both paths provide mechanistic access. If the user already has the browser open and is asking for help, the difference is nearly nothing: it's a different wire format. We are talking about the tiniest peanuts of difference; arguing this either way is not worth the bits the argument would be stored in.
But this misses the broader view. Efficient at what? I think you are miles off here, having reduced LLMs to an idealized state that is starkly naive about what the job actually is.
First, let's go through the rest of the field of bad definitions and terms you have laid down, which let you avoid thinking about or addressing any of the possibilities of WebMCP and how it could be apt.
> Websites are first and foremost human interfaces, not computer interfaces
Which is why WebMCP is a valuable contribution: now the web page can have parity with all the other tools offered to an LLM. You can stay on the page and still have a fantastic first-class machine interface, from the page you are on.
> [Web browsing control] is for when service providers refuse to implement APIs—
Which WebMCP is a direct answer to, by allowing pages to offer a low-friction access path for mechanistic control, without the LLM having to "backstop" its way through by scraping, parsing, and puppeteering via Playwright or the DevTools protocol.
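To make concrete what that low-friction access path could look like, here is a hedged sketch of a page registering a tool for its own shopping cart. The names `navigator.modelContext` and `provideContext` follow the shape of the draft WebMCP proposal as I understand it and may not match the final spec; the `addToCart` handler, `CartItem` type, and SKUs are invented for illustration.

```typescript
// Hypothetical sketch of a page-provided WebMCP tool.
// The handler is plain page logic, operating on the same cart the user sees.
type CartItem = { sku: string; qty: number };

const cart: CartItem[] = [];

function addToCart(input: { sku: string; qty: number }): string {
  const existing = cart.find((item) => item.sku === input.sku);
  if (existing) {
    existing.qty += input.qty;
  } else {
    cart.push({ sku: input.sku, qty: input.qty });
  }
  const total = cart.reduce((n, item) => n + item.qty, 0);
  return `Cart now holds ${total} item(s).`;
}

// Register the handler as a tool, if the browser exposes the (draft) API.
// Optional chaining keeps this a no-op where WebMCP is unsupported.
(globalThis as any).navigator?.modelContext?.provideContext?.({
  tools: [
    {
      name: "add-to-cart",
      description: "Add a product to the cart the user is looking at.",
      inputSchema: {
        type: "object",
        properties: {
          sku: { type: "string" },
          qty: { type: "number" },
        },
        required: ["sku", "qty"],
      },
      execute: addToCart,
    },
  ],
});
```

The point is that the agent calls the same code path the page itself uses, so whatever the tool changes is immediately visible in the UI the user already has open.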
I suspect you are right that many players out there will seek control and domination of their users, and will reject WebMCP while layering on more constraints. But this isn't an argument against WebMCP. It's a moral/philosophical/economic statement about where the world is today, about the battle with intermediation/control capitalism that actively works against human agency. WebMCP is a protocol to help agency and tools become more ubiquitous, more regular, more human, more natural.
If it works, it makes the intermediation/control camp look bad. Good sites helping their users will make a mockery of those who keep layering anti-user, anti-freedom hostility into their systems. WebMCP amplifies this struggle by making it easier for sites to do the good and right thing, in a way that is more visible and clear to users. Will the bad actors clamping down on hackery and freedom eventually hear the music and reform their anti-human, anti-possibility, high-control ways? Or will they continue down the path of eternal degradation? Unknown. But WebMCP makes better relations with sites possible. (Hopefully there is peril in ignoring this betterment.)
I feel like I've now addressed what seem to me to be the significant misses and misdirections you have put out.
Instead of tripping over what has been, let's finally get to the two aspects of users and their LLM agents that I think are crucial to assessing the potential value of WebMCP:
1. LLMs are adaptable and guidable. They are peers we work with; there is more possible than a one-off assignment of tasks. Our human agency is most amplified when we can interact and steer the course alongside the agent, when we can form opinions on its work. Driving a website the user knows and is familiar with gives the agent and the human a shared medium to work on together, refining as they go, to reach a success state.
If the agent is using an API, it has to craft a de novo interface at every step of the process, whether as text responses, MCP UI, or something else. The agent has to reinterpret and describe; it can't just show us what is, short of showing us OpenAPI definitions and JSON payloads.
2. I've already talked about the process, but the definition of done in "accomplish tasks on your behalf" also insufficiently describes what LLMs need to do. Accomplishing the task is only part of the job: giving the results to the user and showing them the final state is a key part of the agent+human work cycle. Verifying the results is vital! Agents make all kinds of incorrect assumptions as they go, and need real help. How does the LLM prove it sent the strawberry muffin recipe to grandma? If there is an API, the agent can say the request returned 200. But was it the right request? Using APIs means having to place undeservedly high trust in the agent. Layering agency onto the web lets the agent perform in a way users can see, so they gain knowledge, insight, and verification quickly at the end of the process.
> Having an agent use a browser to accomplish tasks on the principal’s behalf is a backstop
In conclusion, I argue that this is a deep misunderstanding of the agent's role. The agent is a co-partner to us humans, helping not by achieving tasks independently on its own, but by working actively alongside us in a multiplayer fashion, as a peer rather than a distant autonomous system. Turning the web into a shared medium where users and agents can work together would greatly enhance LLMs' ability to meaningfully accomplish their tasks alongside their humans, and would improve the follow-on task of telling the human about it afterward, by giving the human the well-known, trustworthy interface they are already familiar with.