Comment by jauntywundrkind

5 hours ago

I think that is an incredibly foolish perspective. It is rooted in old, ridiculous, slipshod biases, shows no respect for users & their agency, and makes unsupported, weak technical arguments that define away the possibility of APIs being anything but better.

> the most efficient way to accomplish tasks using computer services is through APIs

You don't state efficient at what, so I'll first argue your best case: energy efficient, least amount of computing done. Both provide mechanistic access. If the user already has the browser open and is going for help, the difference is nearly nothing. It's different wire formats. We are talking the smallest, tiniest peanuts of difference. Arguing this either way is not worth the bits such an argument would be stored on; it's trivial.

But this misses the broader view. Efficient at what? I think you are thousands of miles off: you have reduced LLMs to an idealized state that is starkly naive about what the job actually is.

First, let's go through the rest of the shit field of bad definitions and terms you have laid down to avoid having to think about or address any of the possibilities of WebMCP and how it could be apt.

> Websites are first and foremost human interfaces, not computer interfaces

Which is why WebMCP is a valuable contribution: now the web page can have parity with all the other tools offered to an LLM. You can stay on the page and still have a fantastic first-class machine interface, right from the page you are on.

> [Web browsing control] is for when service providers refuse to implement APIs—

Which WebMCP is a direct answer to, by allowing pages to offer a low-friction access path that allows mechanistic control, without the LLM having to "backstop" scrape and parse and puppeteer/playwright/devtools-protocol its way through.

I suspect you are right that many players out there will seek control & domination of their users, and will reject WebMCP and layer on more constraints. This isn't an argument against WebMCP. It's a moral/philosophical/economic statement of where the world is today, of the battle of intermediation/control capitalism that actively works against humanity/agency. WebMCP is a protocol to help agency & tools become more ubiquitous, more regular, more human, more natural. If it works, it makes the intermediation/control camp look bad. The good sites helping their users make a mockery of those who keep layering anti-user, anti-freedom hostility into their systems. WebMCP amplifies this struggle by making it easier for sites to do good and right things, in a way that is more visible and clear to users. Will the bad people clamping down on hackery freedom eventually hear the music and reform their sick anti-human, anti-possibility, high-control ways? Or will they continue down the path of eternal degradation? Unknown. But WebMCP makes better relations with sites possible. (Hopefully there is peril in ignoring this betterment.)

So I feel like I've tried to address what seem to me to be significant misses and misdirections you have put out.

Instead of tripping over what has been, let's finally get to the two aspects of users and their LLM agents that I think are crucial to assessing the potential value of WebMCP:

1. LLMs are adaptable & guidable. They are peers that we work with; there is more possible than a one-off assignment of tasks. Our human agency is most amplified when we can interact and steer the course alongside the agent, when we can form opinions on its work. A website that the user knows and is familiar with is a shared medium that the agent and the human can work on together, refining as we go, to get to a success state.

If the agent is using an API, it has to craft a de novo interface at every step of the process, either as text responses or MCP UI or other. The agent has to reinterpret and describe: it can't just show us what is, short of showing us OpenAPI definitions and JSON payloads.

2. I've already talked about the process, but the definition of done in "accomplish tasks on your behalf" also insufficiently describes what LLMs need to do. Accomplishing the task is only part of the job: giving the results to the user and showing them the final state is a key part of the agent+human work-cycle. Verifying the results is vital! Agents make all kinds of incorrect assumptions as they go, and need real help! How does the LLM prove it sent the strawberry muffin recipe to grandma? If there is an API, the agent can say the request responded 200. But was it the right request? Using APIs means having to place undeservedly high levels of trust in the agent. Layering agency onto the web allows the agent to perform in a way that users can see, so they quickly gain knowledge, insight & verification at the end of the process.

> Having an agent use a browser to accomplish tasks on the principal’s behalf is a backstop

In conclusion, I argue that this is a deep misunderstanding of what the agent's role is. It is a co-partner to us humans, helping us not by achieving tasks on its own independently, but by working actively alongside us in a multiplayer fashion, as a peer, not a distant autonomous system. Turning the web into a shared medium where users and agents can work together would greatly enhance LLMs' ability to meaningfully accomplish their tasks alongside their humans, and would improve the task of telling the human about it afterwards, by giving the human the well-known, trustworthy interface they are already familiar with.

4 comments

otterley 1 hour ago

Wow, that was a lot of words.

> I think that is an incredibly foolish perspective. It is rooted in old, ridiculous, slipshod biases, shows no respect for users & their agency, and makes unsupported, weak technical arguments that define away the possibility of APIs being anything but better.

Since this is a technical discussion, let's debate these based on their technological pros and cons, and avoid the characterizations, shall we?

> You don't state efficient at what, so I'll first argue your best case: energy efficient, least amount of computing done.

Yup.

> Both provide mechanistic access. If the user already has the browser open and is going for help, the difference is nearly nothing. It's different wire formats. We are talking the smallest, tiniest peanuts of difference.

The difference may be "nearly nothing" at individual scale but not at global scale. The aggregate difference in energy and data transfer required to power a full browser experience vs. APIs is enormous. If it weren't true, Google, Amazon, and Meta wouldn't have spent nearly as much blood and treasure in optimizations, both in hardware and software, as they have over the last 25+ years. You can't just hand-wave this away. If you told Google and Meta that gRPC and Thrift were "peanuts of difference" and "trivial" they'd laugh in your face and show you the door. (You can always tell when someone's not an experienced engineer as soon as they bandy about the word "trivial.")

Again, browser-based interfaces are for humans. They change frequently, often at the whim of designers. As they evolve, agents must evolve with them. That sort of instability contributes to the resources needed to mechanize them. Compare against APIs, which often have stability guarantees, or at the very least, are only additive over time.

> Which WebMCP is a direct answer to, by allowing pages to offer a low-friction access path that allows mechanistic control, without the LLM having to "backstop" scrape and parse and puppeteer/playwright/devtools-protocol its way through.

This I understand. But APIs are even more efficient still.

> If the agent is using an API, it has to craft a de novo interface at every step of the process, either as text responses or MCP UI or other. The agent has to reinterpret and describe: it can't just show us what is, short of showing us OpenAPI definitions and JSON payloads.

I think you may be underestimating the extent to which this will need to happen with browser-based MCP connectivity as well.

Unfortunately I don't have the time to dive deep into the rest of your comment, as it's just too verbose and narrative-driven. If you'd like to make concise and concrete technological arguments, though, I'm open to that.

jauntywundrkind 42 minutes ago

No, the characterization is very important. You've shown no connection to what's actually at stake, to the engagement patterns here, to the need for people to actually use agents in a way they understand, or to the need to work through & arrive together with your agent at an answer. We can't have a technical discussion until you actually show some engagement with the core topics, but you have been too busy raising frivolous objections to derail anyone thinking about the actual topic and technology.

Your proposal to use APIs is a grossly inefficient waste of LLMs' time and energy and, far worse, a misuse of human attention that could be much better directed with the multiplayer/coop/peership of WebMCP. You propose inventing brand new communication systems for every interaction, and haven't once considered the merits of leveraging the existing communication medium that users know. Rather than engaging with WebMCP & what it brings, you've been trying to hide and confuse the matter & bury any discussion under a sea of objections, objections that don't even carry technical merit. If you want to actually reply to any of the interesting things rather than blocking and obstructing discussion, I'll happily re-engage.

I've found everything you have said to be radically damaging to understanding the problems at hand, by steering consideration away from all the interesting topics and raising only naysaying quibbles that don't address how users and agents would actually do work. Users and agents need to work together. That's simple, and your posts actively distract from what's unique and different here. I'm not going to accept another null response and then waste my time again, and it's sad that people have been steered away from thoughtful consideration like this.

otterley 28 minutes ago

If your argument has merit, we will see it win in the marketplace. If it doesn’t, then it will not. Simple as that. And I’m definitely not the only one who is looking for an explanation of why an agent-browser interface is the superior approach vis-a-vis the alternatives.

I’m not entirely sure what your angle is, but your tirade makes it sound like you’re emotionally invested in this (and potentially financially invested) and you’re frightened. A confident person doesn’t need all these histrionics.
  