Comment by bustodisgusto
3 days ago
Similar but also very different. Playwright and Selenium are browser automation frameworks. There is a Playwright-MCP server which lets your agent use Playwright for browser automation.
MCP-B is a different approach. Website owners create MCP servers inside their websites, and MCP-B clients are either injected by browser extensions or included in the website's JS.
Instead of visual parsing like Playwright, you get standard deterministic function calls.
You can see the blog post for code examples: https://mcp-b.ai/blogs
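To make "standard deterministic function calls" concrete, here is a minimal sketch of a page-side tool registry. It assumes tools are described by a name, a description and a JSON Schema for their input, as MCP tools are. `ToolRegistry`, `register`, `list` and `call` are hypothetical names, not MCP-B's API, and the counter tool is only loosely modeled on the "Increment by x" example mentioned later in the thread.

```typescript
// Hypothetical sketch: a tiny in-page tool registry in the spirit of MCP-B.
// None of these names come from the MCP-B library; they only illustrate that the
// site publishes named functions plus a schema, and callers invoke them directly.

type JsonSchema = {
  type: "object";
  properties: Record<string, { type: string; description?: string }>;
  required?: string[];
};

interface Tool {
  name: string;
  description: string;
  inputSchema: JsonSchema;
  handler: (args: Record<string, unknown>) => unknown;
}

class ToolRegistry {
  private tools = new Map<string, Tool>();

  register(tool: Tool): void {
    if (this.tools.has(tool.name)) throw new Error(`duplicate tool: ${tool.name}`);
    this.tools.set(tool.name, tool);
  }

  // What a client sees at discovery time: metadata only, never the handler itself.
  list(): Omit<Tool, "handler">[] {
    return [...this.tools.values()].map(({ handler, ...meta }) => meta);
  }

  // Deterministic invocation: look the tool up by name and run its handler.
  call(name: string, args: Record<string, unknown>): unknown {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unknown tool: ${name}`);
    return tool.handler(args);
  }
}

// The site's own state, exposed through a tool rather than through the DOM.
let count = 0;
const registry = new ToolRegistry();
registry.register({
  name: "incrementBy",
  description: "Add x to the counter (use a negative x to subtract).",
  inputSchema: { type: "object", properties: { x: { type: "number" } }, required: ["x"] },
  handler: (args) => {
    const x = args.x;
    if (typeof x !== "number") throw new Error("x must be a number");
    count += x;
    return count;
  },
});

console.log(registry.list().map((t) => t.name));
console.log(registry.call("incrementBy", { x: -2 })); // -2
```

The caller never inspects markup: it reads the tool list, then invokes a named function with typed arguments, so the result does not depend on how the page happens to be laid out.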
Shouldn't a playwright-mcp server, or any BiDi-based browser automation, be equally capable of discovering or injecting the client JS and calling the same MCP-B site API it exposes?
It's like an OpenAPI definition but for JS/MCP? (Setting aside the extension that interacts with that definition.)
Sure, they can inject clients, but that's really only beneficial for developers. Doing it via a browser extension means regular people can use it.
> It's like an OpenAPI definition but for JS/MCP?
Sort of. It's a true MCP server, which you can use to expose existing (or new) functionality in your web app to the client.
What differentiates this from something like data-test-id attributes?
data-test-id attributes (and other attributes) are hardcoded and need to be known by the automator at runtime. MCP-B clients request what they can call at injection time, and the server responds with standard MCP tools (functions LLMs can call, along with context for how to call them).
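A rough sketch of that discovery step, assuming the message shapes of the public MCP specification: `tools/list` returns a `name`, `description` and `inputSchema` for each tool, and `tools/call` takes a `name` plus `arguments` and returns `content`. The `getCartTotal` tool and the `handleRequest` function are hypothetical, and the transport between page and client is omitted.

```typescript
// Hypothetical sketch of the server side of MCP tool discovery and invocation.
// Message shapes follow the MCP spec's "tools/list" and "tools/call" methods;
// how messages travel between page and client is deliberately left out.

type JsonRpcRequest = { jsonrpc: "2.0"; id: number; method: string; params?: any };
type JsonRpcResponse =
  | { jsonrpc: "2.0"; id: number; result: unknown }
  | { jsonrpc: "2.0"; id: number; error: { code: number; message: string } };

interface ToolDef {
  name: string;
  description: string;
  inputSchema: object;
  run: (args: any) => string;
}

// One illustrative tool the site chooses to expose (the name is made up).
const tools: ToolDef[] = [
  {
    name: "getCartTotal",
    description: "Return the current cart total in the shop's currency.",
    inputSchema: { type: "object", properties: {} },
    run: () => "42.50",
  },
];

function handleRequest(req: JsonRpcRequest): JsonRpcResponse {
  switch (req.method) {
    case "tools/list":
      // Discovery: the client learns what it may call, with schemas, when it connects.
      return {
        jsonrpc: "2.0",
        id: req.id,
        result: {
          tools: tools.map(({ name, description, inputSchema }) => ({ name, description, inputSchema })),
        },
      };
    case "tools/call": {
      const tool = tools.find((t) => t.name === req.params?.name);
      if (!tool) {
        // JSON-RPC "Invalid params" for an unknown tool name.
        return { jsonrpc: "2.0", id: req.id, error: { code: -32602, message: `Unknown tool: ${req.params?.name}` } };
      }
      const text = tool.run(req.params.arguments ?? {});
      return { jsonrpc: "2.0", id: req.id, result: { content: [{ type: "text", text }] } };
    }
    default:
      // JSON-RPC "Method not found".
      return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: `Method not found: ${req.method}` } };
  }
}

const listed = handleRequest({ jsonrpc: "2.0", id: 1, method: "tools/list" });
const called = handleRequest({
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call",
  params: { name: "getCartTotal", arguments: {} },
});
console.log(JSON.stringify(listed));
console.log(JSON.stringify(called));
```

Unlike a data-test-id, nothing here has to be known in advance: a client that has never seen the site can send `tools/list` first and work from the answer.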
What do you mean by "visual parsing like Playwright"? I'm pretty sure Playwright queries the DOM via JS; there isn't inherently any visual parsing. Do you just mean that MCP-B has dedicated JS APIs for each website? Your example is also pretty confusing: it looks like the website itself offers an "Increment by x" tool, and then your first command to the website is to "subtract two from the count". So the AI model still has to understand the MCP tools offered by the website quite loosely and just calls them as needed? I suppose this is basically like using Playwright, except it doesn't have to parse the DOM (although it probably still does; how else would it know that the "Increment by x" tool is in any way connected to the "count" mentioned in your vague prompt?). The additional benefit is that it can call a JS function instead of having to generate the DOM/JS Playwright calls to do it.
I mean, all this MCP stuff certainly seems useful even though this example isn't very good. The bigger uses will come when websites offer larger APIs and interactions like "Make a purchase" or "Sort a table", where the AI would otherwise have to implement a very complex set of DOM operations and XHR requests. Instead of flailing to do that, it can call an MCP tool, which is just a JS function.
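A hypothetical sketch of the "Sort a table" case: the site exposes one function that sorts its own data, so an agent makes a single validated call instead of scripting header clicks and waiting on requests. The `sortTable` name, the row shape, and the schema are illustrative assumptions, not part of MCP-B.

```typescript
// Hypothetical "sortTable" tool: the site sorts its own in-memory rows,
// so an agent makes one validated call instead of scripting DOM clicks.
type Row = { name: string; price: number };

const rows: Row[] = [
  { name: "Widget", price: 9.99 },
  { name: "Gadget", price: 24.5 },
  { name: "Doohickey", price: 4.25 },
];

type SortArgs = { column: keyof Row; direction: "asc" | "desc" };

function sortTable(args: SortArgs): Row[] {
  const { column, direction } = args;
  // Arguments arrive from an LLM as JSON, so validate them at runtime too.
  if (column !== "name" && column !== "price") throw new Error(`unknown column: ${String(column)}`);
  if (direction !== "asc" && direction !== "desc") throw new Error(`bad direction: ${String(direction)}`);
  const sign = direction === "asc" ? 1 : -1;
  rows.sort((a, b) => {
    const x = a[column];
    const y = b[column];
    const cmp = typeof x === "number" && typeof y === "number" ? x - y : String(x).localeCompare(String(y));
    return sign * cmp;
  });
  // In a real page, this is where the site would re-render its table from `rows`.
  return rows;
}

// Descriptor an MCP-style client would see; the schema tells the model the valid values.
const sortTableTool = {
  name: "sortTable",
  description: "Sort the product table by a column.",
  inputSchema: {
    type: "object",
    properties: {
      column: { type: "string", enum: ["name", "price"] },
      direction: { type: "string", enum: ["asc", "desc"] },
    },
    required: ["column", "direction"],
  },
  handler: sortTable,
};

console.log(sortTableTool.handler({ column: "price", direction: "desc" }).map((r) => r.name));
```

The site already knows how its table is stored, so the tool can sort the data directly; an external automator would have to infer that from clicks and rendered cells.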
Sorry, this is in reference to the Playwright MCP server, which gives a model access to screenshots of the browser and Playwright APIs.
MCP-B doesn't do any DOM parsing. It exchanges data purely over browser events.
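A minimal sketch of request/response over browser events. It assumes a plain `EventTarget` as a stand-in for the page's `window` and uses made-up event names (`mcp-request`, `mcp-response`). It only illustrates that client and server exchange structured messages without reading the DOM; the actual MCP-B transport may use a different mechanism, such as `window.postMessage`.

```typescript
// Hypothetical sketch: request/response over DOM-style events, with no DOM inspection.
// A plain EventTarget stands in for the page's `window`; event names are made up.

class McpMessageEvent extends Event {
  constructor(type: string, readonly detail: { id: number; method?: string; result?: unknown }) {
    super(type);
  }
}

const bus = new EventTarget();

// Server side (inside the website): answer each request by dispatching a response event.
bus.addEventListener("mcp-request", (e) => {
  const { id, method } = (e as McpMessageEvent).detail;
  const result = method === "tools/list" ? { tools: [{ name: "incrementBy" }] } : { error: "unsupported" };
  bus.dispatchEvent(new McpMessageEvent("mcp-response", { id, result }));
});

// Client side (injected by an extension or bundled in the page's JS).
function request(method: string, id: number): Promise<unknown> {
  return new Promise((resolve) => {
    const onResponse = (e: Event) => {
      const detail = (e as McpMessageEvent).detail;
      if (detail.id !== id) return; // ignore replies meant for other requests
      bus.removeEventListener("mcp-response", onResponse);
      resolve(detail.result);
    };
    // Listen before sending, so a synchronous reply is not missed.
    bus.addEventListener("mcp-response", onResponse);
    bus.dispatchEvent(new McpMessageEvent("mcp-request", { id, method }));
  });
}

request("tools/list", 1).then((r) => console.log(JSON.stringify(r)));
```

Matching on `id` lets several requests share the same channel, and at no point does either side read the page's markup.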