
Comment by TechDebtDevin

2 days ago

Hmm, I have an MCP route that fetches the page in a browser and lets the LLM inject JavaScript into the page to return whatever structured output it wants. Or whatever else it wants (kinda scarily). How is this different?

--Shoutout to Go-Rod https://pkg.go.dev/github.com/go-rod/rod@v0.116.2#Page
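A rough sketch of the kind of MCP route described above, in Go to match Go-Rod. It assumes a few things not in the comment: the tool name `eval_js`, the `Evaluator` interface and the `fakeEvaluator` stub are all hypothetical. The request and result shapes follow MCP's `tools/call` (`name`, `arguments`, a `content` list of `type: "text"` items, and `isError`). The browser is stubbed so the sketch runs offline; a Go-Rod backed `Evaluator` would open the URL and pass the script to `Page.Eval`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Evaluator abstracts the real browser (hypothetical interface). A Go-Rod
// implementation might open the page and run the script with page.Eval,
// returning the resulting JSON value.
type Evaluator interface {
	EvalOnPage(url, script string) (json.RawMessage, error)
}

// toolCallParams mirrors the params of an MCP "tools/call" request.
type toolCallParams struct {
	Name      string `json:"name"`
	Arguments struct {
		URL    string `json:"url"`
		Script string `json:"script"`
	} `json:"arguments"`
}

// textContent is one MCP text content item.
type textContent struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

// toolResult is the tool result handed back to the model.
type toolResult struct {
	Content []textContent `json:"content"`
	IsError bool          `json:"isError,omitempty"`
}

// handleToolCall runs the LLM-supplied script on the page and returns
// whatever JSON the script produced as text content for the model.
func handleToolCall(ev Evaluator, raw []byte) (toolResult, error) {
	var p toolCallParams
	if err := json.Unmarshal(raw, &p); err != nil {
		return toolResult{}, err
	}
	if p.Name != "eval_js" {
		return toolResult{}, fmt.Errorf("unknown tool %q", p.Name)
	}
	if p.Arguments.URL == "" || p.Arguments.Script == "" {
		return toolResult{}, errors.New("url and script are required")
	}
	out, err := ev.EvalOnPage(p.Arguments.URL, p.Arguments.Script)
	if err != nil {
		// Script failures go back to the model as an error result so it can retry.
		return toolResult{Content: []textContent{{"text", err.Error()}}, IsError: true}, nil
	}
	return toolResult{Content: []textContent{{"text", string(out)}}}, nil
}

// fakeEvaluator stands in for a real browser so the sketch runs offline.
type fakeEvaluator struct{}

func (fakeEvaluator) EvalOnPage(url, script string) (json.RawMessage, error) {
	return json.RawMessage(`{"title":"Example Domain"}`), nil
}

func main() {
	req := []byte(`{"name":"eval_js","arguments":{"url":"https://example.com","script":"() => ({title: document.title})"}}`)
	res, err := handleToolCall(fakeEvaluator{}, req)
	if err != nil {
		panic(err)
	}
	b, _ := json.Marshal(res)
	fmt.Println(string(b))
}
```

Running it prints `{"content":[{"type":"text","text":"{\"title\":\"Example Domain\"}"}]}`. The "kinda scary" part is visible in the signature: the script argument is arbitrary code chosen by the model, so the only guardrail is whatever the `Evaluator` enforces.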

Cool, I'll check it out!

I'll need to look a bit more, but at a glance, MCP-B puts more of the onus of browser automation (i.e., how the agent will interact with the web page) on the website owner. They get to expose exactly the functionality they want to the agent.

  • Oh, this is for the website owner. Yeah, mine is for making an arbitrary site interactable by an LLM. It can choose to get a map of the DOM, take a screenshot, or extract by XPath, and it can interact via a few different methods. But the Page.Eval() method from Go-Rod works pretty well.

    I'd like to just provide a runtime for an LLM to solve captchas.

    My main focus is (anti-)bot detection.