Comment by TechDebtDevin

7 months ago

hmm, I have an MCP route, that fetches the page in a browser, returns and lets the LLM inject javascript onto the page to return whatever structured output it desires..Or whatever (kinda scarily). How is this different?

--Shoutout to Go-Rod https://pkg.go.dev/github.com/go-rod/rod@v0.116.2#Page

2 comments

TechDebtDevin

miguelspizza 7 months ago

Cool, I'll check it out!

I'll need to look a bit more, but at a glance, MCP-B is more putting the onus of browser automation (i.e. how the agent will interact with the web page) on the website owner. They get to expose exactly the functionality they want to the agent

TechDebtDevin 7 months ago

Oh this is for the website owner. Yeah, mine is to make an arbitrary site interactable with an LLM. It can choose to get a map of the DOM/screenshot/extract by xml path/ and interact via a few different methods. But the PageEval() method from GO rod works pretty well
Would like to just provide a runtime for an LLM to solve captchas.
My main focus is (anti) bot detection.