I use Playwright to intercept all requests and responses and have Claude Code navigate to a website like YouTube and click and interact with all the elements and inputs while recording all the requests and responses associated with each interaction. Then it creates a detailed strongly typed API to interact with any website using the underlying API.
Yes, I know it likely breaks everybody's terms of service but at the same time I'm not loading gigabytes of ads, images, markup, to accomplish things.
If anyone is interested I can take some time and publish it this week.
I also do this. My primary use case is for reproducing page layout and styling at any given tree in the dom. So, capturing various states of a component etc.
I also use it to automatically retrieve page responsiveness behavior in complex web apps. It uses playwright to adjust the width and monitor entire trees for exact changes which it writes structured data that includes the complete cascade of styles relevant with screenshots to support the snapshots.
There are tools you can buy that let you do this kind of inspection manually, but they are designed for humans. So, lots of clickety-clackety and human speed results.
---
My first reaction to seeing this FP was why are people still releasing MCPs? So far I've managed to completely avoid that hype loop and went straight to building custom CLIs even before skills were a thing.
I think people are still not realizing the power and efficiency of direct access to things you want and skills to guide the AI in using the access effectively.
Maybe I'm missing something in this particular use case?
> My first reaction to seeing this FP was why are people still releasing MCPs?
MCPs are more difficult to use. You need to use an agent to use the tools, can't do it manually easily. I wonder if some people see that friction as a feature.
I love how HN is loving this idea when it's the exact same thing Anthropic and OpenAi (and every other llm maker) did.
It's God's gift to them when it lets them bypass ads and dl copyrighted material. But it's Satan's curse on humanity when the Zuck does it to train his llm and dl copyrighted material.
So you’re that Hal Jordan then? Why would a Green Lantern feel the need to defend either? I feel like the Guardians would not accept your arguments as soon as you got to Oa, poozer. I guess what I am saying is don’t have a famous name. Seems obvious.
You conflate web crawling for inference with web crawling for training.
Web crawling for training is when you ingest content on a mass scale, usually indiscriminately, usually with a dumb crawler for scale's sake, for the purposes of training an LLM. You don't really care whether one particular website is in the dataset (unless it's the size of Reddit), you just want a large, diverse, high-quality data mix.
Web crawling for inference is when a user asks a targeted question, you do a web search, and fetch exactly those resources that are likely to be relevant to that search. Nothing ends up in the training data, it's just context enrichment.
People have a much larger issue with crawling for training than for inference (though I personally think both are equally ok).
It would probably help people who want to go to a concert and have a chance to beat the scalpers cornering the market on an event in 30 seconds hitting the marketplace services with 20,000 requests.
I can try to see if can bypass yt-dlp. But that is always a cat and mouse game.
Exactly, it is an agent skill that interacts pressing buttons and stuff with a webpage capturing and documenting all the API requests the page makes using Playwright's request / response interception methods. It creates and strongly typed well documented API at the end.
It turns any authenticated browser session into a fully typed REST API proxy — exposing discovered endpoints as local Hono routes that relay requests through the browser, so cookies and auth are automatic.
The point is that it creates an API proxy in code that a Typescript server calls directly. The AI runs for about 10 minutes with codegen. The rest of the time it is just API calls to a service. Remove the endpoint for "Delete Account" and that API endpoint never gets called.
This 100% breaks everyone's terms of service. I would not recommend nor encourage using.
100% I'll response to this by Friday with link to Github.
I use Patchright + Ghostery and I have a cleaver tool that uses web sockets to pass 1 second interval screenshots to the a dashboard and pointer / keyboard events to the server which allow interacting with websites so that a user can create authentication that is stored in the chrome user profile with all the cookies, history, local storage, ect.. in the cloud on a server.
Can you list some websites that don't require subscription that you would like to me to test against? I used this for Robinhood and I think Linked in would be a good example for people to use.
Great news to all of us keenly aware of MCP's wild token costs. ;)
The CLI hasn't been announced yet (sorry guys!), but it is shipping in the latest v0.20.0 release. (Disclaimer: I used to work on the DevTools team. And I still do, too)
The big upside of the MCP is that it connects to already open browser windows. I tried the skill but it always tries to open new windows. Is there a way to get the `--autoConnect` behaviour with the CLI?
Please wait a few minutes before you try again;
in some cases this may take up to an hour.
Signing in may provide a higher rate limit if you are not already signed in.
For more on scraping GitHub and how it may affect your rights, please review our Terms of Service.
For example, I use codex to manage a local music library, and it was able to use the skill to open a YT Music tab in my browser, search for each album, and get the URL to pass to yt-dlp.
Do note that it only works for Chrome browsers rn, so you have to edit the script to point to a different Chromium browser's binary (e.g. I use Helium) but it's simple enough
On one hand, cool demo, on the other, this is horrifying in more ways than I can begin to describe. You're literally one prompt injection away from someone having unlimited access to all of your everything.
Not the person you're replying to, but: I just use a separate, dedicated Chrome profile that isn't logged into anything except what I'm working on. Then I keep the persistence, but without commingling in a way that dramatically increases the risk.
edit: upon rereading, I now realize the (different) prompt injection risk you were calling out re: the handoff to yt-dlp. Separate profiles won't save you from that, though there are other approaches.
As long as it’s gated and not turned on by default, it’s all good. They could also add a warning/sanity check similar to “allow pasting” in the console.
> Most browser automation tools launch a fresh, isolated browser. This one connects to the Chrome you're already running
Is this the same as what Claude in Chrome does?
I tried that for a while and since I use Firefox and Chromium, the security problem of it seeing your tabs wasn't a big deal. Fresh Chrome install, only ever used for this exact purpose. Plus you can watch it working in real (actually very slow) time so if you did point it at something risky you can take over at any point.
For actual testing of web apps though, a skill with playwright cli in headless mode is much more effective. About 1-2k context per interaction after a bit of tuning.
To be clear, this isn't a skill for the devtools mcp, but an independent project. It doesn't look bad, but obviously browser automation + agents is a very busy space with lots of parallel efforts.
DevTools MCP and its new CLI are maintained by the team behind Chrome DevTools & Puppeteer and it certainly has a more comprehensive feature set. I'd expect it to be more reliable, but.. hey open source competition breeds innovation and I love that. :)
(I used to work on the DevTools team. And I still do, too)
Google is so far behind agentic cli coding. Gemini CLI is awful. So bad in fact that it’s clear none of their team use it. Also MCP is very obviously dead, as any of us doing heavy agentic coding know. Why permanently sacrifice that chunk of your context window when you can just use CLI tools which are also faster and more flexible and many are already trained in. Playwright with headless Chromium or headed chrome is what anyone serious is using and we get all the dev and inspection tools already. And it works perfectly. This only has appeal to those starting out and confused into thinking this is the way. The answer is almost never MCP.
> Also MCP is very obviously dead, as any of us doing heavy agentic coding know.
As someone that does heavy agentic coding (using basically all the tools), this is so far from the truth. People claiming this have probably never worked in large enterprise environments where things like authentication, RBAC, rate limiting, abuse detection, centralized management/updates/ops, etc. are a huge part of the development and deployment workflow.
In these situations you can't just use skills and cli tools without a gigantic amount of retooling and increased operational and security complexity. MCP is really useful here, and allows centralized eng and ops teams to manage their services in a way that aligns with the organizations overall posture, policies, and infrastructure.
> Google is so far behind agentic cli coding. Gemini CLI is awful.
This part I totally agree. It's really hard to express how bad it is (and it's really disappointing.)
> you can't just use skills and cli tools without a gigantic amount of retooling and increased operational and security complexity
You're describing MCP. After all, MCP is just reinventing the OpenAPI wheel. You can just have a self-documenting REST API using OpenAPI. Put the spec in your context and your model knows how to use it. You can have all the RBAC and rate limiting and auth you want. Heck, you could even build all that complexity into a CLI tool if you want. MCP the protocol doesn't actually enable anything. And implementing an MCP server is exactly as complex as using any other established protocol if you're using all those features anyway
Given MCP is supposed to just be a standardised format for self-describing APIs, why are all the features you listed MCP related things? It sounds more like it's forced the enterprise to build such features which cli tooling didn't have?
FYI: Gemini Cli is used internally at Google. It's actually more popular than Antigravity. Google uses MCP services internally for code search (since everything is in a mono-repo you don't want to waste time grepping billions of files), accessing docs and bugs, and also accessing project specific RAG databases for expertise grounding.
Some people will push back on this. They are holding out hope that the recent improvements Anthropic has made in this regard have improved the context rot problem with MCP. Anthropic's changes improve things a little. But it is akin to putting lipstick on a pig. It helps, but not much.
The reason MCP is dying/dead is because MCP servers, once configured, bloat up context even when they are not being used. Why would anybody want that?
Use agent skills. And say goodbye to MCP. We need to move on from MCP.
Is your agent harness dropping the entire MCP server tool description output directly into the context window? Is your agent harness always addig MCP servers to the context even when they are not being used?
MCP is a wire format protocol between clients and servers. What ends up inside the context window is the agent builder's decision.
I'm a layman here. How is a skill any better? Aren't agent tools loaded on-demand, just as a skill would be? People are mentioning OpenAPI, but wouldn't you need to load the spec for that too?
> it is akin to putting lipstick on a pig. It helps, but not much.
The lipstick helps? This had me in stitches. Sorry for the non-additive reply. This is the funniest way I have seen this or any other phrase explained. By far. Honestly has made my day and set me up for the whole week.
The bloat problem is already out dated though. People are having the LLM pick the MCP servers it needs for a particular task up front, or picking them out-of-band, so the full list doesn't exist in the context every call.
MCP is dead? Which cli tool should we use to instruct Chrome to open a page and click the Open button? And to read what appears in the console after clicking?
MCP permanently sacrifice a chunk of the context window? And a skill for you cli is free?
MCP is very much not dead. centralized remote MCP servers are incredibly useful. also bespoke CLIs still require guidance for models to use effectively, so it's clear that token efficiency is still an issue regardless.
Tbh I find self-documenting CLIs (e.g. with a `--help` flag, and printing correct usage examples when LLMs make things up) plus a skill that's auto invoked to be pretty reliable. CLIs can do OAuth dances too just fine.
MCP's remaining moats I think are:
- No-install product integrations (just paste in mcp config into app)
- Non-developer end users / no shell needed (no terminal)
- Multi-tenant auth (many users, dynamic OAuth)
- Security sandboxing (restrict what agents can do), credential sandboxing (agents never see secrets)
I see remote MCP servers as a great interface to consume api responses. The idea that you essentially make your apis easily available to agents to bring in relevant context is a powerful one.
When folks say MCP is dead, I don't get it. What other alternatives exist in place of MCP? Arbitrary code via curl/sdks to call a remote endpoint?
I think cli’s are more token efficient- the help menu is loaded only when needed, and the output is trivially pipe able to grep or jq to filter out what the model actually wants
I don't know if this just anecdotal random impression, but in a last week or two I had mostly good experience with Google cli. While previously I constantly complained about it. I have been using it together with codex, and I would not say that one is much better than another.
It is hard to say nowadays, when things change so quickly
I know it’s a bit of a tangent but man you’re right re. Gemini CLI. It’s woefully bad, barely works. Maybe because I was a “free” user trying it out at the time, but it was such a bad experience it turned me off subscribing to whatever their coding plan is called today.
it's not the CLI, it's the model. The model wasn't trained to do that kind of work, was trained to do one shot coding, not sustained back and forth until it gets it right like Claude and ChatGPT.
Couldn't have been more wrong. MCP despite its manageable downsides is leagues ahead of anything else in many ways.
The fact that SoTA models are trained to handle MCP should be hint enough to the observant.
I probably build one MCP tool per week at work.
And every project I work on gets its own MCP tool too. It's invaluable to have specialized per-project tooling instead of a bunch of heterogeneous scripts+glue+prayer.
> So bad in fact that it’s clear none of their team use it.
I use it extensively, many of my colleagues do. I get a ton of value out of it. Some prefer Antigravity, but I prefer Gemini CLI. I get fairly long trajectories out of it, and some of my colleagues are getting day-long trajectories out of it. It has improved massively since I started using it when it first came out.
> Why permanently sacrifice that chunk of your context window when you can just use CLI tools which are also faster and more flexible and many are already trained in
What about all the CLI tools not baked into the model's priors?
Every time someone says "extensibility mechanism X is dead!", I think "Well, I guess that guy isn't doing anything that needs to extend the statistical average of 2010s-era Reddit"
Been using this one for a while, mostly with codex on opencode. It's more reliable and token efficient than other devtools protocol MCPs i've tried.
Favourite unexpected use case for me was telling gemini to use it as a SVG editing repl, where it was able to produce some fantastic looking custom icons for me after 3-4 generate/refresh/screenshot iterations.
Also works very nicely with electron apps, both reverse engineering and extending.
I've been using TideWave[1] for the last few months and it has this built-in. It started off as an Elixir/LiveView thing but now they support popular JavaScript frameworks and RoR as well. For those who like this, check it out. It even takes it further and has access to the runtime of your app (not just the browser).
The agent basically is living inside your running app with access to databases, endpoints etc. It's awesome.
We tested this — the default take_snapshot path (Accessibility.getFullAXTree) is safe. It filters display:none elements because they're excluded from the accessibility tree.
But evaluate_script is the escape hatch. If an agent runs document.body.textContent instead of using the AX tree, hidden injections in display:none divs show up in the output. innerText is safe (respects CSS visibility), textContent is not (returns all text nodes regardless of styling).
The gap: the agent decides which extraction method to use, not the user. When the AX tree doesn't return enough text, a plausible next step is evaluate_script with textContent — which is even shown as an example in the docs.
Also worth noting: opacity:0 and font-size:0 bypass even the safe defaults. The AX tree includes those because the elements are technically 'rendered' and accessible to screen readers. display:none is just the most common hiding technique, not the only one.
I’ve been experimenting with a similar approach using Playwright, and the biggest takeaway for me was how much “hidden API” most modern websites actually have.
Once you start mapping interactions → network calls, a lot of UI complexity just disappears. It almost feels like the browser becomes a reverse-engineering tool for undocumented APIs.
That said, I do think there’s a tradeoff people don’t talk about enough:
- Sites change frequently, so these inferred APIs can be brittle
- Auth/session handling gets messy fast
- And of course, the ToS / ethical side is a gray area
Still, for personal automation or internal tooling, it’s insanely powerful. Way more efficient than driving full browser sessions for everything.
Curious how others are handling stability — are you just regenerating these mappings periodically, or building some abstraction layer on top?
Very cool. I do something like this but with Playwright. It used to be a real token hog though, and got expensive fast. So much so that I built a wrapper to dump results to disk first then let the agent query instead. https://uisnap.dev/
Will check this out to see if they’ve solved the token burn problem.
I use playwright CLI. Wrote a skill for it, and after a bit of tuning it's about 1-2k context per interaction which is fine. The key was that Claude only needs screenshots initially and then can query the dev tools for logs as needed.
my workaround for this was to make a wrapper mcp server which uses claude haiku to summarize the page snapshot returned in the response of each playwright mcp call, and that has worked pretty well for me: https://github.com/jsdf/playwright-slim-mcp
I asked Claude to use this with the new scheduled tasks /loop skill to update my Oscar picks site every five minutes during tonight’s awards show. It simply visited the Oscars' realtime feed via Chrome DevTools, and updated my picks and pushed to gh pages. It even handled the tie correctly.
i wish more people knew or cared about web standards vs proprietary protocols. the webdriver bidi protocol took the good parts of cdp and made it a w3c standard, but no one knows about it. some of the people who do know about it, find one thing they don't like and give up. let's not keep giving megacorporations outsized influence and control over the web and the tools we use with it. let's celebrate standards and make them awesome.
Great to see the standalone CLI shipping in alongside this! There’s been a lot of talk today about MCP 'context bloat,' but providing a direct bridge to active DevTools sessions is something a standard headless CLI can’t replicate easily. The ability to select an element in the Elements panel and immediately 'delegate' the fix to an agent is exactly the kind of hybrid workflow that makes DevTools so powerful.
For something like Chrome DevTools MCP with authenticated browser sessions, the specific risk is credentials in the browser context + any SEND capability reachable from the same entry points. If a page can inject a prompt that triggers a tool call, and that call path can also reach outbound network I/O, you have an exfiltration vector without needing shell access at all.
I've been using the DevTools MCP for months now, but it's extremely token heavy. Is there an alternative that provides the same amount of detail when it comes to reading back network requests?
It's probably not fully optimized and could be compacted more with just some effort, and further with clever techniques, but browser state/session data will always use up a ton of tokens because it's a ton of data. There's not really a way around that. AI's have a surprising "intuition" about problems that often help them guess at solutions based on insufficient information (and they guess correctly more often than I expect they should). But when their intuition isn't enough and you need to feed them the real logs/data...it's always gonna use a bunch of tokens.
This is one place where human intuition helps a ton today. If you can find the most relevant snippets and give the AI just the right context, it does a much better job.
i'm experimenting with a different approach (no CDP/ARIA trees, just Chrome extension messaging that returns a numbered list of interactive elements).
Way lighter on tokens and undetectable but still very experimental : https://github.com/DimitriBouriez/navagent-mcp
CLI is great when you know what command to run. MCP is great when the agent decides what to run - it discovers tools without you scripting the interaction.
The real problem isn't MCP vs CLI, it's that MCP originally loaded every tool definition into context upfront. A typical multi-server setup (GitHub, Slack, Sentry, Grafana, Splunk) consumes ~55K tokens in definitions before Claude does any work. Tool selection accuracy also degrades past 30-50 tools.
Anthropic's Tool Search fixes this with per-tool lazy loading - tools are defined with defer_loading: true, Claude only sees a search index, and full schemas load on demand for the 3-5 tools actually needed. 85% token reduction. The original "everything upfront" design was wrong, but the protocol is catching up.
The thing I am working on is improving at the moment agentic tool usage success rates for my research and I use this as a proxy to access everything with the cookies I allow in the session.
I don’t do any serious web development and haven’t for 25 years aside from recently vibe coding internal web admin portals for back end cloud + app dev projects. But I did recently have to implement a web crawler for a customer’s site for a RAG project using Chromium + Playwrite in a Docker container deployed to Lambda.
I ran the Docker container locally for testing. Could a web developer test using Claude + Chromium in a Docker container without using their real Chrome instance?
Yes, running Chromium in a Docker container works well for this. There are prebuilt images like https://hub.docker.com/r/browserless/chrome that give you a headless instance you can connect to via CDP (Playwright, Puppeteer). Keeps everything isolated from your actual browser profile and credentials.
Unfortunately there are like a billion competitors to this right now (including Playwright MCP, Playwright CLI, the new baked-in Playwright feature in Codex /experimental, Claude Code for Chrome...) and I can never quite decide if or when I should try to switch. I'm still just using the ordinary Playwright MCP server in both Codex and Claude Code, for the time being.
I would use whatever you are comfortable with, I wanted a similar tool so I coded my own. Smaller API so that understand what is going on and it is easy not to get lost
Interesting. MCP APIs can be useful for humans too.
Chrome's dev tools already had an API [1], but perhaps the new MCP one is more user friendly, as one main requirement of MCP APIs is to be understood and used correctly by current gen AI agents.
I built something in this space, bb-browser (https://github.com/epiral/bb-browser). Same CDP connection, but the approach is honestly kind of cheating.
Instead of giving agents browser primitives like snapshot, click, fill, I wrapped websites into CLI commands. It connects via CDP to a managed Chrome where you're already logged in, then runs small JS functions that call the site's own internal APIs. No headless browser, no stolen cookies, no API keys. Your browser is already the best place for fetch to happen. It has all the cookies, sessions, auth state. Traditional crawlers spend so much effort on login flows, CSRF tokens, CAPTCHAs, anti-bot detection... all of that just disappears when you fetch from inside the browser itself. Frontend engineers would probably hate me for this because it's really hard to defend against.
So instead of snapshot the DOM (easily 50K+ tokens), find element, click, snapshot again, parse... you just run
bb-browser site twitter/feed
and get structured JSON back.
Here's the thing I keep thinking about though. Operating websites through raw CDP is a genuinely hard problem. A model needs to understand page structure, find the right elements, handle dynamic loading, deal with SPAs. That takes a SOTA model. But calling a CLI command? Any model can do that. So the SOTA model only needs to run once, to write the adapter. After that, even a small open-source model runs "bb-browser site reddit/hot" just fine.
And not everyone even needs to write adapters themselves. I created a community repo, bb-sites (https://github.com/epiral/bb-sites), where people freely contribute adapters for different websites. So in a sense, someone with just an open-source model can already feel the real impact of agents in their daily workflow. Agents shouldn't be a privilege only for people who can access SOTA models and afford the token costs.
There's a guide command baked in so if you do want to add a new site, you can tell your agent "turn this website into a CLI" and it reverse-engineers the site's APIs and writes the adapter.
v0.8.x dropped the Chrome extension entirely. Pure CDP, managed Chrome instance. "npm install -g bb-browser" and it works.
imo a much better setup is using playwright-cli + some skill.md files for profiling (for example, I have a skill using aidenybai/react-scan for frontend react profiling). token efficient, fast and more customizable/upgradable based on your workflow. vercel-labs/agent-browser is also a good alternative.
Been using MCP tooling heavily for a few months and browser debugging integration is one of those things that sounds gimmicky until you actually try it. The real question is whether it handles flaky async state reliably or just hallucinates what it thinks the DOM looks like?
Connecting a remote VPS to a local Chrome session is usually a headache. It gets complicated when your Claw setup is on the server but the browser session stays on your own machine. I ended up using Proxybase’s relay [0] to bridge the gap, and it actually solved the connection issues for me.
One tip for the illegal scrapers or automators out there. Casperjs and phanthomjs are still working very well for anti bot detection. These are very old libs no longer maintained. But I can even scrape and authenticate at my banks.
This is the exact problem that pushed me to build a security proxy for MCP tool calls. The permission model in most MCP setups is basically binary, either the agent can use the tool or it can't. There's nothing watching what it does with that access once its granted.
The approach I landed on was a deterministic enforcement pipeline that sits between the agent and the MCP server, so every tool call gets checked for things like SSRF (DNS resolve + private IP blocking), credential leakage in outbound params, and path traversal, before the call hits the real server. No LLM in that path, just pattern matching and policy rules, so it adds single-digit ms overhead.
The DevTools case is interesting because the attack surface is the page content itself. A crafted page could inject tool calls via prompt injection. Having the proxy there means even if the agent gets tricked, the exfiltration attempt gets caught at the egress layer.
The ultimate conflict of interest here is that the sites people want to crawl the most are the ones that want to be crawled by machines the least (e.g. Youtube). So people will end up emulating genuine human users one way or another.
Fully agree. Will take some time though as immediate incentive not clear for consumer facing companies to do extra work to help ppl bypass website layer. But I think consumers will begin to demand it, once they experience it through their agent. Eg pizza company A exposes an api alongside website and pizza company B doesn’t, and consumer notices their agent is 10x+ faster interacting with company A and begins to question why.
> The web already went through this evolution once: we went from screen-scraping HTML to structured APIs. Now we're regressing back to scraping because agents need to interact with sites that only have human interfaces.
To me, sites that "only have human interfaces" are more likely that not be that way totally on purpose, attempting to maximize human retention/engagement and are more likely to require strict anti-bot measures like Proof-of-Work to be usable at all.
I feel like the fact tha HTML is end result is exactly why the Web is so successful. Yes, structured APIs sound great, until you realize the API owners will never give you the data you actually want via their APIs. This is why HTML has done so well. Why extensions exist. And why it's better for browser automation.
> What we actually need is a standard for websites to expose a machine-readable interaction layer alongside the human one.
We had this 20 years ago with the Semantic Web movement, XHTML, and microformats. Sadly, it didn't pan out for various reasons, most of them non-technical. There's remnants of it today with RSS feeds, which is either unsupported or badly supported by most web sites.
Once advertising became the dominant business model on the web, it wasn't in publishers' interest to provide a machine-readable format of their content. Adtech corporations took control of the web, and here we are. Nowadays even API access is tightly controlled (see Reddit, Twitter, etc.).
So your idea will never pan out in practice. We'll have to continue to rely on hacks and scraping will continue to be a gray area. These new tools make automated scraping easier, for better or worse, but publishers will find new ways to mitigate it. And so it goes.
Besides, if these new tools are "superintelligent", surely they're able to navigate a web site. Captchas are broken and bot detection algorithms (or "AI" themselves) are unreliable. So I'd say the leverage is on the consumer side, for now.
I use Playwright to intercept all requests and responses and have Claude Code navigate to a website like YouTube and click and interact with all the elements and inputs while recording all the requests and responses associated with each interaction. Then it creates a detailed strongly typed API to interact with any website using the underlying API.
Yes, I know it likely breaks everybody's terms of service but at the same time I'm not loading gigabytes of ads, images, markup, to accomplish things.
If anyone is interested I can take some time and publish it this week.
I do this via BrowserOS -- https://github.com/browseros-ai/BrowserOS
It has an in-built MCP server and I use it with claude code, codex and like it quite a lot.
I also do this. My primary use case is for reproducing page layout and styling at any given tree in the dom. So, capturing various states of a component etc.
I also use it to automatically retrieve page responsiveness behavior in complex web apps. It uses playwright to adjust the width and monitor entire trees for exact changes which it writes structured data that includes the complete cascade of styles relevant with screenshots to support the snapshots.
There are tools you can buy that let you do this kind of inspection manually, but they are designed for humans. So, lots of clickety-clackety and human speed results.
---
My first reaction to seeing this FP was why are people still releasing MCPs? So far I've managed to completely avoid that hype loop and went straight to building custom CLIs even before skills were a thing.
I think people are still not realizing the power and efficiency of direct access to things you want and skills to guide the AI in using the access effectively.
Maybe I'm missing something in this particular use case?
> My first reaction to seeing this FP was why are people still releasing MCPs?
MCPs are more difficult to use. You need to use an agent to use the tools, can't do it manually easily. I wonder if some people see that friction as a feature.
its mostly because MCPs handle auth in a standardised way and give you a framework you can layer things like auth, etc on top of.
Without it youre stuck with the basic http firewall, etc which is extremely dangerous and this is maybe the 1 opportunity we have to do this.
1 reply →
I do something similar [1] but it leverages WebMCP (see Amazon example [2]). Could probably turn it into a strongly typed API.
[1] https://github.com/sidwyn/webmcp-tool-library
[2] https://github.com/sidwyn/webmcp-tool-library/blob/main/cont...
I love how HN is loving this idea when it's the exact same thing Anthropic and OpenAi (and every other llm maker) did.
It's God's gift to them when it lets them bypass ads and dl copyrighted material. But it's Satan's curse on humanity when the Zuck does it to train his llm and dl copyrighted material.
I think there's a little bit of the Goomba fallacy at play here to be fair
Both scale and purpose make them completely different things. You're acting as if they're the same when they're not.
I won't comment about dl but ads are trackers and spyware for me. I don't spy on websites' owners, I have my human rights to stop those trackers.
Zuck serves ads/spywares to other users, he deserves to taste his own medicines, not me.
Yes, it's a god's gift when the average user can do it, and satan's curse what a hated fucking mega-corp is doing it.
Where's the contradiction?
You can see this pattern in many different topics: updoots are highly correlated with a positive answer to "do I personally get to profit"?
1 reply →
I would love to pay for content. I'm _paying_ for YouTube Premium.
But heck. Do I hate the YouTube interface, it degraded far past usability.
1 reply →
So you’re that Hal Jordan then? Why would a Green Lantern feel the need to defend either? I feel like the Guardians would not accept your arguments as soon as you got to Oa, poozer. I guess what I am saying is don’t have a famous name. Seems obvious.
2 replies →
You conflate web crawling for inference with web crawling for training.
Web crawling for training is when you ingest content on a mass scale, usually indiscriminately, usually with a dumb crawler for scale's sake, for the purposes of training an LLM. You don't really care whether one particular website is in the dataset (unless it's the size of Reddit), you just want a large, diverse, high-quality data mix.
Web crawling for inference is when a user asks a targeted question, you do a web search, and fetch exactly those resources that are likely to be relevant to that search. Nothing ends up in the training data, it's just context enrichment.
People have a much larger issue with crawling for training than for inference (though I personally think both are equally ok).
Why even use Playwright for this? I feel like Claude just needs agent-browser and it can generate deterministic code from it.
you mean this one? https://github.com/vercel-labs/agent-browser
6 replies →
You can just start claude with the —chrome flag too and it will connect to the chrome extension.
yes please! i need a "comment to follow" functionality on HN
Please do.
Did you compare playwright with mcp? Why one over another?
I use MCP usually, because I heard it’s less detectable than playwright, and more robust against design changes, but I didn’t compare/test myself
Very interested. Would even pay for an api for this. I am doing something similar with vibium and need something more token efficient.
have you tried vibium's cli + agent skill?
I use chrome devtools MCP to the same end - it works great for me. Interested in what advantages you see in using Playwright over chrome devtools?
Would this hypothetically be able to download arbitrary videos from youtube without the constant yt-dlp arms race?
Don’t know how this could be more stable than ytdlp. When issues come up they’re fixed really quickly.
6 replies →
> yt-dlp arms race
I don't know anything about yt-dlp.
It would probably help people who want to go to a concert and have a chance to beat the scalpers cornering the market on an event in 30 seconds hitting the marketplace services with 20,000 requests.
I can try to see if can bypass yt-dlp. But that is always a cat and mouse game.
2 replies →
If it can save all the video/audio fragment and call ffmpeg to join them together. Maybe?
Yes, please do and ping me when it's done lol. Did you make it into an agent skill?
Exactly, it is an agent skill that interacts pressing buttons and stuff with a webpage capturing and documenting all the API requests the page makes using Playwright's request / response interception methods. It creates and strongly typed well documented API at the end.
1 reply →
I would like to see this!
I just ask Claude to reverse engineer the site with Chrome MCP. It goes to work by itself, uses your Chrome logged in session cookies, etc.
I would love it if you had time to publish it!
I was doing similar by capturing XHR requests while clicking through manually, then asking codex to reverse engineer the API from the export.
Never tried that level of autonomy though. How long is your iteration cycle?
If I had to guess, mine was maybe 10-20 minutes over a few prompts.
I assume you're not logged into those sites, in order to avoid bans and the risk of hitting the wrong button like, say, "Delete Account".
It turns any authenticated browser session into a fully typed REST API proxy — exposing discovered endpoints as local Hono routes that relay requests through the browser, so cookies and auth are automatic.
The point is that it creates an API proxy in code that a Typescript server calls directly. The AI runs for about 10 minutes with codegen. The rest of the time it is just API calls to a service. Remove the endpoint for "Delete Account" and that API endpoint never gets called.
This 100% breaks everyone's terms of service. I would not recommend nor encourage using.
I always used playwrite as an alternative to selenium, relatively surprised by its ability to interface with LLMs.
+1, publish, but how will we know when you have published...
Yes, please do!
100% I'll response to this by Friday with link to Github.
I use Patchright + Ghostery and I have a cleaver tool that uses web sockets to pass 1 second interval screenshots to the a dashboard and pointer / keyboard events to the server which allow interacting with websites so that a user can create authentication that is stored in the chrome user profile with all the cookies, history, local storage, ect.. in the cloud on a server.
Can you list some websites that don't require subscription that you would like to me to test against? I used this for Robinhood and I think Linked in would be a good example for people to use.
2 replies →
Id like to see this published as well thx!
Please do!
Please publish!
Commenting to follow up.
Wow. Yes please.
isnt it what everyone that needs web validation does?
[dead]
The DevTools MCP project just recently landed a standalone CLI: https://github.com/ChromeDevTools/chrome-devtools-mcp/blob/m...
Great news to all of us keenly aware of MCP's wild token costs. ;)
The CLI hasn't been announced yet (sorry guys!), but it is shipping in the latest v0.20.0 release. (Disclaimer: I used to work on the DevTools team. And I still do, too)
Love the Mitch Hedberg reference! Thank you! Always good to get a little Mitch!
‘I don’t have a girlfriend. But I do know a woman who’d be mad at me for saying that.’
‘I’m against picketing, but I don’t know how to show it.’
‘I haven’t slept for ten days, because that would be too long.’
‘I like to play blackjack. I’m not addicted to gambling. I’m addicted to sitting in a semi-circle.’
"I was going to get my teeth whitened but then I said, fuck that, I'll just get a tan instead."
It doesn't seem to work, tried the -u flag with the default address and it just couldn't connect to the existing chrome instance.
The big upside of the MCP is that it connects to already open browser windows. I tried the skill but it always tries to open new windows. Is there a way to get the `--autoConnect` behaviour with the CLI?
Woah got this for the first time.
> Too many requests
You have exceeded a secondary rate limit.
Please wait a few minutes before you try again; in some cases this may take up to an hour. Signing in may provide a higher rate limit if you are not already signed in.
For more on scraping GitHub and how it may affect your rights, please review our Terms of Service.
MCPs cost nothing in CC now with Tool Search.
> MCPs cost nothing in CC now with Tool Search.
This is incorrect. Plenty of people have run the numbers. Tool search does not fix all problems with MCP.
2 replies →
Codex also has this…
Someone already made a great agent skill for this, which I'm using daily, and it's been very cool!
https://github.com/pasky/chrome-cdp-skill
For example, I use codex to manage a local music library, and it was able to use the skill to open a YT Music tab in my browser, search for each album, and get the URL to pass to yt-dlp.
Do note that it only works for Chrome browsers rn, so you have to edit the script to point to a different Chromium browser's binary (e.g. I use Helium) but it's simple enough
On one hand, cool demo, on the other, this is horrifying in more ways than I can begin to describe. You're literally one prompt injection away from someone having unlimited access to all of your everything.
Not the person you're replying to, but: I just use a separate, dedicated Chrome profile that isn't logged into anything except what I'm working on. Then I keep the persistence, but without commingling in a way that dramatically increases the risk.
edit: upon rereading, I now realize the (different) prompt injection risk you were calling out re: the handoff to yt-dlp. Separate profiles won't save you from that, though there are other approaches.
4 replies →
Of course I still watch it and have my finger on the escape key at all times :)
6 replies →
As long as it’s gated and not turned on by default, it’s all good. They could also add a warning/sanity check similar to “allow pasting” in the console.
1 reply →
> Most browser automation tools launch a fresh, isolated browser. This one connects to the Chrome you're already running
Is this the same as what Claude in Chrome does?
I tried that for a while and since I use Firefox and Chromium, the security problem of it seeing your tabs wasn't a big deal. Fresh Chrome install, only ever used for this exact purpose. Plus you can watch it working in real (actually very slow) time so if you did point it at something risky you can take over at any point.
For actual testing of web apps though, a skill with playwright cli in headless mode is much more effective. About 1-2k context per interaction after a bit of tuning.
To be clear, this isn't a skill for the devtools mcp, but an independent project. It doesn't look bad, but obviously browser automation + agents is a very busy space with lots of parallel efforts.
DevTools MCP and its new CLI are maintained by the team behind Chrome DevTools & Puppeteer and it certainly has a more comprehensive feature set. I'd expect it to be more reliable, but.. hey open source competition breeds innovation and I love that. :)
(I used to work on the DevTools team. And I still do, too)
Does anyone really use these hacked up with duct tape skills? why not use something more reliable like playwriter.dev?
Mhh, yt-dlp already has a build in youtube search, could you not use that instead of anything with AI?
Google is so far behind agentic cli coding. Gemini CLI is awful. So bad in fact that it’s clear none of their team use it. Also MCP is very obviously dead, as any of us doing heavy agentic coding know. Why permanently sacrifice that chunk of your context window when you can just use CLI tools which are also faster and more flexible and many are already trained in. Playwright with headless Chromium or headed chrome is what anyone serious is using and we get all the dev and inspection tools already. And it works perfectly. This only has appeal to those starting out and confused into thinking this is the way. The answer is almost never MCP.
> Also MCP is very obviously dead, as any of us doing heavy agentic coding know.
As someone that does heavy agentic coding (using basically all the tools), this is so far from the truth. People claiming this have probably never worked in large enterprise environments where things like authentication, RBAC, rate limiting, abuse detection, centralized management/updates/ops, etc. are a huge part of the development and deployment workflow.
In these situations you can't just use skills and cli tools without a gigantic amount of retooling and increased operational and security complexity. MCP is really useful here, and allows centralized eng and ops teams to manage their services in a way that aligns with the organizations overall posture, policies, and infrastructure.
> Google is so far behind agentic cli coding. Gemini CLI is awful.
This part I totally agree. It's really hard to express how bad it is (and it's really disappointing.)
> you can't just use skills and cli tools without a gigantic amount of retooling and increased operational and security complexity
You're describing MCP. After all, MCP is just reinventing the OpenAPI wheel. You can just have a self-documenting REST API using OpenAPI. Put the spec in your context and your model knows how to use it. You can have all the RBAC and rate limiting and auth you want. Heck, you could even build all that complexity into a CLI tool if you want. MCP the protocol doesn't actually enable anything. And implementing an MCP server is exactly as complex as using any other established protocol if you're using all those features anyway
2 replies →
Given MCP is supposed to just be a standardised format for self-describing APIs, why are all the features you listed MCP related things? It sounds more like it's forced the enterprise to build such features which cli tooling didn't have?
2 replies →
FYI: Gemini Cli is used internally at Google. It's actually more popular than Antigravity. Google uses MCP services internally for code search (since everything is in a mono-repo you don't want to waste time grepping billions of files), accessing docs and bugs, and also accessing project specific RAG databases for expertise grounding.
Source - I know people at Google.
> Also MCP is very obviously dead
Some people will push back on this. They are holding out hope that the recent improvements Anthropic has made in this regard have improved the context rot problem with MCP. Anthropic's changes improve things a little. But it is akin to putting lipstick on a pig. It helps, but not much.
The reason MCP is dying/dead is because MCP servers, once configured, bloat up context even when they are not being used. Why would anybody want that?
Use agent skills. And say goodbye to MCP. We need to move on from MCP.
Is your agent harness dropping the entire MCP server tool description output directly into the context window? Is your agent harness always addig MCP servers to the context even when they are not being used?
MCP is a wire format protocol between clients and servers. What ends up inside the context window is the agent builder's decision.
I'm a layman here. How is a skill any better? Aren't agent tools loaded on-demand, just as a skill would be? People are mentioning OpenAPI, but wouldn't you need to load the spec for that too?
> it is akin to putting lipstick on a pig. It helps, but not much.
The lipstick helps? This had me in stitches. Sorry for the non-additive reply. This is the funniest way I have seen this or any other phrase explained. By far. Honestly has made my day and set me up for the whole week.
i am using notion mcp. is there a corresponding skill. also wtf is a plugin.
The bloat problem is already out dated though. People are having the LLM pick the MCP servers it needs for a particular task up front, or picking them out-of-band, so the full list doesn't exist in the context every call.
MCP is dead? Which cli tool should we use to instruct Chrome to open a page and click the Open button? And to read what appears in the console after clicking?
MCP permanently sacrifice a chunk of the context window? And a skill for you cli is free?
MCP is very much not dead. centralized remote MCP servers are incredibly useful. also bespoke CLIs still require guidance for models to use effectively, so it's clear that token efficiency is still an issue regardless.
Tbh I find self-documenting CLIs (e.g. with a `--help` flag, and printing correct usage examples when LLMs make things up) plus a skill that's auto invoked to be pretty reliable. CLIs can do OAuth dances too just fine.
MCP's remaining moats I think are:
- No-install product integrations (just paste in mcp config into app)
- Non-developer end users / no shell needed (no terminal)
- Multi-tenant auth (many users, dynamic OAuth)
- Security sandboxing (restrict what agents can do), credential sandboxing (agents never see secrets)
- Compliance/audit (structured logs, schema enforcement)?
If you're a developer building for developers though, CLI seems to be a clear winner right
6 replies →
I see remote MCP servers as a great interface to consume api responses. The idea that you essentially make your apis easily available to agents to bring in relevant context is a powerful one.
When folks say MCP is dead, I don't get it. What other alternatives exist in place of MCP? Arbitrary code via curl/sdks to call a remote endpoint?
3 replies →
I think cli’s are more token efficient- the help menu is loaded only when needed, and the output is trivially pipe able to grep or jq to filter out what the model actually wants
all you need is a simple skills.md and maybe a couple examples and codex picks up my custom toolkit and uses it.
2 replies →
I don't know if this just anecdotal random impression, but in a last week or two I had mostly good experience with Google cli. While previously I constantly complained about it. I have been using it together with codex, and I would not say that one is much better than another.
It is hard to say nowadays, when things change so quickly
Gemini 3.1 Pro through Gemini CLI always tries to write files with cat instead of using the write_file tool, it's awful at tool use.
I know it’s a bit of a tangent but man you’re right re. Gemini CLI. It’s woefully bad, barely works. Maybe because I was a “free” user trying it out at the time, but it was such a bad experience it turned me off subscribing to whatever their coding plan is called today.
I had this exp too, but I trialed the pro sub a few weeks back and it has been great. I have no complaints this time
it's not the CLI, it's the model. The model wasn't trained to do that kind of work, was trained to do one shot coding, not sustained back and forth until it gets it right like Claude and ChatGPT.
> Also MCP is very obviously dead...
Couldn't have been more wrong. MCP despite its manageable downsides is leagues ahead of anything else in many ways.
The fact that SoTA models are trained to handle MCP should be hint enough to the observant.
I probably build one MCP tool per week at work.
And every project I work on gets its own MCP tool too. It's invaluable to have specialized per-project tooling instead of a bunch of heterogeneous scripts+glue+prayer.
Anything specialized goes into an MCP.
Antigravity's coding agent is worlds apart from Gemini CLI, though.
> So bad in fact that it’s clear none of their team use it.
I use it extensively, many of my colleagues do. I get a ton of value out of it. Some prefer Antigravity, but I prefer Gemini CLI. I get fairly long trajectories out of it, and some of my colleagues are getting day-long trajectories out of it. It has improved massively since I started using it when it first came out.
some serious people use vibium instead. (full-disclosure: "some serious people" is me.)
MCP is not just used for coding.
> Why permanently sacrifice that chunk of your context window when you can just use CLI tools which are also faster and more flexible and many are already trained in
What about all the CLI tools not baked into the model's priors?
Every time someone says "extensibility mechanism X is dead!", I think "Well, I guess that guy isn't doing anything that needs to extend the statistical average of 2010s-era Reddit"
Been using this one for a while, mostly with codex on opencode. It's more reliable and token efficient than other devtools protocol MCPs i've tried.
Favourite unexpected use case for me was telling gemini to use it as a SVG editing repl, where it was able to produce some fantastic looking custom icons for me after 3-4 generate/refresh/screenshot iterations.
Also works very nicely with electron apps, both reverse engineering and extending.
I found this one working amazingly well (same idea - connect to existing session): https://github.com/remorses/playwriter
How does this compare with playwright CLI?
https://github.com/microsoft/playwright-cli
I personally found playwright-cli, and agent-browser which wraps playwright, both more token-efficient than using the raw mcp.
Odd that this article from Dec 2025 has been posted to the top of HN though
Its easier to connect to existing sessions in your main browser.
It’s made by Google and comes with Chrome
Is this really the state of AI in 2026?
It takes over your entire browser to center a div... and then fails to do so?
Lots of MCP hate, and some love, in the comments.
80% of MCPs are thin wrappers over APIs . Yes they stink.
A well written remote OAuth MCP need not stink. Tons of advantages starting with strong security baked in.
I like Cloudflare Code Mode as an MCP pattern. Two tools, search and execute.
1M Opus 4.6 also reduces the penalties of MCP’s context approach. Along with tool search etc.
I've been using TideWave[1] for the last few months and it has this built-in. It started off as an Elixir/LiveView thing but now they support popular JavaScript frameworks and RoR as well. For those who like this, check it out. It even takes it further and has access to the runtime of your app (not just the browser).
The agent basically is living inside your running app with access to databases, endpoints etc. It's awesome.
1. https://tidewave.ai/
Interesting. Does it only work with known frameworks like Next, React etc. or could I use it with my plain Node.js app which produces browser-output?
No, doesn't use work with server-side only apps.
2 replies →
We tested this — the default take_snapshot path (Accessibility.getFullAXTree) is safe. It filters display:none elements because they're excluded from the accessibility tree.
But evaluate_script is the escape hatch. If an agent runs document.body.textContent instead of using the AX tree, hidden injections in display:none divs show up in the output. innerText is safe (respects CSS visibility), textContent is not (returns all text nodes regardless of styling).
The gap: the agent decides which extraction method to use, not the user. When the AX tree doesn't return enough text, a plausible next step is evaluate_script with textContent — which is even shown as an example in the docs.
Also worth noting: opacity:0 and font-size:0 bypass even the safe defaults. The AX tree includes those because the elements are technically 'rendered' and accessible to screen readers. display:none is just the most common hiding technique, not the only one.
I’ve been experimenting with a similar approach using Playwright, and the biggest takeaway for me was how much “hidden API” most modern websites actually have.
Once you start mapping interactions → network calls, a lot of UI complexity just disappears. It almost feels like the browser becomes a reverse-engineering tool for undocumented APIs.
That said, I do think there’s a tradeoff people don’t talk about enough:
- Sites change frequently, so these inferred APIs can be brittle - Auth/session handling gets messy fast - And of course, the ToS / ethical side is a gray area
Still, for personal automation or internal tooling, it’s insanely powerful. Way more efficient than driving full browser sessions for everything.
Curious how others are handling stability — are you just regenerating these mappings periodically, or building some abstraction layer on top?
I can't make it run under WSL with Claude Code, anyone succeeded in this?
Very cool. I do something like this but with Playwright. It used to be a real token hog though, and got expensive fast. So much so that I built a wrapper to dump results to disk first then let the agent query instead. https://uisnap.dev/
Will check this out to see if they’ve solved the token burn problem.
I use playwright CLI. Wrote a skill for it, and after a bit of tuning it's about 1-2k context per interaction which is fine. The key was that Claude only needs screenshots initially and then can query the dev tools for logs as needed.
Mostly, yes: https://github.com/microsoft/playwright-cli
my workaround for this was to make a wrapper mcp server which uses claude haiku to summarize the page snapshot returned in the response of each playwright mcp call, and that has worked pretty well for me: https://github.com/jsdf/playwright-slim-mcp
I asked Claude to use this with the new scheduled tasks /loop skill to update my Oscar picks site every five minutes during tonight’s awards show. It simply visited the Oscars' realtime feed via Chrome DevTools, and updated my picks and pushed to gh pages. It even handled the tie correctly.
https://danielraffel.me/2026/03/16/my-oscar-2026-picks/
I know I could just use claude --chrome, but I’m used to this excellent MCP server.
Very cool idea and site! I wish claude and others could parse video streams then you could even create your own feed.
Neat idea :)
I had fun playing with it + WebMCP this weekend, but I think, similarly to how claude code / codex + MCP require SKILL.md, websites might too.
We could put them in a dedicated tag:
For all the skills with you want on the page, optionally set to default which "should be read in full to properly use the page".
And then add some javascript functions to wrap it / simplify required tokens.
Made a repo and a website if anyone is interested: https://webagentskills.dev/
i wish more people knew or cared about web standards vs proprietary protocols. the webdriver bidi protocol took the good parts of cdp and made it a w3c standard, but no one knows about it. some of the people who do know about it, find one thing they don't like and give up. let's not keep giving megacorporations outsized influence and control over the web and the tools we use with it. let's celebrate standards and make them awesome.
I found Firefox with https://github.com/padenot/firefox-devtools-mcp to work better then the default Chrome MCP, is seems much faster.
Great to see the standalone CLI shipping in alongside this! There’s been a lot of talk today about MCP 'context bloat,' but providing a direct bridge to active DevTools sessions is something a standard headless CLI can’t replicate easily. The ability to select an element in the Elements panel and immediately 'delegate' the fix to an agent is exactly the kind of hybrid workflow that makes DevTools so powerful.
For something like Chrome DevTools MCP with authenticated browser sessions, the specific risk is credentials in the browser context + any SEND capability reachable from the same entry points. If a page can inject a prompt that triggers a tool call, and that call path can also reach outbound network I/O, you have an exfiltration vector without needing shell access at all.
Also works nicely together with agent-browser (https://github.com/vercel-labs/agent-browser) using --auto-connect
I've been using the DevTools MCP for months now, but it's extremely token heavy. Is there an alternative that provides the same amount of detail when it comes to reading back network requests?
It's probably not fully optimized and could be compacted more with just some effort, and further with clever techniques, but browser state/session data will always use up a ton of tokens because it's a ton of data. There's not really a way around that. AI's have a surprising "intuition" about problems that often help them guess at solutions based on insufficient information (and they guess correctly more often than I expect they should). But when their intuition isn't enough and you need to feed them the real logs/data...it's always gonna use a bunch of tokens.
This is one place where human intuition helps a ton today. If you can find the most relevant snippets and give the AI just the right context, it does a much better job.
https://github.com/microsoft/playwright-cli and https://agent-browser.dev/
i'm experimenting with a different approach (no CDP/ARIA trees, just Chrome extension messaging that returns a numbered list of interactive elements). Way lighter on tokens and undetectable but still very experimental : https://github.com/DimitriBouriez/navagent-mcp
Yes. CLI. Always CLI. Never MCP. Ever. You’re welcome.
CLI is great when you know what command to run. MCP is great when the agent decides what to run - it discovers tools without you scripting the interaction.
The real problem isn't MCP vs CLI, it's that MCP originally loaded every tool definition into context upfront. A typical multi-server setup (GitHub, Slack, Sentry, Grafana, Splunk) consumes ~55K tokens in definitions before Claude does any work. Tool selection accuracy also degrades past 30-50 tools.
Anthropic's Tool Search fixes this with per-tool lazy loading - tools are defined with defer_loading: true, Claude only sees a search index, and full schemas load on demand for the 3-5 tools actually needed. 85% token reduction. The original "everything upfront" design was wrong, but the protocol is catching up.
That doesn't solve the issue here because the amount of data in the browser state dwarfs the MCP overhead.
2 replies →
I made a websocket proxy + chrome extension to give control of the DOM to agents for my middleware app: https://github.com/RALaBarge/browserbox
The thing I am working on is improving at the moment agentic tool usage success rates for my research and I use this as a proxy to access everything with the cookies I allow in the session.
I don’t do any serious web development and haven’t for 25 years aside from recently vibe coding internal web admin portals for back end cloud + app dev projects. But I did recently have to implement a web crawler for a customer’s site for a RAG project using Chromium + Playwrite in a Docker container deployed to Lambda.
I ran the Docker container locally for testing. Could a web developer test using Claude + Chromium in a Docker container without using their real Chrome instance?
Yes, running Chromium in a Docker container works well for this. There are prebuilt images like https://hub.docker.com/r/browserless/chrome that give you a headless instance you can connect to via CDP (Playwright, Puppeteer). Keeps everything isolated from your actual browser profile and credentials.
Take a look at https://news.ycombinator.com/item?id=47207790
I suggest to use https://github.com/simonw/rodney instead
Unfortunately there are like a billion competitors to this right now (including Playwright MCP, Playwright CLI, the new baked-in Playwright feature in Codex /experimental, Claude Code for Chrome...) and I can never quite decide if or when I should try to switch. I'm still just using the ordinary Playwright MCP server in both Codex and Claude Code, for the time being.
I would use whatever you are comfortable with, I wanted a similar tool so I coded my own. Smaller API so that understand what is going on and it is easy not to get lost
https://news.ycombinator.com/item?id=47207790
Interesting. MCP APIs can be useful for humans too.
Chrome's dev tools already had an API [1], but perhaps the new MCP one is more user friendly, as one main requirement of MCP APIs is to be understood and used correctly by current gen AI agents.
[1]: https://chromedevtools.github.io/devtools-protocol/
I built something in this space, bb-browser (https://github.com/epiral/bb-browser). Same CDP connection, but the approach is honestly kind of cheating.
Instead of giving agents browser primitives like snapshot, click, fill, I wrapped websites into CLI commands. It connects via CDP to a managed Chrome where you're already logged in, then runs small JS functions that call the site's own internal APIs. No headless browser, no stolen cookies, no API keys. Your browser is already the best place for fetch to happen. It has all the cookies, sessions, auth state. Traditional crawlers spend so much effort on login flows, CSRF tokens, CAPTCHAs, anti-bot detection... all of that just disappears when you fetch from inside the browser itself. Frontend engineers would probably hate me for this because it's really hard to defend against.
So instead of snapshot the DOM (easily 50K+ tokens), find element, click, snapshot again, parse... you just run
and get structured JSON back.
Here's the thing I keep thinking about though. Operating websites through raw CDP is a genuinely hard problem. A model needs to understand page structure, find the right elements, handle dynamic loading, deal with SPAs. That takes a SOTA model. But calling a CLI command? Any model can do that. So the SOTA model only needs to run once, to write the adapter. After that, even a small open-source model runs "bb-browser site reddit/hot" just fine.
And not everyone even needs to write adapters themselves. I created a community repo, bb-sites (https://github.com/epiral/bb-sites), where people freely contribute adapters for different websites. So in a sense, someone with just an open-source model can already feel the real impact of agents in their daily workflow. Agents shouldn't be a privilege only for people who can access SOTA models and afford the token costs.
There's a guide command baked in so if you do want to add a new site, you can tell your agent "turn this website into a CLI" and it reverse-engineers the site's APIs and writes the adapter.
v0.8.x dropped the Chrome extension entirely. Pure CDP, managed Chrome instance. "npm install -g bb-browser" and it works.
I wrote an ai agent that do chrome testing, yes, chrome MCP do work https://github.com/netdur/hugind/tree/main/agent/chrome_test...
imo a much better setup is using playwright-cli + some skill.md files for profiling (for example, I have a skill using aidenybai/react-scan for frontend react profiling). token efficient, fast and more customizable/upgradable based on your workflow. vercel-labs/agent-browser is also a good alternative.
Been using MCP tooling heavily for a few months and browser debugging integration is one of those things that sounds gimmicky until you actually try it. The real question is whether it handles flaky async state reliably or just hallucinates what it thinks the DOM looks like?
Note that this is a mega token guzzler in case you’re paying for your own tokens!
My approach is a thin cli wrapper instead.
https://news.ycombinator.com/item?id=47207790
I tell Claude to use playwright so I don't even need to do the setup myself.
Similarly, cursor has a built in browser and visit localhost to see the results in the browser. Although I don't use it much (I probably should).
I have been using Playwright for a fairly long time now. Do checkout
For context extraction, Lightpanda is a really great option. Much faster than Chrome, and it comes with a built-in MCP server.
However, it will not fill forms, etc. But it can be combined with agent-browser to get the best of both worlds: https://swival.dev/pages/web-browsing.html
I love how in their demo video where they center an element it ends up off-center.
chrome-cli with remote developer port has been working fine this entire time.
Now that there's widespread direct connectivity between agents and browser sessions, are CAPTCHAs even relevant anymore?
so good openclaw automation extensions, i like it.
so good browser automation extensions. i like it
Connecting a remote VPS to a local Chrome session is usually a headache. It gets complicated when your Claw setup is on the server but the browser session stays on your own machine. I ended up using Proxybase’s relay [0] to bridge the gap, and it actually solved the connection issues for me.
[0] https://relay.proxybase.xyz
One tip for the illegal scrapers or automators out there. Casperjs and phanthomjs are still working very well for anti bot detection. These are very old libs no longer maintained. But I can even scrape and authenticate at my banks.
Was already eye rolling about the headline. Then I realized it's from chrome.
Hoping from some good stories from open claw users that permanently run debug sessions.
[dead]
[dead]
[dead]
[dead]
that is the MCP vs CLI debate.
[dead]
[flagged]
[dead]
[dead]
[dead]
[dead]
[dead]
[dead]
[dead]
[dead]
[dead]
It's from 2025. The post should have a year tag.
Done, thanks!
[flagged]
This is the exact problem that pushed me to build a security proxy for MCP tool calls. The permission model in most MCP setups is basically binary, either the agent can use the tool or it can't. There's nothing watching what it does with that access once its granted.
The approach I landed on was a deterministic enforcement pipeline that sits between the agent and the MCP server, so every tool call gets checked for things like SSRF (DNS resolve + private IP blocking), credential leakage in outbound params, and path traversal, before the call hits the real server. No LLM in that path, just pattern matching and policy rules, so it adds single-digit ms overhead.
The DevTools case is interesting because the attack surface is the page content itself. A crafted page could inject tool calls via prompt injection. Having the proxy there means even if the agent gets tricked, the exfiltration attempt gets caught at the egress layer.
Someone left their bot on default settings.
The other reply to this 'bot' looks like another default thing: <https://news.ycombinator.com/threads?id=David-Brug-Ai>
[flagged]
AI
Yes. Can someone tell me why even HN has bots. For selling upvotes to advertisement purposes?
1 reply →
[flagged]
The ultimate conflict of interest here is that the sites people want to crawl the most are the ones that want to be crawled by machines the least (e.g. Youtube). So people will end up emulating genuine human users one way or another.
Fully agree. Will take some time though as immediate incentive not clear for consumer facing companies to do extra work to help ppl bypass website layer. But I think consumers will begin to demand it, once they experience it through their agent. Eg pizza company A exposes an api alongside website and pizza company B doesn’t, and consumer notices their agent is 10x+ faster interacting with company A and begins to question why.
Is this just a well-documented API?
They’re trying to solve it by making it easier to get Markdown versions of websites.
For example, you can get a markdown out of most OpenAI documentation by appending .md like this: https://developers.openai.com/api/docs/libraries.md
Not definitive, but still useful.
> interface designed for humans — the DOM.
Citation needed.
> The web already went through this evolution once: we went from screen-scraping HTML to structured APIs. Now we're regressing back to scraping because agents need to interact with sites that only have human interfaces.
To me, sites that "only have human interfaces" are more likely that not be that way totally on purpose, attempting to maximize human retention/engagement and are more likely to require strict anti-bot measures like Proof-of-Work to be usable at all.
I feel like the fact tha HTML is end result is exactly why the Web is so successful. Yes, structured APIs sound great, until you realize the API owners will never give you the data you actually want via their APIs. This is why HTML has done so well. Why extensions exist. And why it's better for browser automation.
> What we actually need is a standard for websites to expose a machine-readable interaction layer alongside the human one.
We had this 20 years ago with the Semantic Web movement, XHTML, and microformats. Sadly, it didn't pan out for various reasons, most of them non-technical. There's remnants of it today with RSS feeds, which is either unsupported or badly supported by most web sites.
Once advertising became the dominant business model on the web, it wasn't in publishers' interest to provide a machine-readable format of their content. Adtech corporations took control of the web, and here we are. Nowadays even API access is tightly controlled (see Reddit, Twitter, etc.).
So your idea will never pan out in practice. We'll have to continue to rely on hacks and scraping will continue to be a gray area. These new tools make automated scraping easier, for better or worse, but publishers will find new ways to mitigate it. And so it goes.
Besides, if these new tools are "superintelligent", surely they're able to navigate a web site. Captchas are broken and bot detection algorithms (or "AI" themselves) are unreliable. So I'd say the leverage is on the consumer side, for now.
> expose a machine-readable interaction layer alongside the human one
Which is called ARIA and has been a thing forever.