We would've preferred to build this as a browser extension too.
But we strongly believe that to build a good agent co-pilot we need a bunch of changes at the Chromium C++ code level. For example, Chromium has an accessibility tree for every website, but doesn't expose it as an API to Chrome extensions. Having access to the accessibility tree would greatly improve agent execution.
We are also building a bunch of changes in C++ for agents to interact with websites -- functions like click, elements with indexes. You can inject JS to do this, but it is 20-40x slower.
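To make "elements with indexes" a bit more concrete, here is a minimal TypeScript sketch of the general idea: walk an accessibility-tree-like structure and number the interactive nodes so an agent can refer to them by index ("click 1"). This is only an illustration, not the C++ implementation described above. The `A11yNode` shape, the `INTERACTIVE_ROLES` set, and the function names are all hypothetical.

```typescript
// Hypothetical node shape; a real accessibility tree carries many more fields.
interface A11yNode {
  role: string;
  name?: string;
  children?: A11yNode[];
}

interface IndexedElement {
  index: number;
  role: string;
  name: string;
}

// Roles treated as actionable here; this set is an assumption for the sketch.
const INTERACTIVE_ROLES = new Set(["button", "link", "textbox", "checkbox", "combobox"]);

// Depth-first walk in document order, numbering interactive nodes from 0.
function indexInteractive(root: A11yNode): IndexedElement[] {
  const out: IndexedElement[] = [];
  const visit = (node: A11yNode): void => {
    if (INTERACTIVE_ROLES.has(node.role)) {
      out.push({ index: out.length, role: node.role, name: node.name ?? "" });
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return out;
}

// Render the compact listing an agent could be prompted with.
function describe(elements: IndexedElement[]): string {
  return elements.map((e) => `[${e.index}] ${e.role} "${e.name}"`).join("\n");
}

// Made-up example page.
const page: A11yNode = {
  role: "WebArea",
  name: "Login",
  children: [
    { role: "heading", name: "Sign in" },
    { role: "textbox", name: "Email" },
    {
      role: "group",
      children: [
        { role: "button", name: "Submit" },
        { role: "link", name: "Forgot password?" },
      ],
    },
  ],
};

console.log(describe(indexInteractive(page)));
```

The agent then only needs to emit an index, and the browser side maps it back to the real element. Where that mapping and the click itself happen (injected JS vs. native code) is the performance question being discussed here.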
How is that accessibility tree different from the “accessibility snapshot” that you can get from Playwright, for example?
I was tackling a similar problem a few weeks ago and I found that Playwright MCP was the most usable solution in my case. It doesn’t use an extension; instead it debugs the browser tabs (I guess using the DevTools protocol), but I agree the experience was suboptimal.
Could you upstream that change in order to make it an extension in the future? I think people would not value it any less.
We don't mind upstreaming. But I don't think Google Chrome/Chromium wants to expose it as an API to Chrome extensions; otherwise they would've done this a long time ago.
From Google's perspective, extensions are meant to be lightweight applications with restricted access.
Would this be possible for Firefox?
IIRC, Firefox's web extension API does not provide access to the accessibility tree either.
I mean, you could build the agent with a first-principles understanding of the DOM instead of just hacking something together with the accessibility tree.
We had this exact thought as well: you don't need a whole browser to implement the agentic capabilities; you can implement the whole thing with the limited permissions of a browser extension.
There are plenty of zero-day exploit patches that Google rolls out immediately, not to mention all the other features that Google doesn't push to Chromium. I wouldn't trust a random open source project for my day-to-day browser.
Check out rtrvr.ai for a working implementation; we are an AI Web Agent browser extension that meets you where your workflows already are.
Brave Browser (70M+ users) has validated that a Chromium fork can be a viable path. And it can in fact provide better privacy and security.
Chrome extensions are not a bad idea either. Just saying that owning the underlying source code has some strong advantages in the long term (being able to use C++ for the a11y tree, DOM handling, etc. -- which will be 20-40x faster than injecting JS from a Chrome extension).
Honestly excited to see the benchmark result and comparison!
Our benchmark results [https://www.rtrvr.ai/blog/web-bench-results] show that we are 7x faster than browser-use, so I'm curious to see if your claims live up to the hype.
> I wouldn't trust a random open source project for my day-to-day browser.
Given that you're working on a direct competitor, this comment reads as fearmongering, designed to drive people over to your product.
I personally talked to another agentic browser player in the space, fellou.ai, and asked how they keep up with all the Chromium pushes, since you need a dedicated team to handle the merges. They flat out told me they are targeting tech enthusiasts who are not as interested in the security of their browser.
As an ex-Google engineer, I know the immense engineering effort and infrastructure it takes to develop Chrome. It is very implausible that two people can handle all the work needed to ship a secure browser with 15+ million lines of constantly changing C++ code.
A sandboxed browser extension is the natural form factor for these agentic capabilities.
Conflict of interest at its heart.
I mean, I have no skin in the game, but there are people using Dia (Browser Company), and Dia is closed source, so it would be nice to see those people jump to BrowserOS at least.
I personally would prefer it as an extension, but as the author of BrowserOS noted, extensions have some limitations. I just wish Google/Chromium would push those changes upstream, I guess.
Exactly my thoughts when I saw nanobrowser being mentioned here.
try https://github.com/nanobrowser/nanobrowser