Comment by dataviz1000

6 months ago

Use an Electron app to spawn a child process that opens a Chrome browser with launch flags including `--remote-debugging-pipe`, instead of exposing a WebSocket connection on a port such as 9226. If coupled with `--user-data-dir=<path>`, this will not show the CDP security warning bar at the top of the page, as long as the user data directory is not the default user directory.
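A minimal sketch of the pipe setup from Electron's main process (plain Node `child_process`), assuming the usual pipe convention: Chrome reads CDP commands from file descriptor 3, writes replies to file descriptor 4, and each JSON message ends with a NUL byte. The names `encodeCdpMessage`, `CdpPipeDecoder`, and `launchChrome` are made up for illustration, and the paths in the usage note are placeholders.

```typescript
import { spawn } from "node:child_process";
import type { Readable, Writable } from "node:stream";

// Serialize one CDP command; the trailing NUL byte marks the end of the message.
function encodeCdpMessage(id: number, method: string, params: object = {}): string {
  return JSON.stringify({ id, method, params }) + "\0";
}

// Buffers raw pipe output and returns every complete message received so far.
class CdpPipeDecoder {
  private pending = "";

  push(chunk: string): unknown[] {
    this.pending += chunk;
    const parts = this.pending.split("\0");
    // The last element is an incomplete message (or empty); keep it for later.
    this.pending = parts.pop() ?? "";
    return parts.filter((p) => p.length > 0).map((p) => JSON.parse(p));
  }
}

// Hypothetical helper: launches Chrome with a non-default profile directory
// and speaks CDP over the extra stdio pipes instead of a TCP port.
function launchChrome(
  chromePath: string,
  userDataDir: string,
  onMessage: (message: unknown) => void,
) {
  const child = spawn(
    chromePath,
    ["--remote-debugging-pipe", `--user-data-dir=${userDataDir}`],
    { stdio: ["ignore", "ignore", "inherit", "pipe", "pipe"] },
  );
  const toChrome = child.stdio[3] as Writable; // Chrome reads commands from fd 3
  const fromChrome = child.stdio[4] as Readable; // Chrome writes replies to fd 4

  const decoder = new CdpPipeDecoder();
  fromChrome.setEncoding("utf8");
  fromChrome.on("data", (chunk: string) => decoder.push(chunk).forEach(onMessage));

  let nextId = 0;
  const send = (method: string, params: object = {}): number => {
    const id = ++nextId;
    toChrome.write(encodeCdpMessage(id, method, params));
    return id;
  };
  return { child, send };
}
```

Something like `launchChrome("/path/to/chrome", "/path/to/profile-dir", console.log).send("Browser.getVersion")` should then log the reply without any debugging port being opened.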

1. Get all the things you want.

2. You can create as many 'browser context' personas as you want.

3. Use the Electron app renderer as the UI to manage profiles, proxies for each profile, automate making Gmail accounts for each profile, etc.

4. I forgot: it is very nice to use the `--load-extension=/path/to/extension` flag to ship Chrome extension files inside the Electron app bundle, so that the launched browser has a cool copilot side panel.

> Extensions are ok but they have limitations too, for example you cannot use extensions to automate other extensions.

5. If you know the extension IDs, it is easy to set up communication between the two. I already drive a Chrome extension using VSCode's core libraries, and it would be a week or two of work to implement a light port of the VSCode host extension API for a Chrome extension. Nonetheless, I'd rather have an Electron app manage extensions the same way VSCode does.
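To make points 2 to 4 concrete, here is a rough sketch of how the launch flags and a per-persona browser context command might be assembled. `buildChromeArgs`, `createContextCommand`, the `Persona` shape, and the `extensions/` folder inside the app resources are all hypothetical; `Target.createBrowserContext` and its `proxyServer` parameter are taken from the CDP `Target` domain, where they are marked experimental, so treat that part as an assumption to verify against your Chrome version.

```typescript
import * as path from "node:path";

// Hypothetical description of one persona managed by the Electron UI.
interface Persona {
  name: string;
  proxyServer?: string; // e.g. "http://127.0.0.1:8080"
}

// Build the Chrome command line. `resourcesPath` would typically be Electron's
// process.resourcesPath; the "extensions" subfolder is an assumed bundle layout.
function buildChromeArgs(
  resourcesPath: string,
  userDataDir: string,
  extensionNames: string[],
): string[] {
  const args = ["--remote-debugging-pipe", `--user-data-dir=${userDataDir}`];
  if (extensionNames.length > 0) {
    const dirs = extensionNames.map((name) => path.join(resourcesPath, "extensions", name));
    // --load-extension takes a comma-separated list of unpacked extension directories.
    args.push(`--load-extension=${dirs.join(",")}`);
  }
  return args;
}

// Build the CDP command that creates an isolated browser context for a persona,
// optionally routed through that persona's proxy.
function createContextCommand(
  id: number,
  persona: Persona,
): { id: number; method: string; params: { proxyServer?: string } } {
  const params: { proxyServer?: string } = {};
  if (persona.proxyServer) params.proxyServer = persona.proxyServer;
  return { id, method: "Target.createBrowserContext", params };
}
```

The reply to that command carries a `browserContextId`, which can then be passed to `Target.createTarget` to open tabs inside that persona.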

5 comments

dataviz1000

nikisweeting 6 months ago

Yeah I started building this in my first week at the company haha: https://github.com/browser-use/desktop

Shipping a whole Electron app is not a priority at the moment though. Our revenue comes from cloud API users, and there we only need our custom Chrome fork; there is no point messing with Electron and extension bridges when we can add custom CDP commands to talk to `chrome.*` APIs directly.

dataviz1000 6 months ago
I like the Chrome fork idea. I imagine that in the next couple of years, hardware companies, e.g. Apple and Lenovo, will start to ship extremely powerful local inference hardware as the models become sufficient, which your browser will be able to leverage.
- Tsarp 6 months ago
  
  I built a prototype using native messaging (the same way apps like password managers interact with browsers) that drives actions with pure JS.
  I have a lot of actions done but am not fully there yet. Essentially the goal is to use a CLI or an external desktop app to drive your already logged-in Chrome profile without `navigator.webdriver` or enabling `--remote-debugging-port`. In all my testing I never got flagged by CAPTCHA/bot protection. The CLI can interact with LLMs and the local file system (despite OPFS, this is easier).
  CLI (host app) <-> Native messaging daemon <-> Chrome extension
  The extension just executes pure JS for extraction and navigation. I've gotten basic Google searching and image downloading working. I'm now looking at complex form interactions.
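For anyone curious what the daemon side of that chain involves, here is a small sketch of the native messaging wire format, assuming the standard framing: each message is UTF-8 JSON preceded by a 32-bit length in native byte order, exchanged over the host's stdin and stdout. The function names and the `{ action: ... }` message shape are invented for illustration and are not taken from the prototype described above.

```typescript
import { endianness } from "node:os";

// Native messaging uses the platform's native byte order for the length prefix.
const littleEndian = endianness() === "LE";

// Frame one message: 4-byte length header followed by the UTF-8 JSON body.
function encodeNativeMessage(message: unknown): Buffer {
  const body = Buffer.from(JSON.stringify(message), "utf8");
  const header = Buffer.alloc(4);
  if (littleEndian) header.writeUInt32LE(body.length, 0);
  else header.writeUInt32BE(body.length, 0);
  return Buffer.concat([header, body]);
}

// Parse every complete message in `buffer`; `rest` holds any trailing bytes of
// a message that has not fully arrived yet and should be prepended to the next chunk.
function decodeNativeMessages(buffer: Buffer): { messages: unknown[]; rest: Buffer } {
  const messages: unknown[] = [];
  let offset = 0;
  while (buffer.length - offset >= 4) {
    const length = littleEndian ? buffer.readUInt32LE(offset) : buffer.readUInt32BE(offset);
    if (buffer.length - offset - 4 < length) break; // body is still incomplete
    const start = offset + 4;
    messages.push(JSON.parse(buffer.toString("utf8", start, start + length)));
    offset = start + length;
  }
  return { messages, rest: buffer.subarray(offset) };
}
```

A host process would feed chunks from `process.stdin` through `decodeNativeMessages` and write `encodeNativeMessage({ action: "navigate" })` style replies to `process.stdout`, while the extension talks to it through `chrome.runtime.connectNative`.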
  

vasusen 6 months ago

Yes, an Electron app helps tremendously, especially for managing the lifecycle of tabs independently. We use that for creating our AI browser automations at Donobu (https://donobu.com). However, we do have the luxury of focusing on a narrow AI QA use case, unlike Browser-Use and others who need to support broad use cases in potentially adversarial environments.