← Back to context

Comment by nikisweeting

4 days ago

Yeah I started building this in my first week at the company haha: https://github.com/browser-use/desktop

Shipping a whole electron app is not a priority at the moment though, our revenue comes from cloud API users, and there we only need our custom chrome fork, no point messing with electron and extension bridges when we can add custom CDP commands to talk to `chrome.*` APIs directly.

I like the Chrome fork idea. I imagine in the next couple years, hardware companies, i.e. Apple, Lenovo, will start to ship extremely power local inference hardware as the models become sufficient which your browser will be able to leverage.

  • I built a prototype using native messaging (the same way apps password managers interact with browsers and drive actions with pure js).

    I have a lot of actions done but not full there yet. Essentially the goal is to use a cli or an external desktop app to drive your already logged‑in Chrome profile without navigator.webdriver, or enabling --remote‑debugging‑port. In all my testing never got flagged with captcha/bot protect. The cli can interact with LLMs, local file system(despite opfs this is easier).

    CLI(host app) <-> Native messaging daemon <-> Chrome extension

    Extenion just executes pure js for extraction, navigation. Ive gotten basic google searching, image downloading working. Looking at complex form interactions.

    • native messaging is a huge headache to set up reliably across all the OSs and (often headless) chrome setups in our experience, that's why we've avoided it.

      Just using some remote endpoint message bus service is an easier solution, or something like ElectricSQL/RxDB/Replicache/etc.

      We also can't really use in-page JS for much because it's easily detected by bot-blockers, though isolated worlds help in CDP.