Comment by dataviz1000

4 days ago

What I have so far needs a lot of work and is flaky. Every day it gets tighter and better.

Microsoft pulled the lifecycle management code out of Puppeteer and put it into Playwright, with Google's copyright still at the top of several files. They both use CDP. I'm using the Chrome extension analogue for every CDP message and listener. I need a couple of days to pull all the code out of the Page, Frame, FrameManager, and Browser classes and methodically build a state machine from it to track lifecycle and race conditions. It is a huge task, and I don't want to share it before accomplishing that.

For example, Playwright has a system that listens for all navigation requests in a Page's / Tab's Frames. Embedded frames can navigate to URLs, such as advertising resources, while the parent Frame is still loading, and all of that needs to be tracked.
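
A minimal sketch of how that tracking could be modelled as a per-frame state machine. The state names, event names, and the `FrameTracker` class are all hypothetical, chosen for illustration; they are not Playwright's or Puppeteer's actual model.

```typescript
// Hypothetical states and events for one frame (assumptions, not a real library's model).
type FrameState = "navigating" | "loaded" | "detached";
type FrameEvent = "navigationStarted" | "loadFired" | "frameDetached";

// Transition table: any (state, event) pair not listed leaves the state unchanged.
const transitions: Record<FrameState, Partial<Record<FrameEvent, FrameState>>> = {
  navigating: { navigationStarted: "navigating", loadFired: "loaded", frameDetached: "detached" },
  loaded: { navigationStarted: "navigating", frameDetached: "detached" },
  detached: {}, // terminal: a detached frame never comes back under the same id
};

class FrameTracker {
  private states = new Map<number, FrameState>();

  // Apply one event to one frame and return the resulting state.
  // An untracked frame is only adopted when it starts navigating.
  dispatch(frameId: number, event: FrameEvent): FrameState | undefined {
    const current = this.states.get(frameId);
    if (current === undefined) {
      if (event !== "navigationStarted") return undefined; // stale event, ignore
      this.states.set(frameId, "navigating");
      return "navigating";
    }
    const next = transitions[current][event] ?? current;
    this.states.set(frameId, next);
    return next;
  }

  // The page is settled only when no tracked frame is still navigating.
  isSettled(): boolean {
    let settled = true;
    this.states.forEach((state) => {
      if (state === "navigating") settled = false;
    });
    return settled;
  }
}
```

With this shape, an ad iframe that starts navigating after the main frame has fired load flips `isSettled()` back to false until that iframe loads or detaches.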

A lot of companies are talking about building solutions using CDP without Playwright, and I'm curious how well they are going to handle the lifecycle management. Maybe if they don't intercept requests and responses it is straightforward and simple.

One idea I have is to just evaluate '1+1' in the frame's content script in a loop with a backoff strategy: if it returns 2, continue with code execution; if it times out, fail. That would replace tracking hundreds of navigations across 30 different embedded frames in a page like CNN. I'm still tinkering. Stagehand calls Locator.evaluate(), which is what I'm building now because I haven't implemented it yet.
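
A rough sketch of that probe loop, assuming some `evaluate` function already exists that runs an expression in the frame and resolves with the result. `Evaluate`, `ProbeOptions`, `withTimeout`, and `waitUntilResponsive` are hypothetical names, and the delay values are placeholders rather than tuned numbers.

```typescript
// Hypothetical stand-in for whatever runs an expression in the frame
// (a content-script message, CDP's Runtime.evaluate, and so on).
type Evaluate = (expression: string) => Promise<unknown>;

interface ProbeOptions {
  timeoutMs: number;      // overall deadline for the whole probe
  initialDelayMs: number; // first wait between attempts
  maxDelayMs: number;     // cap for the exponential backoff
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Reject if `promise` has not settled within `ms`; the timer is always cleared.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("attempt timed out")), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Resolve once the frame answers 1+1 with 2; reject when the deadline passes.
async function waitUntilResponsive(evaluate: Evaluate, opts: ProbeOptions): Promise<void> {
  const deadline = Date.now() + opts.timeoutMs;
  let delay = opts.initialDelayMs;
  while (true) {
    const remaining = deadline - Date.now();
    if (remaining <= 0) throw new Error("frame did not respond before the timeout");
    try {
      // Bound each attempt by the remaining budget so a hung frame cannot stall the loop.
      const result = await withTimeout(evaluate("1+1"), remaining);
      if (result === 2) return;
    } catch {
      // Context destroyed mid-navigation, or the attempt timed out: retry below.
    }
    await sleep(Math.min(delay, Math.max(0, deadline - Date.now())));
    delay = Math.min(delay * 2, opts.maxDelayMs); // exponential backoff with a cap
  }
}
```

A caller would await `waitUntilResponsive(evaluate, { timeoutMs: 5000, initialDelayMs: 50, maxDelayMs: 1000 })` before running its real expression, and treat a rejection as "this frame is not usable right now".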

nikisweeting 4 days ago

Yes, the key is that we don't intercept requests and responses; that saves 60% of the headache of lifecycle management.

We do exactly what you described, with a 1+1 check in a loop for every target. It pops any crashed sessions from the pool, and we don't keep any state beyond that about which tabs are alive. We really try to derive everything fresh from the browser on every call, with minimal in-memory state.
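
To illustrate the idea only (this is not browser-use's actual code), here is a small sketch of pruning a session pool with a liveness check. The `Session` shape, `isAlive`, and `pruneDeadSessions` are hypothetical names.

```typescript
// Hypothetical session shape; `evaluate` stands in for the real transport to the target.
interface Session {
  targetId: string;
  evaluate(expression: string): Promise<unknown>;
}

// Resolve to true only if the target answers 2 within `ms`.
// Any error or timeout counts as dead.
function isAlive(session: Session, ms: number): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), ms);
    Promise.resolve()
      .then(() => session.evaluate("1+1"))
      .then(
        (value) => { clearTimeout(timer); resolve(value === 2); },
        () => { clearTimeout(timer); resolve(false); },
      );
  });
}

// Probe every pooled session concurrently, drop the ones that fail,
// and return the ids that were removed.
async function pruneDeadSessions(pool: Map<string, Session>, ms: number): Promise<string[]> {
  const sessions = Array.from(pool.values());
  const alive = await Promise.all(sessions.map((s) => isAlive(s, ms)));
  const removed: string[] = [];
  sessions.forEach((s, i) => {
    if (!alive[i]) {
      pool.delete(s.targetId);
      removed.push(s.targetId);
    }
  });
  return removed;
}
```

Nothing about liveness is cached here: each call re-derives which sessions are usable, and the pool is the only state kept between calls.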

https://github.com/browser-use/browser-use/blob/2a0f4bd93a43...

dataviz1000 4 days ago

Ha, I got that idea from you! Sitting there in the back of my mind.

sandGorgon 3 days ago

Very cool! Software never gets done... so would have loved to see it (and contributed to it). But totally respect that. Would love to see it when you're done!

For context, I contribute to an open-source mobile browser (built on top of Chromium), where I'm actually building out the extension system. Chrome doesn't have extensions on mobile! I would have loved to see if this works on Android, with Android's lifecycle on top of your own!

dataviz1000 3 days ago

> so would have loved to see it (and contributed to it)
Hopefully, I can get it to a point where developers look at it and want to contribute. I'm pretty disciplined about writing clean, organized code even when it is a hacked-together PoC. On the other hand, the thousand tests that run when pressing a button in the UI were all created by AI and are overall a mess.
With the code, the biggest problem is lifecycle management (tracking all the Windows, Pages, Frames, and embedded Frames). However, it is only 4 files and can be solved with a well-thought-out statechart. There are event listeners attached to Frames that aren't being removed under certain conditions. If I run the tests, they pass. If I start clicking links, switching tabs, etc., the extension fails and requires a reload.
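
One common way to guard against leaked listeners is to record a disposer for each listener when it is attached and run them all when the frame goes away. A minimal sketch, assuming frames are identified by a numeric id; `Disposer` and `FrameListenerRegistry` are hypothetical names, not part of any library mentioned here.

```typescript
// A disposer undoes one subscription, for example
// () => chrome.webNavigation.onCommitted.removeListener(handler).
type Disposer = () => void;

class FrameListenerRegistry {
  private byFrame = new Map<number, Disposer[]>();

  // Record a disposer at the moment the listener is attached.
  track(frameId: number, dispose: Disposer): void {
    const list = this.byFrame.get(frameId) ?? [];
    list.push(dispose);
    this.byFrame.set(frameId, list);
  }

  // Run and forget every disposer for a frame. Safe to call twice:
  // the second call finds nothing and returns 0.
  release(frameId: number): number {
    const list = this.byFrame.get(frameId) ?? [];
    this.byFrame.delete(frameId);
    for (const dispose of list) dispose();
    return list.length;
  }

  // Number of frames that still hold listeners; handy as a leak check in tests.
  get liveFrames(): number {
    return this.byFrame.size;
  }
}
```

Calling `release(frameId)` from every path that ends a frame (detach, tab close, navigation that replaces the document) keeps the cleanup in one place, and asserting `liveFrames === 0` after a test run can surface the paths that were missed.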
> would have loved to see if this works on Android
It is dependent on chrome.* APIs like chrome.tabs.*. If I had to summarize Puppeteer / Playwright, they do two things: track Page / Frame lifecycle and evaluate JavaScript expressions using "Runtime.evaluate". It is mostly the latter, because that gives access to all the DOM APIs, e.g. () => { return window.document.location }.
I don't know if Android Chrome has similar functionality or if you are able to build it. Nonetheless, if you have a way to evaluate code inside the content script world, either MAIN or ISOLATED, you might only need a limited set of features to manage and track Pages, Frames, Tabs, etc. If your interest is browser automation, you might not need a lot of devtools features, or you can add them later.