Comment by dataviz1000

18 hours ago

Hey guys, I got a question.

I've been working on a Chrome extension with a side panel. Think of it like the side-panel copilot in VSCode, Cursor, or Windsurf. Currently it automates workflows, but those are hard-coded. I've started working on more generalized automation using langchain. Looking at your code is helpful because in only a few hundred lines of code I can recreate a huge portion of Playwright's capabilities in a Chrome extension side panel, so I should be able to port it to the Chrome extension. That is, I'm creating tools like mouse click, type, mouse move, open tab, navigate, wait for element, etc.
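To make the tool idea concrete, here is a rough sketch of how such tools might be typed and dispatched. Everything in it is an assumption for illustration: the `ToolCall` shapes, the `Driver` interface, and `createLoggingDriver` are hypothetical names, not anything from Playwright, langchain, or the code under discussion. A real driver would be asynchronous and would talk to a content script; this one is synchronous and only records what it was asked to do.

```typescript
// Hypothetical tool-call shapes. Names and fields are illustrative only.
type ToolCall =
  | { tool: "click"; selector: string }
  | { tool: "type"; selector: string; text: string }
  | { tool: "navigate"; url: string }
  | { tool: "waitForElement"; selector: string; timeoutMs: number };

// The driver hides the environment (content script, Playwright page, or a fake).
// Assumption: synchronous string results keep the sketch short; real ones would return Promises.
interface Driver {
  click(selector: string): string;
  type(selector: string, text: string): string;
  navigate(url: string): string;
  waitForElement(selector: string, timeoutMs: number): string;
}

// Route one tool call to the matching driver method.
function runTool(driver: Driver, call: ToolCall): string {
  switch (call.tool) {
    case "click":
      return driver.click(call.selector);
    case "type":
      return driver.type(call.selector, call.text);
    case "navigate":
      return driver.navigate(call.url);
    case "waitForElement":
      return driver.waitForElement(call.selector, call.timeoutMs);
  }
}

// A fake driver that appends a description of each action to a log.
function createLoggingDriver(log: string[]): Driver {
  const record = (entry: string): string => {
    log.push(entry);
    return entry;
  };
  return {
    click: (selector) => record(`click ${selector}`),
    type: (selector, text) => record(`type ${selector} "${text}"`),
    navigate: (url) => record(`navigate ${url}`),
    waitForElement: (selector, timeoutMs) => record(`wait ${selector} ${timeoutMs}ms`),
  };
}

// Usage: replay a short list of calls against the fake driver.
const exampleLog: string[] = [];
const exampleDriver = createLoggingDriver(exampleLog);
const exampleCalls: ToolCall[] = [
  { tool: "navigate", url: "https://example.com" },
  { tool: "waitForElement", selector: "#search", timeoutMs: 5000 },
  { tool: "type", selector: "#search", text: "tickets" },
  { tool: "click", selector: "#submit" },
];
for (const call of exampleCalls) {
  runTool(exampleDriver, call);
}
console.log(exampleLog.join("\n"));
```

Keeping the driver behind an interface is the point of the sketch: the same list of tool calls could in principle be replayed by a Playwright-backed driver or an extension-backed one.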

Looking at your code, I'm thinking about pulling anything that isn't coupled to node while mapping all the Playwright capabilities to the equivalent in a Chrome extension. It's busy work.

If I do that, why would I prefer using .baml over the equivalent langchain? What's the difference? Am I comparing apples to oranges? I'm not worried about using langgraph because I should be able to get most of the functionality with xstate v5 [0] plus serialized portable JSON state graphs, so I can store custom graphs on a remote server that can be queried by API.

That is my question. I don't see langchain in the dependencies, which is cool, but why .baml? Also, what am I missing going down this thought path?

[0] https://chatgpt.com/share/685dfc60-106c-8004-bbd0-1ba3a33aba...

Hey, curious about your use cases for a chrome extension, care to share more?

To answer your question: BAML is a DSL that helps define prompts, organize context, and get better performance on structured output from the LLM. In theory you should be able to map similar logic over to other clients.
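As a rough illustration of what "structured output" means in practice, the sketch below pulls a JSON object out of a chatty model reply and checks it against an expected shape. It is plain TypeScript, not BAML syntax and not how BAML parses output; the `ClickAction` shape and both function names are hypothetical, and the extraction (first `{` to last `}`) is deliberately naive.

```typescript
// Hypothetical target shape for one automation step.
interface ClickAction {
  action: "click";
  selector: string;
}

// Take the span from the first "{" to the last "}" and parse it as JSON.
// Assumption: the reply contains exactly one JSON object, possibly wrapped in prose.
function extractJsonObject(reply: string): unknown {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("no JSON object found in reply");
  }
  return JSON.parse(reply.slice(start, end + 1));
}

// Validate the parsed value field by field instead of trusting a type cast.
function parseClickAction(reply: string): ClickAction {
  const value = extractJsonObject(reply);
  if (typeof value !== "object" || value === null) {
    throw new Error("expected a JSON object");
  }
  const record = value as Record<string, unknown>;
  if (record["action"] !== "click") {
    throw new Error('expected action to be "click"');
  }
  const selector = record["selector"];
  if (typeof selector !== "string") {
    throw new Error("expected selector to be a string");
  }
  return { action: "click", selector };
}

// Usage: the model wrapped its answer in extra prose.
const exampleReply = 'Sure, here it is: {"action": "click", "selector": "#submit"} Hope that helps.';
console.log(parseClickAction(exampleReply).selector);
```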

  • A Chrome extension has the advantage of user-friendly distribution, so non-tech-savvy users can also do automation. I'm also looking for automation on mobile devices (app webview or mobile Safari), and because of platform limitations it doesn't seem like this can be extended to mobile devices anytime soon.

  • In 2018, I helped the NFL front offices and the ticket brokers who bought wholesale in blocks of 10k manage event tickets, hundreds of thousands of them, across secondary marketplaces such as StubHub and SeatGeek. Their primary marketplace, Ticketmaster, was very slow to develop an API that helped them import the data (barcodes) into the secondary markets and remove a ticket from being listed if it was in a secondary-market checkout or sold, preventing millions of dollars' worth of double-sold tickets. The problem was that Ticketmaster, for legal reasons, couldn't give us preferential access, so I was always updating whenever they changed their anti-bot protections. As a backup in case they blocked the automated browsers on a Friday night, I created a side-loaded Chrome extension that did everything the Puppeteer agents were doing, to buy me time. It was a perfect stopgap. The users would press a button and watch it automatically navigate to pages and handle their workflow in their browser, moving lightning fast.

    You can do almost anything you can do in Playwright (navigate, open new tabs, scroll) with the added benefit of keeping the human in the loop. Conceptually they are exactly the same; I can go into that more if you want. Most of the limitations are security features. However, for automated workflows, the security features should be heeded, for good reason. For example, the ChatGPT console requires isTrusted to be true and rejects synthetic events, so it is impossible to automate the ChatGPT console without workarounds, which they will likely close. That is the biggest limitation. On the other hand, there are 3 billion Chrome users and they can download the extension with a single click. Bypassing security features, like requiring a human interaction (a button press or mouse click) to go fullscreen, play sound, or transfer money on a bank website, shouldn't be allowed. If the use case requires that, use Playwright or a BrowserWindow in an Electron application. A Chrome extension with a side panel can collect every visible element using the stacking context to limit the amount of data processed by an LLM; it can capture all the inner text of a page; it can read every fetch and XMLHttpRequest, which is a very good way to get data without loading tons of markup; it can make fetch and XMLHttpRequest calls in the MAIN world so they automatically contain all the cookies; and it can use huggingface/transformers.js to transcribe audio and video to text with OpenAI Whisper, or perform OCR (image to text), on WebGPU if available.
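    To show what the isTrusted limitation looks like from the page's side, here is a minimal sketch of a handler wrapper that ignores synthetic events. In a browser, `Event.isTrusted` is true for events generated by a real user action and false for events created and dispatched by script. The `EventLike` type and `guardTrusted` name are my own, used so the sketch runs outside a browser; I'm not claiming this is how any particular site implements its check.

```typescript
// Minimal structural stand-in for a DOM Event so the sketch runs without a browser.
interface EventLike {
  type: string;
  isTrusted: boolean;
}

// Wrap a handler so it only runs for trusted (user-generated) events.
// Returns true when the handler ran and false when the event was rejected.
function guardTrusted<E extends EventLike>(handler: (event: E) => void): (event: E) => boolean {
  return (event: E): boolean => {
    if (!event.isTrusted) {
      // Script-created events, e.g. via dispatchEvent, carry isTrusted === false.
      return false;
    }
    handler(event);
    return true;
  };
}

// Usage: the synthetic click is dropped, the trusted one is handled.
const handledTypes: string[] = [];
const onClick = guardTrusted((event: EventLike) => {
  handledTypes.push(event.type);
});
console.log(onClick({ type: "click", isTrusted: false }));
console.log(onClick({ type: "click", isTrusted: true }));
```

    An extension that dispatches its own click or input events hits exactly this wall, which is why I'd rather prompt the human to take the step than try to work around it.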

    I can systematically analyze, poke, and prod thousands of websites running with Playwright in the cloud to discover all the capabilities and automatically create workflows with xstate v5, which are sent to the Chrome extension as JSON. For example, I can automatically navigate to a website, find all the inputs, try several ways to inject text, use image-to-text to test whether the text was added to the field, and add what works to the list of capabilities. So if a user is on the page, I can automate the workflow or notify the user that they need to take a step.
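    A rough sketch of the "workflow as JSON" idea, assuming a much simpler graph format than a real xstate v5 machine. This does not use xstate and the `WorkflowGraph` schema is hypothetical, not xstate's; it only shows that a graph serialized on a server can be parsed and stepped through on the client, including a state where the user has to act.

```typescript
// Hypothetical JSON schema for a workflow graph. This is NOT the xstate format.
interface WorkflowGraph {
  initial: string;
  states: Record<string, { on?: Record<string, string>; final?: boolean }>;
}

// Parse a serialized graph and check that its initial state exists.
function loadGraph(json: string): WorkflowGraph {
  const graph = JSON.parse(json) as WorkflowGraph;
  if (!graph.states || !(graph.initial in graph.states)) {
    throw new Error("initial state missing from graph");
  }
  return graph;
}

// Pure transition function: events the current state does not handle leave it unchanged.
function transition(graph: WorkflowGraph, state: string, event: string): string {
  const node = graph.states[state];
  if (!node) {
    throw new Error(`unknown state: ${state}`);
  }
  const next = node.on?.[event];
  return next ?? state;
}

// Fold a list of events over the graph, starting from its initial state.
function runWorkflow(graph: WorkflowGraph, events: string[]): string {
  return events.reduce((state, event) => transition(graph, state, event), graph.initial);
}

// Usage: a form-filling workflow that hands control to the user when a field is rejected.
const serializedGraph = JSON.stringify({
  initial: "idle",
  states: {
    idle: { on: { START: "fillingForm" } },
    fillingForm: { on: { FIELD_REJECTED: "needsUser", SUBMITTED: "done" } },
    needsUser: { on: { USER_ACTED: "fillingForm" } },
    done: { final: true },
  },
});
const exampleGraph = loadGraph(serializedGraph);
console.log(runWorkflow(exampleGraph, ["START", "FIELD_REJECTED", "USER_ACTED", "SUBMITTED"]));
```

    The `needsUser` state is where the side panel would prompt the human; everything else could run unattended.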

    I think the best idea is to have curated workflows and curated data embeddings that target focused industries. It can automate navigating the browser to MLS and zillow.com, collect information, inject it into Google Sheets or Office 365 Excel, export it, navigate to email, write the information, attach the file to the email, and send it, all with the human in the loop. Moreover, if it does 95% of the work, I don't think humans will mind pressing a button or taking an action when prompted. The question is whether people will prefer this to fully automated workflows running somewhere in the cloud. How do you feel about using a code assistant? Do you like being in the loop?

    This is all experimental. The gif has a good example of a side panel automating stock option trading. I'm going to try and inject your code to see if I can start to develop systematic generalized automation with it. [0] [1]

    [0] https://github.com/adam-s/doomberg-terminal

    [1] https://github.com/adam-s/doomberg-terminal/tree/main/docs/m...