Comment by anerli

7 months ago

Glad you were able to get it set up quickly!

We currently are optimizing for reliability and quality, which is why we suggest Claude - but it can get expensive in some cases. Using Qwen 2.5-VL-72B will be significantly cheaper, though may not be always reliable.

Most of our usage right now is for running test cases, and people seem to often prefer qwen for that use case - since typically test cases are clearer how to execute.

Something that is top of mind for is is figuring out a good way to "cache" workflows that get taken. This way you can repeat automations either with no LLM or with a smaller/cheap LLM. This will would enable deterministic, repeatable flows, that are also very affordable and fast. So even if each step on the first run is only 95% reliable - if it gets through it, it could repeat it with 100% reliability.

4 comments

anerli

TheTaytay 7 months ago

I am desperately waiting for someone to write exactly this! Use the LLM to write the repeatable, robust script. If the script fails, THEN fall back to an LLM to recover and fix the script.

pzo 7 months ago
Yes I wish we could combine browser use, stagehand, director.ai, playwright. Even better where I can record my session with mouse movements, clicks, dom inspect, screen sharing and my voice talk and explain what I want to do. Then llm generating scraper for different task and recovering if some scraping task got broken at some point.
- mertunsall 7 months ago
  
  https://github.com/browser-use/workflow-use
anerli 7 months ago

Yeah, I think its a little tricky to do this well + automatically but is essentially our goal - not necessarily literally writing a script but storing the actions taken by the LLM and being able to repeat them, and adapt only when needed