
Comment by anerli

1 day ago

Yeah, we've thought about this approach a lot - but the problem is that if your final program is a brittle script, you're going to need a way to fix it often, and then you're still depending on recurrently using LLMs/agents. So we think it's better to have the program itself be resilient to change instead of you/your LLM assistant having to constantly ensure the program is working.

I wonder if a nice middle ground would be:

- recording the Playwright script behind the scenes and storing it
- trying that as a "happy path" first attempt to see if it passes
- if it doesn't pass, rebuilding it with the AI and vision models

Best of both worlds. The Playwright script is more of a cache than a test (rough sketch below).
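A minimal sketch of that flow, assuming an in-memory map of recorded steps and a hypothetical `runWithAgent` call standing in for the LLM/vision agent (none of these names are a real API):

```typescript
// Cache-first / agent-fallback flow. `runWithAgent` and the step format are made up.
import { chromium, type Page } from "playwright";

type CachedStep = { action: "goto" | "click" | "fill"; selector?: string; value?: string };

// Replay a previously recorded trajectory; any timeout or navigation error means
// the "happy path" no longer holds and we should fall back.
async function replay(page: Page, steps: CachedStep[]): Promise<boolean> {
  try {
    for (const step of steps) {
      if (step.action === "goto") await page.goto(step.value!, { timeout: 10_000 });
      else if (step.action === "click") await page.click(step.selector!, { timeout: 5_000 });
      else if (step.action === "fill") await page.fill(step.selector!, step.value!, { timeout: 5_000 });
    }
    return true;
  } catch {
    return false;
  }
}

// Hypothetical agent call that performs the task and returns the steps it took.
declare function runWithAgent(page: Page, task: string): Promise<CachedStep[]>;

async function runTask(task: string, cache: Map<string, CachedStep[]>): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  try {
    const cached = cache.get(task);
    const ok = cached ? await replay(page, cached) : false;
    if (!ok) {
      // Cache miss or drifted page: rebuild the trajectory with the agent and re-record it.
      cache.set(task, await runWithAgent(page, task));
    }
  } finally {
    await browser.close();
  }
}
```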

  • I think the difficulty with this approach is (1) you want a good "lookup" mechanism: given a task, how do you know which cached script should be loaded? You can do a simple string lookup based on the task content, but when the task includes parameters or data, or is part of a bigger workflow, it gets trickier (one way to handle that is sketched below). (2) You need a good way to detect when to adapt / fall back to the LLM. When the cache is only a Playwright script, it can be difficult to know when you've fallen out of the recorded trajectory. You can check for selector timeouts and the like, but you can still miss plenty of failures that never throw an error.
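For the lookup side, one hedged sketch: split the task into a reusable template plus parameters and key the cache on the template only, so the same recorded trajectory serves different data. All names here are illustrative:

```typescript
// Key the cache on the task template, not the filled-in task string.
import { createHash } from "node:crypto";

interface TaskRequest {
  template: string;               // e.g. "log in and download invoice {invoiceId}"
  params: Record<string, string>; // e.g. { invoiceId: "INV-1042" }
}

function cacheKey(req: TaskRequest): string {
  // Parameters are deliberately excluded so "INV-1042" and "INV-1043" hit the same entry.
  return createHash("sha256").update(req.template.trim().toLowerCase()).digest("hex");
}

const a = cacheKey({ template: "log in and download invoice {invoiceId}", params: { invoiceId: "INV-1042" } });
const b = cacheKey({ template: "log in and download invoice {invoiceId}", params: { invoiceId: "INV-1043" } });
console.log(a === b); // true: same trajectory, different data
```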

Are you sure? Couldn't you just go back to the LLM if the script breaks? Pages change, but not that often in general.

It seems like a hybrid approach would scale better and be significantly cheaper.

  • We do believe in a hybrid approach where a fast/deterministic representation is saved - but we think there is a more seamless way, where the framework itself stays high level and manages these details by caching the underlying actions so they can be replayed (roughly as sketched below).
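To illustrate what action-level caching inside the framework might look like (a sketch only, with made-up names, not the framework's actual API): each high-level step remembers the concrete action it last resolved to, replays it cheaply, and asks the agent to re-resolve only the step that broke.

```typescript
import type { Page } from "playwright";

type ResolvedAction = { kind: "click" | "fill"; selector: string; value?: string };

class ManagedStep {
  constructor(
    private readonly description: string, // natural-language intent, e.g. "open the billing tab"
    private cached?: ResolvedAction,
  ) {}

  async run(page: Page, resolveWithAgent: (desc: string) => Promise<ResolvedAction>): Promise<void> {
    if (this.cached) {
      try {
        await this.apply(page, this.cached);
        return; // deterministic replay, no model call
      } catch {
        // Cached action no longer matches the page; fall through and re-plan this step only.
      }
    }
    this.cached = await resolveWithAgent(this.description);
    await this.apply(page, this.cached);
  }

  private async apply(page: Page, action: ResolvedAction): Promise<void> {
    if (action.kind === "click") await page.click(action.selector, { timeout: 5_000 });
    else await page.fill(action.selector, action.value ?? "", { timeout: 5_000 });
  }
}
```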