Comment by Akranazon
1 month ago
This is interesting subject matter; I'm working on something similar. But the descriptions are quite terse. Maybe I just failed to glean:
* When you "run a WASM pass", how is that generated? Do you use an agent to do the pruning step, or is it deterministic?
* Where do the "deterministic overrides" come from? I assume they are generated by the verifier agent?
The WASM pass is fully deterministic: it's just code running in the page to extract and prune post-rendered elements (roles, geometry, visibility, layout, etc.), with no agent involved in the Chrome extension.
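For illustration, the deterministic part of such a pass might look like the sketch below: a pure, rule-based filter over element snapshots. The `ElementSnapshot` shape and `prune` function are hypothetical names, not the author's actual API; the point is that the same input always yields the same output, with no model call anywhere.

```typescript
// Hypothetical snapshot of a post-rendered element, as extracted in-page
// (e.g. from computed styles and getBoundingClientRect()).
interface ElementSnapshot {
  role: string;          // computed ARIA role
  visible: boolean;      // computed visibility (display/visibility/opacity)
  width: number;         // layout geometry
  height: number;
  interactive: boolean;  // clickable/focusable
}

// Pure, rule-based pruning: drops invisible or zero-size elements and
// non-interactive generic wrappers. Deterministic by construction.
function prune(snapshots: ElementSnapshot[]): ElementSnapshot[] {
  return snapshots.filter(s =>
    s.visible &&
    s.width > 0 && s.height > 0 &&
    (s.interactive || s.role !== "generic")
  );
}
```

Because the filter is a pure function of the snapshot, it can be unit-tested and replayed without a browser, which is exactly the property an agent-driven pruner would lack.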
The “deterministic overrides” aren’t generated by a verifier agent either; they’re runtime rules that kick in when assertions or ordinality constraints are explicit (e.g. “first result”). The verifier just checks outcomes; it doesn’t invent actions. AI agents are non-deterministic by nature, and we don’t want to introduce that into the verification layer, which is predicate-only.
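A minimal sketch of how such an override could fire on explicit ordinality: when the instruction names an ordinal, a fixed rule picks the element by index instead of letting the agent choose. The `ORDINALS` table and `resolveOrdinal` are illustrative names under that assumption.

```typescript
// Hypothetical ordinality override: maps explicit ordinal words to indices.
const ORDINALS: Record<string, number> = { first: 0, second: 1, third: 2, last: -1 };

// Returns the deterministically selected candidate, or undefined when the
// instruction carries no explicit ordinality (i.e. no override fires and
// the normal planning path applies).
function resolveOrdinal<T>(instruction: string, candidates: T[]): T | undefined {
  const match = instruction.toLowerCase().match(/\b(first|second|third|last)\b/);
  if (!match) return undefined;
  const idx = ORDINALS[match[1]];
  return idx === -1 ? candidates[candidates.length - 1] : candidates[idx];
}
```

The override is a lookup, not a judgment call, so “first result” resolves the same way on every run.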
> they’re runtime rules that kick in when assertions or ordinality constraints are explicit
So there's a pre-defined list of rules: is it choosing which checks to care about from the set, or is there also a predefined binding between the task and the test?
If it's the former, then you have to ensure that the checks are sufficiently generic that there's a useful test for the given situation. Is an AI doing the choosing over which of the checks to run?
If it's the latter, I would assume that writing the tests would be the bottleneck: writing a test can be as flaky/time-consuming as implementing the actions by hand.
It’s mostly the former: there’s a small set of generic checks/primitives, and we choose which ones to apply per step.
The binding between “task/step” and “what to verify” can come from either:
* the user (explicit assertions), or
* the planner/executor proposing a post-condition (e.g. “after clicking checkout, the URL contains /checkout and a checkout button exists”).
But the verifier itself is not an AI; by design it's predicate-only.
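A predicate-only verifier built from a small set of generic checks could be sketched as below. The `PageState`, `urlContains`, `elementExists`, and `all` names are assumptions for illustration; the checkout example mirrors the post-condition mentioned above.

```typescript
// Hypothetical observed state after a step.
interface PageState {
  url: string;
  elements: string[]; // descriptors of elements present after the step
}

// A predicate is a pure function of observed state: no actions, no model.
type Predicate = (state: PageState) => boolean;

// Small set of generic, reusable primitives.
const urlContains = (fragment: string): Predicate =>
  state => state.url.includes(fragment);

const elementExists = (descriptor: string): Predicate =>
  state => state.elements.includes(descriptor);

const all = (...preds: Predicate[]): Predicate =>
  state => preds.every(p => p(state));

// Post-condition for the checkout example: composed from the primitives.
const checkoutDone: Predicate = all(
  urlContains("/checkout"),
  elementExists("checkout-button"),
);
```

The planner or user picks which primitives to compose per step, but evaluating them is pure boolean logic over observed state, so the verification layer stays deterministic even when the executor above it isn't.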