
Comment by tiny-automates

20 days ago

frontend QA is exactly where i've seen the biggest ROI with browser agents. the gap with Playwright MCP specifically is that it assumes the agent can reason about raw CSS selectors and DOM state, and that assumption breaks constantly on anything with dynamic rendering, client-side routing, or shadow DOM.

the right abstraction for QA is probably closer to what a manual tester actually does: describe expected behavior, and let a specialized system figure out the mechanical verification steps.
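to make that concrete, here's a minimal sketch of the split i mean — everything here (the `Expectation` schema, `derive_steps`) is hypothetical, not any real tool's API. the tester writes the declarative part; the mechanical expansion into wait/assert steps is the system's job:

```python
from dataclasses import dataclass

@dataclass
class Expectation:
    """Declarative description of expected behavior (hypothetical schema)."""
    action: str      # e.g. "submit the login form"
    observable: str  # e.g. "welcome banner"
    condition: str   # e.g. "contains the username"

def derive_steps(exp: Expectation) -> list[str]:
    """Mechanically expand a behavioral expectation into verification steps.
    A real system would also resolve `observable` to something stable
    (accessibility role, visible text) rather than a CSS selector."""
    return [
        f"perform: {exp.action}",
        f"wait until visible: {exp.observable}",
        f"assert: {exp.observable} {exp.condition}",
    ]

steps = derive_steps(
    Expectation("submit the login form", "welcome banner", "contains the username")
)
```

the point is that the agent only ever authors the `Expectation`; it never touches selectors, so dynamic rendering and shadow DOM become the expansion layer's problem, not the agent's.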

but the harder unsolved problem is evaluation: how do you reliably distinguish "the agent verified the behavior" from "the agent navigated to the right page and hallucinated a success report"? visual diffing against golden screenshots helps for regression but doesn't cover semantic correctness of dynamic content.
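for the regression half, the golden-screenshot check itself is cheap — a toy sketch below, using plain 2D pixel lists instead of decoded PNGs, with an assumed per-pixel tolerance for anti-aliasing noise. it also shows exactly why this doesn't touch semantic correctness: the diff only knows pixels changed, not whether the new content is right.

```python
def pixel_diff_ratio(golden, candidate, tolerance=8):
    """Fraction of pixels that differ between two same-sized grayscale frames.
    Frames are 2D lists of 0-255 ints; a real pipeline would decode PNG
    screenshots and likely use a perceptual metric instead of raw deltas."""
    total = diff = 0
    for row_g, row_c in zip(golden, candidate):
        for g, c in zip(row_g, row_c):
            total += 1
            if abs(g - c) > tolerance:  # assumed anti-aliasing tolerance
                diff += 1
    return diff / total

golden    = [[0, 0, 255], [0, 255, 255]]
candidate = [[0, 0, 255], [0, 128, 255]]  # one of six pixels changed
ratio = pixel_diff_ratio(golden, candidate)
regression = ratio > 0.01  # flag if more than 1% of pixels moved
```

a hallucinated success report sails straight past this: if the agent never triggered the behavior, the candidate screenshot matches golden perfectly and the check is green.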