Comment by waynenilsen

16 days ago

Frontend QA is the final frontier, good luck, you are over the target.

The amount of manual QA I am currently subjected to is simultaneously infuriating and hilarious. The foundation models are up to the task but we need new abstractions and layers to correctly fix it. This will all go the way of the dodo in 12 months but it'll be useful in the meantime.

agent-browser helped a lot over Playwright but doesn't completely close the gap.

It's amazing how much more autonomous agents like Claude Code become when they have the ability to verify their work. That's part of the reason why they work much better for unit-testable work.

I think this paradigm was very visible in yesterday's blog post from Anthropic (https://www.anthropic.com/engineering/building-c-compiler), which mentioned that giving the agents the ability to verify against GCC was the key to unlocking further progress.
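
To make the "verify against a trusted reference" idea concrete, here is a minimal sketch of differential checking. It is not how the blog post's setup works; `differential_check` and `buggy_abs` are hypothetical names, and the built-in `abs` stands in for a reference like GCC.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def differential_check(
    candidate: Callable[[T], R],
    reference: Callable[[T], R],
    inputs: Iterable[T],
) -> List[Tuple[T, R, R]]:
    """Return (input, candidate_output, reference_output) for every disagreement."""
    mismatches = []
    for value in inputs:
        got = candidate(value)
        expected = reference(value)
        if got != expected:
            mismatches.append((value, got, expected))
    return mismatches


# Hypothetical candidate: an absolute-value function with a deliberate bug.
def buggy_abs(x: int) -> int:
    return x if x > 0 else x  # bug: negative inputs are returned unchanged


# The built-in abs plays the role of the trusted reference implementation.
failures = differential_check(buggy_abs, abs, [-2, 0, 3])
print(failures)
```

The point is that the agent gets a concrete list of disagreements it can act on, instead of having to judge its own output.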

Giving these agents a browser is a no-brainer, especially if one works in QA or develops web-based services.

frontend QA is exactly where i've seen the biggest ROI with browser agents. the gap with Playwright MCP specifically is that it assumes the agent can reason about CSS selectors and DOM state, which breaks constantly on anything with dynamic rendering, client-side routing, or shadow DOM.

the right abstraction for QA is probably closer to what a manual tester actually does: describe expected behavior, then let a specialized system figure out the mechanical verification steps.
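
a rough sketch of what "describe expected behavior" could look like as data, assuming a hypothetical `PageState` dict that some browser driver has already filled in. `Expectation` and `run_expectations` are made-up names for illustration, not an existing tool's API, and the checks are hand-written here where a real system would derive them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical snapshot of what a browser driver observed; a real system would
# populate this from the live page rather than from a hand-written dict.
PageState = Dict[str, str]


@dataclass
class Expectation:
    description: str                    # what a manual tester would write down
    check: Callable[[PageState], bool]  # mechanical verification of that statement


def run_expectations(state: PageState, expectations: List[Expectation]) -> Dict[str, bool]:
    """Evaluate each expectation against the observed state and report pass/fail."""
    return {e.description: e.check(state) for e in expectations}


observed: PageState = {"url": "/cart", "cart_badge": "1", "banner": "Added to cart"}
expectations = [
    Expectation("user lands on the cart page", lambda s: s.get("url") == "/cart"),
    Expectation("cart badge shows one item", lambda s: s.get("cart_badge") == "1"),
    Expectation("no error banner is shown", lambda s: "error" not in s.get("banner", "").lower()),
]
results = run_expectations(observed, expectations)
print(results)
```

the tester-facing part is only the `description` strings; selectors and DOM details stay hidden behind whatever produces the page state.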

but the harder unsolved problem is evaluation: how do you reliably distinguish "the agent verified the behavior" from "the agent navigated to the right page and hallucinated a success report"? visual diffing against golden screenshots helps for regression but doesn't cover semantic correctness of dynamic content.
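
for reference, golden-screenshot diffing boils down to something like the sketch below. it assumes the screenshots are already decoded into rows of RGB tuples (a real setup would use an image library for that), and `diff_ratio` is a hypothetical helper, not any tool's actual API.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels, assumed already decoded


def diff_ratio(golden: Image, actual: Image, tolerance: int = 0) -> float:
    """Fraction of pixels with any channel differing by more than `tolerance`."""
    same_shape = len(golden) == len(actual) and all(
        len(g_row) == len(a_row) for g_row, a_row in zip(golden, actual)
    )
    if not same_shape:
        return 1.0  # treat a size mismatch as a full difference
    total = sum(len(row) for row in golden)
    if total == 0:
        return 0.0
    changed = 0
    for g_row, a_row in zip(golden, actual):
        for g_px, a_px in zip(g_row, a_row):
            if any(abs(gc - ac) > tolerance for gc, ac in zip(g_px, a_px)):
                changed += 1
    return changed / total


white, red = (255, 255, 255), (255, 0, 0)
golden = [[white, white], [white, white]]
actual = [[white, red], [white, white]]
ratio = diff_ratio(golden, actual)
print(ratio)
```

this catches "something moved or changed color", but a page whose content legitimately changes on every run will differ from the golden image whether the new content is right or wrong, which is the semantic gap described above.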