Comment by adventured

5 hours ago

Claude Opus 4.5 will routinely test its own code before handing it off to you, even with zero instruction to do so.

One commercial equivalent of the project I work on is Pro Tools (a DAW); its test "harness" took six people more than a year to write and takes more than a week to execute.

Last month, I made a minor change to our own code and verified that it worked (it did!). Earlier this week, I was notified that the change had broken an entirely different workflow. The only automated testing that would have detected this would have been similar in scope and scale to the Pro Tools harness, and neither an individual human nor an LLM is going to run that.

Moreover, that workflow was entirely graphical, so unless Claude Opus 4.5 (or whatever today's flavor of vibe-coding LLM agent is) has access to a testing system that lets it inject mouse events into a running instance of our application (hint: it does not), there is no way it could run an effective test for this sort of code change.
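
(For concreteness, "inject mouse events" means something like the sketch below — a hypothetical illustration using the robotjs Node library, which synthesizes OS-level input. The coordinates, the fader, and the pixel check are all made up, not part of our toolchain.)

```typescript
// Hypothetical sketch: drive a desktop app's GUI by injecting OS-level
// mouse events via robotjs (an assumption; not our actual setup).
import robot from "robotjs";

// Drag a (hypothetical) fader from (400, 300) up to (400, 180),
// emitting intermediate move events so the app sees a real drag.
robot.moveMouse(400, 300);
robot.mouseToggle("down", "left");
for (let y = 300; y >= 180; y -= 10) {
  robot.moveMouse(400, y);
}
robot.mouseToggle("up", "left");

// Crude verification: sample the pixel where the fader cap should now sit.
const color = robot.getPixelColor(400, 180);
console.log(`pixel at expected fader position: #${color}`);
```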

I have no doubt that Claude et al. can verify that their carefully defined module does the very limited task it is supposed to do, for cases where "carefully defined" and "very limited" are appropriate. If that's the only sort of coding you do, I am sorry for your loss.

  • > access to a testing system that allows it to inject mouse events into a running instance of our application

    FWIW, that's precisely what https://pptr.dev is all about. To your broader point, though: designing a good harness is itself very challenging, and requires actually understanding what is valuable to the user, the software architecture (e.g. to bypass user interaction and test the API first), etc.
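
    A minimal sketch of that kind of test with Puppeteer (the URL, selector, and expected label below are placeholders, not anything from this thread):

    ```typescript
    // Drive a running web UI by injecting synthetic mouse events,
    // then assert on a visible side effect of the interaction.
    import puppeteer from "puppeteer";

    (async () => {
      const browser = await puppeteer.launch({ headless: true });
      const page = await browser.newPage();
      await page.goto("http://localhost:3000"); // placeholder URL

      // Move, press, drag with intermediate events, release.
      await page.mouse.move(100, 200);
      await page.mouse.down();
      await page.mouse.move(300, 200, { steps: 10 });
      await page.mouse.up();

      // Placeholder selector and expected value.
      const status = await page.$eval("#status", (el) => el.textContent);
      if (status !== "moved") throw new Error(`unexpected status: ${status}`);

      await browser.close();
    })();
    ```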