Comment by stuaxo

7 days ago

You get quantity of tests, but the tests are not good quality by default, at all.

I’m not sure how you can say something general about the quality of tests, unless you mean the tests you get by simply prompting “make tests” or similar.

Yes, I’ve experienced that those tests succeed, and the app still breaks trivially on first run.

What I mean is: you design the tests. You analyse patterns. You insist on making testable code (average code by humans isn’t, so neither is average code by LLMs, unless you specify testability as a design constraint).

One way to get testable code is to mock all interfaces. This is usually expensive, but not difficult for an AI, because you can set the success criterion to be interface exactness of your mock for a series of plausible and somewhat extensive interactions.
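
A minimal sketch of what that success criterion could look like, assuming Python and a made-up key-value store interface. ReferenceStore, MockStore, observe and mock_matches are hypothetical names for illustration, not from any real library, and ReferenceStore just stands in for whatever real dependency you are mocking. The idea is to replay the same series of interactions against both and require identical results, including which errors are raised.

```python
class ReferenceStore:
    """Stand-in for the real dependency (hypothetical; imagine a network client)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]  # raises KeyError if the key is missing

    def delete(self, key):
        del self._data[key]  # raises KeyError if the key is missing


class MockStore:
    """The mock under test: different internals, same observable behaviour."""

    def __init__(self):
        self._items = []  # list of (key, value) pairs

    def put(self, key, value):
        self._items = [(k, v) for k, v in self._items if k != key]
        self._items.append((key, value))

    def get(self, key):
        for k, v in self._items:
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        self.get(key)  # raises KeyError if the key is missing
        self._items = [(k, v) for k, v in self._items if k != key]


def observe(store, interactions):
    """Replay (method, args) pairs and record each return value or error type."""
    trace = []
    for method, args in interactions:
        try:
            trace.append(("ok", getattr(store, method)(*args)))
        except Exception as exc:
            trace.append(("error", type(exc).__name__))
    return trace


def mock_matches(reference, mock, interactions):
    """Success criterion: both objects produce the same trace."""
    return observe(reference, interactions) == observe(mock, interactions)


# An assumed series of interactions; in practice you would make it more extensive.
INTERACTIONS = [
    ("put", ("a", "1")),
    ("get", ("a",)),
    ("put", ("a", "2")),
    ("get", ("a",)),
    ("delete", ("a",)),
    ("get", ("a",)),
    ("delete", ("missing",)),
]
```

Calling mock_matches(ReferenceStore(), MockStore(), INTERACTIONS) is then the check you hand to the AI as the target. The trace only compares return values and exception types, so anything else you care about (timing, side effects) would need its own check.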

The tests you can make with AI are as good as the ones you can make otherwise. You just save time writing them, which should justify more extensive testing.