Comment by marshalhq
3 days ago
I ran mutation testing on a side project recently and found a test that still passed when the production method was changed to return an empty string. AI-generated tests at scale will have exactly this problem: high coverage, confident test names, zero actual verification.
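To make that concrete, here is a minimal hand-rolled sketch of the failure mode, not the output of any real mutation testing tool. Every name in it (`format_greeting`, `mutant_greeting`, `weak_test`, `strong_test`, `mutant_killed`) is hypothetical and made up for illustration.

```python
def format_greeting(name: str) -> str:
    """Hypothetical production method."""
    return f"Hello, {name}!"


def mutant_greeting(name: str) -> str:
    """Mutant: the body is replaced with an empty-string return."""
    return ""


def weak_test(fn) -> bool:
    # Executes the code (so it counts toward coverage) but only checks
    # the return type, never the content.
    return isinstance(fn("Ada"), str)


def strong_test(fn) -> bool:
    # Actually verifies the returned value.
    return fn("Ada") == "Hello, Ada!"


def mutant_killed(test, original, mutant) -> bool:
    # A mutant is "killed" when the test passes on the original code
    # and fails on the mutated code.
    return test(original) and not test(mutant)


if __name__ == "__main__":
    print("weak test kills mutant:",
          mutant_killed(weak_test, format_greeting, mutant_greeting))
    print("strong test kills mutant:",
          mutant_killed(strong_test, format_greeting, mutant_greeting))
```

The weak test passes on both versions, so the mutant survives, which is the signal that the test verifies nothing about the result.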
IME there are these levels of tests:
- If you call the setter, the getter returns the same value - these are kinda bullshit, and anything they catch would be caught by the next level anyway
- Testing basic normal use
- Testing known difficulties of the implementation
- Exhaustive or randomized (if necessary) testing of the state space, ~= property-based testing
I expect AI to have very different levels of ability for these, and not necessarily getting worse in the order listed.
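For reference, the four levels listed above might look roughly like this against a toy class. This is a sketch under assumptions: `Percent` is a hypothetical class that clamps its value to [0, 100], and the last test uses a seeded `random.Random` from the standard library as a stand-in for a real property-based testing library.

```python
import random


class Percent:
    """Hypothetical class: stores an int clamped to the range [0, 100]."""

    def __init__(self) -> None:
        self._value = 0

    def set(self, v: int) -> None:
        self._value = max(0, min(100, v))

    def get(self) -> int:
        return self._value


def test_setter_getter() -> None:
    # Level 1: set a value, read the same value back.
    p = Percent()
    p.set(42)
    assert p.get() == 42


def test_normal_use() -> None:
    # Level 2: ordinary usage, the last write wins.
    p = Percent()
    for v in (10, 55, 90):
        p.set(v)
    assert p.get() == 90


def test_known_difficulties() -> None:
    # Level 3: the boundaries, where clamping bugs usually hide.
    p = Percent()
    for given, expected in ((-1, 0), (0, 0), (100, 100), (101, 100)):
        p.set(given)
        assert p.get() == expected


def test_randomized(trials: int = 1000, seed: int = 0) -> None:
    # Level 4: randomized check of two properties over many inputs.
    rng = random.Random(seed)
    p = Percent()
    for _ in range(trials):
        v = rng.randint(-1000, 1000)
        p.set(v)
        got = p.get()
        assert 0 <= got <= 100      # result is always in range
        if 0 <= v <= 100:
            assert got == v         # in-range inputs are stored unchanged


if __name__ == "__main__":
    test_setter_getter()
    test_normal_use()
    test_known_difficulties()
    test_randomized()
    print("all four levels passed")
```

Note that the level 1 test is fully subsumed by level 2 here, which is the point made about it in the list.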
Don't worry, AI maximalists have a solution: create tests for the tests.
That's what mutation testing is.