Comment by marshalhq

3 days ago

I ran mutation testing on a side project recently and found a test that passed even if the production method returned an empty string. AI-generated tests at scale will have exactly this problem. High coverage, confident test names, zero actual verification.
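To make the "passes even if it returns an empty string" failure concrete, here's a minimal Python sketch. `slugify` is a hypothetical stand-in for the production method, and the mutant is written by hand to mimic the kind of change a mutation testing tool generates automatically (replace the return value with an empty string). The weak test passes against both the real code and the mutant; only the test that checks the actual value kills the mutant.

```python
import re


def slugify(title: str) -> str:
    """Hypothetical production function: lowercase, hyphen-separated slug."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def mutant_slugify(title: str) -> str:
    """Hand-written mutant: the body is replaced with 'return empty string'."""
    return ""


def weak_test(fn) -> bool:
    """Exercises the code (so it counts for coverage) but only checks the type."""
    result = fn("Hello, World!")
    return isinstance(result, str)


def strong_test(fn) -> bool:
    """Pins down the actual expected output."""
    return fn("Hello, World!") == "hello-world"


def survives(test, mutant) -> bool:
    """A mutant 'survives' if the test still passes when run against it."""
    return test(mutant)


if __name__ == "__main__":
    for name, test in [("weak_test", weak_test), ("strong_test", strong_test)]:
        assert test(slugify), f"{name} should pass on the real code"
        status = "SURVIVED" if survives(test, mutant_slugify) else "killed"
        print(f"{name}: mutant {status}")
```

Running it prints `weak_test: mutant SURVIVED` followed by `strong_test: mutant killed`. Both tests give identical line coverage of `slugify`; only one of them verifies anything.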

IME tests come in roughly these levels:

- Setter/getter round trips (call the setter, and the getter returns the same value): these are kinda bullshit, and any bug they'd catch would be caught by the next level anyway

- Testing basic normal use

- Testing known difficulties of the implementation

- Exhaustive testing of the state space, or randomized testing when exhaustive isn't feasible (roughly what property-based testing does)
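A rough sketch of what each level might look like, using a hypothetical `RingBuffer` (keeps the last `capacity` items) as the code under test. The class, its `label` property, and the test names are all made up for illustration. For the last level this uses a hand-rolled randomized comparison against `collections.deque(maxlen=...)` as a trivially correct reference model, rather than a property-based testing library, to keep it stdlib-only.

```python
import random
from collections import deque


class RingBuffer:
    """Hypothetical code under test: keeps the last `capacity` items pushed."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self._items = [None] * capacity
        self._start = 0  # index of the oldest item
        self._size = 0
        self._label = ""

    @property
    def label(self) -> str:
        return self._label

    @label.setter
    def label(self, value: str) -> None:
        self._label = value

    def push(self, item) -> None:
        cap = len(self._items)
        end = (self._start + self._size) % cap
        self._items[end] = item
        if self._size < cap:
            self._size += 1
        else:
            # Buffer was full: we just overwrote the oldest slot, so advance start.
            self._start = (self._start + 1) % cap

    def to_list(self) -> list:
        cap = len(self._items)
        return [self._items[(self._start + i) % cap] for i in range(self._size)]


def test_level1_setter_getter():
    buf = RingBuffer(3)
    buf.label = "recent"
    assert buf.label == "recent"


def test_level2_normal_use():
    buf = RingBuffer(5)
    for x in [1, 2, 3]:
        buf.push(x)
    assert buf.to_list() == [1, 2, 3]


def test_level3_known_difficulties():
    # Wraparound and overwrite are where off-by-one bugs tend to hide.
    buf = RingBuffer(3)
    for x in range(1, 8):
        buf.push(x)
    assert buf.to_list() == [5, 6, 7]
    one = RingBuffer(1)
    one.push("a")
    one.push("b")
    assert one.to_list() == ["b"]


def test_level4_randomized(trials: int = 200, seed: int = 0):
    # Random capacities and push sequences, checked against a reference model.
    rng = random.Random(seed)
    for _ in range(trials):
        cap = rng.randint(1, 8)
        buf, model = RingBuffer(cap), deque(maxlen=cap)
        for _ in range(rng.randint(0, 30)):
            x = rng.randint(-100, 100)
            buf.push(x)
            model.append(x)
            assert buf.to_list() == list(model)


if __name__ == "__main__":
    for test in (test_level1_setter_getter, test_level2_normal_use,
                 test_level3_known_difficulties, test_level4_randomized):
        test()
    print("all levels passed")
```

The level-4 test needs no hand-picked expected values: any sequence where the buffer disagrees with the deque is a bug, which is what lets it cover far more of the state space than the example-based tests above it.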

I expect AI's ability to vary a lot across these levels, and not necessarily to decline strictly in the order listed.