Comment by alkonaut
12 hours ago
This seems like it should be very easy to validate. Force the AI to make a minimal change to the code under test that makes a single test (or as few tests as possible) fail as a result. If it can't make any test fail at all, the added tests are useless.
Agreed, and that's why I think adding some example prompts and ideas to the Testing section would be helpful. A vanilla-prompted LLM, in my experience, is very unreliable at adding tests that fail when the changes are reverted.
Many times I've observed that tests added by the model pass alongside the changes, but still pass even after those changes are reverted.
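Here's a quick sketch of that failure mode (hypothetical functions, just to illustrate): a tautological test passes whether or not the fix is applied, so it provides no evidence the change works, while a test that actually exercises the change fails on the pre-fix code.

```python
# Hypothetical example of a worthless vs. a useful AI-written test.

def buggy_add(a, b):      # implementation before the "fix"
    return a - b

def fixed_add(a, b):      # implementation after the "fix"
    return a + b

def weak_test(add):
    # Tautological: 0 + 0 == 0 - 0, so this passes for BOTH
    # implementations. Reverting the fix changes nothing.
    return add(0, 0) == 0

def strong_test(add):
    # Fails on the buggy version, passes only after the fix.
    return add(2, 3) == 5

assert weak_test(buggy_add) and weak_test(fixed_add)      # passes either way
assert not strong_test(buggy_add) and strong_test(fixed_add)
```

Running the suite with the change reverted (here, swapping `fixed_add` back to `buggy_add`) is exactly the check a vanilla-prompted LLM tends to skip.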
I had an example in that section but it got picked apart by pedants (who had good points) so I removed it. I plan to add another soon. You can still see it in the changelog: https://simonwillison.net/guides/agentic-engineering-pattern...
Matt Pocock has a nice TDD skill he's made available [0][1].
[0] https://www.aihero.dev/skill-test-driven-development-claude-...
[1] https://github.com/mattpocock/skills/blob/main/tdd/SKILL.md
This is essentially the dual of the idea behind mutation testing, and should be trivial to do with a mutation testing framework in place (track whether a given test catches mutants, or, more sophisticated, whether it catches exactly the same mutants as some other test).
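A minimal sketch of what a mutation testing framework does under the hood (toy code, not a real framework like mutmut): mutate the AST of the function under test (`+` becomes `-`) and record whether a given test kills the mutant. A test that lets the mutant survive is the in-suite equivalent of the reverted-change check above.

```python
import ast

# Toy mutation tester: one mutation operator (Add -> Sub), applied to
# source we compile and exec in a scratch namespace.

SRC = "def add(a, b):\n    return a + b\n"

class FlipAdd(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()          # the mutation: a + b -> a - b
        return node

def mutant(src):
    tree = FlipAdd().visit(ast.parse(src))
    ast.fix_missing_locations(tree)
    return compile(tree, "<mutant>", "exec")

def kills(test, code):
    ns = {}
    exec(code, ns)
    try:
        test(ns["add"])
        return False                     # mutant survived: test has no teeth
    except AssertionError:
        return True                      # mutant killed: test does real work

def weak_test(add):
    assert add(0, 0) == 0                # 0 + 0 == 0 - 0: can't tell them apart

def strong_test(add):
    assert add(2, 3) == 5

assert not kills(weak_test, mutant(SRC))     # mutant survives the weak test
assert kills(strong_test, mutant(SRC))       # strong test kills it
```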
That's part of the reason I like red/green TDD - you make the agent show that the test fails before the implementation and passes afterwards.
It can still cheat, but it's less likely to cheat.