Comment by andrewmutz
1 day ago
I agree completely with the author that AI assisted coding pushes the bottleneck to verification of the code.
But you don't really need complete formal verification to get these benefits. TDD gets you a lot of them as well. Perhaps your verification is less certain, but it's much easier to get high automated test coverage than it is to get a formally verifiable codebase.
I think AI assisted coding is going to cause a resurgence of interest in XP (https://en.wikipedia.org/wiki/Extreme_programming) since AI is a great fit for two big parts of XP. AI makes it easy to write well-tested code. The "pairing" method of writing code is also a great model for interacting with an AI assistant (much better than the vibe-coding model).
There's a pretty fundamental difference: TDD is about avoiding up-front design, while formal specification literally is up-front design.
Mainstream has been slowly adapting xp like last 20 years, devops first and now pair programming with agents and tdd.
Trouble is that TDD, and formal proofs to much the same extent, assume a model of "double entry accounting". Meaning that you write both the test/proof and the implementation, and then make sure they agree. Like in accounting, the assumption is that the probability of you making the same mistake twice is fairly low, giving high confidence to accuracy when they agree. When there is a discrepancy, then you can then unpack if the problem is in the test/proof or the implementation. The fallible human can easily screw either.
But if you only fill out one side of the ledger, so to speak, an LLM will happily invent something that ensures that it is balanced, even where your side of the entry is completely wrong. So while this type of development is an improvement over blindly trusting an arbitrary prompt without any checks and balances, it doesn't really get us to truly verifying the code to the same degree we were able to achieve before. This remains an unsolved problem.
I don't fully understand what you mean by accounting expects the probability of making the same mistake twice is fairly low? Double-entry bookkeeping can only tell you if the books are balanced or not. We absolutely cannot assume that the books reflect reality just because they're balanced. You don't need to mess up twice to mess up the books in terms of truthness.
Also tests and code are independent while you always affect both sides in double-entry always. Audits exist for a reason.
With double-entry bookkeeping, the only way an error can slip through is if you make the same error on both sides, or else they wouldn’t be balanced. A similar thing is true for testing: If you make both an error in your test and in your implementation, they can cancel out and appear to be error-free.
I don’t quite agree with that reasoning, however, because a test that fails to test the property it should test for is a very different kind of error than having an error in the implementation of that property. You don’t have to make the “same” error on both sides for an error to remain unnoticed. Compared to bookkeeping, a single random error in either the tests or the implementation is more likely to remain unnoticed.
2 replies →
What about the concept of "high confidence" is not understandable?
2 replies →
That sounds right in theory, but in practice my code is far, far higher quality when I do TDD than when I don't. This applies whether or not I'm using an Ai coding assistant
I don't think GP disagrees. They are (I think) stating that AI-assisted TDD is not as reliable as human TDD, because AI will invent a pointless test just to achieve a passing outcome.