Comment by sandblast2
17 hours ago
The expertise in software engineering typical in these promptfondling companies shine through this blog post.
Surely they know 100% code coverage is not a magical bullet because the code flow and the behavior can differ depending on the input. Just because you found a few examples which happen to hit every line of code you didn't hit every possible combination. You are living in a fool's paradise which is not a surprise because only fools believe in LLMs. You are looking for a formal proof of the codebase which of course no one does because the costs would be astronomical (and LLMs are useless for it which is not at all unique because they are useless for everything software related but they are particularly unusable for this).
So, what is the solution? Senior engineer looks over PR and signs LGTM? That is just "vibe testing". The worst kind of testing. I think the author is right, setting up tests to form a reactive environment for coding agents will lead us to a new golden age. If you later find some issue with your test case coverage, you expand it. But it is good to do it from the start as throroughtly as possible.
It's a bold claim that LLMs are useless for formal verification when people have been hooking them up to proof assistants for a while. I think that it's probably not a terrible idea; the LLM might make some mistakes in the spec but 99% of the time there are a lot of irrelevant details that it will do a serviceable job with.