← Back to context

Comment by spinningslate

5 hours ago

This is the biggest risk with the rejuvenated interest in formal proof. That LLMs can generate proofs is useful. Proof assistants that can check them (Lean/FStar/Isabelle/...) similarly so.

But it just moves the question to whether the theorems covered in the proof are sufficient. Underlying it all is a simple question:

Does the system meet its intended purpose?

To which the next question is:

What is the intended purpose?

Describing that is the holy grail of requirements specification. Natural language, behaviour-driven development, test-driven development and a host of other approaches attempt to bridge the gap between implicit purpose and explicit specification. Proof assistants are another tool in that box.

It's also one of the key motivators for iterative development: putting software in front of users (or their proxies) is still the primary means of validation for a large class of systems.

None of which is implied criticism of any of those approaches. Equally, none completely solves the problem. There is a risk that formal proofs, combined with proof assistants, are trumpeted as "the way" to mitigate the risk that LLM-developed apps don't perform as intended.

They might help. They can show that code is correct with respect to some specification, and that the specification is self-consistent. They cannot prove that the specification is complete with regards its intended purpose.