Comment by justboy1987
5 hours ago
You're touching on the fundamental "who watches the watchmen" problem in formal verification. But I think the framing slightly misses the key asymmetry: reviewing a Lean4 spec is dramatically easier than reviewing the implementation it constrains.
A formal spec in Lean is typically 10-50x shorter than the code it proves correct. More importantly, Lean's type checker is itself a small, trusted kernel (~10k lines) that has been scrutinized by the PL community for years. So you're not trusting the agent — you're trusting the kernel.
The practical workflow isn't "agent writes spec + code." It's: human writes spec (the hard creative part), agent generates proof that code satisfies spec, Lean kernel mechanically checks the proof. The agent can hallucinate all it wants in step 2 — if the proof doesn't typecheck, it gets rejected deterministically.
The real bottleneck is step 1: writing good specs requires domain expertise. But that's exactly where humans should stay in the loop. It's a much better division of labor than reviewing thousands of lines of generated code.
Does that mean your production code is lean? Or do you translate some other language code to lean to verify it?