Comment by sibrahim
2 days ago
That's what's covered by the "assuming you have formalized the statement correctly" parenthetical.
Given a formal statement of what you want, Lean can validate that the steps in a (tedious) machine-readable purported proof are valid and imply the result from accepted axioms. This is not AI, but a tiny, well-reviewed kernel that only accepts correct formal-logic arguments.
So, if you have a formal statement that you've verified, by some other means, to represent what you are interested in, Lean can tell you whether the proof created by genAI is correct. Basically, there is a nigh-infallible checker that won't accept incorrect hallucinations.
I think the question is, how can humans have verification that the problem statement was correctly encoded into that Lean specification?
The problem statement is apparently
> Let C > 0 and ϵ > 0 be sufficiently small. Are there infinitely many integers a, b, n with a ≥ ϵn and b ≥ ϵn such that a!b! ∣ n!(a+b−n)! and a+b > n + C log n?
Which seems like it's the type of thing you give as a homework problem to state formally in an intro class.
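For a sense of scale, here is a rough sketch of what such a formalization might look like in Lean 4 with Mathlib. This is an illustrative guess, not the project's actual statement: "infinitely many" is encoded as "for every bound N there is a solution with n > N", and the "sufficiently small" qualifier on C and ϵ is flattened to plain positivity, which is exactly the kind of quantifier choice a careful reader has to check.

```lean
import Mathlib

-- Illustrative sketch only; names and quantifier choices are guesses,
-- not the actual statement used in the formalization being discussed.
theorem erdos_factorial_sketch (C ε : ℝ) (hC : 0 < C) (hε : 0 < ε) :
    ∀ N : ℕ, ∃ a b n : ℕ, N < n ∧
      ε * (n : ℝ) ≤ (a : ℝ) ∧ ε * (n : ℝ) ≤ (b : ℝ) ∧
      Nat.factorial a * Nat.factorial b ∣
        Nat.factorial n * Nat.factorial (a + b - n) ∧
      (n : ℝ) + C * Real.log (n : ℝ) < (a : ℝ) + (b : ℝ) := by
  sorry
```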
Yeah, people dramatically overestimate the difficulty of getting one's definitions correct for most problems, especially when you are doing an end-to-end proof rather than just axiomatizing some system. The definitions are still worth looking at carefully, especially for AI-generated proofs, where you don't get the immediate feedback a human gets when something you expect to be hard goes through too easily. But contrary to what seems to be popular belief here, they are generally much easier to verify than the corresponding proof. (In the case of formally verified software, the corresponding analogy is verifying that the spec is what you want vs. verifying that the program matches the spec; the former is generally much easier.)
Everyone has a different perspective, based on their math background. From the OP's perspective, the formalization of this problem statement was apparently worth talking about. On the other hand, for you it's just a homework problem that belongs in an intro class.
Let's just be generous and try to accept these differences.
Are you an expert? Not gatekeeping here, but I have no intuition for what is easy or hard to formalise. A lot of very simply stated graph-theoretical results are apparently extremely hard to formalise.
They can read the statement, and the definitions that the statement references. If everything it references is in a well-trodden part of the Lean library, you can have pretty high confidence after a few minutes of going over the syntax.
Isn't that kind of a general problem with proofs, even when they're written by humans? There's nothing stopping someone from accidentally writing their own Lean proof that has slightly different semantics than an English version of the same proof, or from their English proof subtly missing something important or making an incorrect logical leap. This seems like a bit of a double standard, although maybe there's nuance here I'm missing.
One nuance you are missing is that the discussion is about formalizing the statement (the theorem), not the proof. The latter is what the article is about, but that doesn't suffice if you can't trust that the statement is also correctly formalized.
These are the two main problems:
1. Formalizing a theorem.
2. Finding a formal proof.
Part 2 is where AI could help, as proof search is full of heuristics. That's also how humans find proofs, and it is one of the main skills of a mathematician. The formal proof can then be machine-checked with well-known, mature techniques not involving AI.
Part 1 is the part that's missing, and it will always be hard. It's also the issue with formal verification of programs, where correctness criteria are often very complex and it's easy to mess up the formalization: even if you trust the proof, you can't trust that it proves the right thing.
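A toy illustration of the part-1 pitfall (my own example, in Lean): the kernel will happily certify a proof of a mis-stated theorem. The snippet below typechecks, but its hypothesis is contradictory, so the "theorem" is vacuously true and says nothing at all.

```lean
-- The hypothesis 0 < n ∧ n < 0 can never hold, so any conclusion follows:
-- Lean verifies the proof, but the statement is useless.
theorem looks_impressive (n : ℕ) (h : 0 < n ∧ n < 0) : n = 42 :=
  absurd h.2 (Nat.not_lt_zero n)
```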
To borrow some definitions from systems engineering, this question is one of validation, not verification. Verification is performed by Lean through its syntax and logic enforcement. Validation is the question of whether the Lean spec encodes a true representation of the problem statement (was the right thing specced?). Validation at the highest levels is probably an irreplaceable human activity.
Also, on the verification side, there is a window of failure in which Lean itself has a hidden bug. And with automated systems that seek correctness, the risk is slightly elevated that some overlooked bug gets exploited in the dev-check-dev loop run by the AI.
It's far easier in the Lean case because the human has to read very little, compared to reviewing whole generated programs.
They probably need to be able to read and understand the lean language.
I can read and understand e.g. Python, but I have seen subtle bugs that were hard to spot in code generated by AI. At least the last time I tried coding agents (mid-2025), it was often easier to write the code myself than to play "spot the bug" with whatever was generated. I don't know anything about Lean, so I was wondering if there were similar pitfalls here.
The same way you verify that any other program compiles? I don't understand the question, tbh; it seems self-evident.
Compiling isn't sufficient because it doesn't tell you whether the program matches the specification. A program that always says the temperature is 80 °F will compile, but it is a terrible answer to "what is the temperature outside at this location right now?"
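In Lean terms, a contrived sketch of the same point (my own example):

```lean
-- Typechecks perfectly; satisfies no useful specification.
def currentTempF (_lat _lon : Float) : Float := 80.0
```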
>That's what's covered by the "assuming you have formalized the statement correctly" parenthetical.
Sure. But it's fair to ask how to validate that assumption.
Skilled humans must understand the problem and write the theorem statement.
> This is not AI,
A bit of nitpicking, but according to books like AIMA, that is indeed AI. In the first chapter, even any control system is classified as AI.
Because of the reasons stated in the first chapter, I totally agree with the authors.
The whole system is AI. That part is a verifier in a "suggestion/instinct -> verifier" chain, like the ones used in neurosymbolic systems for automated driving, for example.
How hard is it to go back to English, from Lean? Just as hard as going from English to Lean?
If it is easier to convert backwards, maybe the AI can at least describe what the equations mean…
So, it can definitively tell you that 42 is the correct Answer to the Ultimate Question of Life, the Universe, and Everything. It just can't tell you if you're asking the right question.
No, it can tell you that 42 is the answer to (some Lean statement), but not what question that Lean statement encodes.
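A minimal illustration (my own): Lean can certify the arithmetic, but nothing in the system says what, if anything, the statement is about.

```lean
-- Lean verifies that 6 * 7 = 42; whether "6 * 7" encodes the Ultimate
-- Question is entirely outside its scope.
theorem the_answer : 6 * 7 = 42 := rfl
```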
I recently learnt that Douglas Adams wrote code, and the ultimate answer was '*': 42 in ASCII.
The argument here is that:
1. You write a proof in English that there is an infinite number of primes.
2. The LLM writes 2+2=4 in Lean.
3. Lean confirms that this is correct and that it's impossible for this proof to be wrong.
You missed a whole step: a person creates a Lean formalization of #1, and Lean promptly says the AI proof is wrong because it doesn't prove that formal problem statement.
The question lies with the person (or AI) creating the formal problem statement: how do you know it represents the problem the proof is supposed to be about? And the answer is that for people in the field, in this case, formalizing the problem and verifying the formalization is easy. It is like generating a public key versus factoring it.
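To make that concrete, a sketch (it uses Mathlib's actual Nat.exists_infinite_primes lemma; the framing is mine): once the statement is formalized, Lean rejects any term that doesn't prove exactly that proposition.

```lean
import Mathlib

-- The formal statement pins down what counts as a proof.
theorem my_infinitude : ∀ n : ℕ, ∃ p, n ≤ p ∧ p.Prime :=
  Nat.exists_infinite_primes

-- Supplying a proof of the wrong fact is a type error, not a subtle bug:
-- theorem my_infinitude' : ∀ n : ℕ, ∃ p, n ≤ p ∧ p.Prime :=
--   (by norm_num : 2 + 2 = 4)  -- rejected: type mismatch
```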
I thought that's what I was trying to express between steps 1 and 2 above, but I may have failed to get it across. My understanding is that the danger is that the LLM will create a proof that is correct but isn't about what the person thinks they're proving?
Lean is doing logical AI, the classical AI part.
Aristotle is doing the pattern-matching AI part, the modern LLM approach, previously called fuzzy logic.
Both are AI.
Calling Lean "AI" is quite a stretch. Though I'm also in the camp that dislikes the inflationary use of "AI" for LLMs, so I have sympathies for your viewpoint.
Finding a path in a maze was AI once.