Comment by mjevans
2 days ago
I think the question is, how can humans have verification that the problem statement was correctly encoded into that Lean specification?
2 days ago
I think the question is, how can humans have verification that the problem statement was correctly encoded into that Lean specification?
The problem statement is apparently
> Let C>0 and ϵ>0 be sufficiently small. Are there infinitely many integers a,b,n with a≥ϵn and b≥ϵn such that a!b!∣n!(a+b−n)! and a+b>n+Clogn?
Which seems like it's the type of thing you give as a homework problem to state formally in an intro class.
Yeah people dramatically overestimate the difficulty of getting one's definitions correct for most problems, especially when you are doing an end to end proof rather than just axiomatizing some system. They are still worth looking at carefully, especially for AI-generated proofs where you don't get the immediate feedback that you do as a human when something you expect to be hard goes through easily, but contrary to what seems to be popular belief here they are generally much easier to verify than the corresponding proof (in the case of formally verified software, the corresponding analogy is verifying that the spec is what you want vs. verifying that the program matches the spec; the former is generally much easier).
Everyone has a different perspective, based on their math background. From the OP's perspective, the formalization of this problem statement was apparently worth talking about. On the other hand, for you it's just a homework problem that belongs in an intro class.
Let's just be generous and try to accept these differences.
My comment was exactly about the required math background though. Anyone who's completed an intro to proof class would find that kind of statement easy to formalize because they would have had to write similar statements for homework. That provides some context: everyone who's interested in computer theorem provers probably has some experience with proofs, so formalizing that statement should be easy for them. i.e. (for this kind of problem) it's not really "the hard part" for people who are seriously working on this stuff.
Are you an expert? Not gatekeeping here but I have no intuition for what is easy or hard to formalise. A lot of very simply stated graph theoretical results are apparently extremely hard to formalise.
> Are you an expert?
I can't speak for ndriscoll, but I am a university math professor with extensive experience teaching these sorts of topics, and I agree with their comment in full.
You are right that some (other) statements are harder to formalize than they look. The Four Color Theorem from graph theory is an example. Generally speaking, discrete math, inequalities, for all/there exists, etc. are all easy to formalize. Anything involving geometry or topology is liable to be harder. For example, the Jordan curve theorem states that "any plane simple closed curve divides the plane into two regions, the interior and the exterior". As anyone who has slogged through an intro topology book knows, statements like this take more work to make precise (and still more to prove).
1 reply →
> apparently
When someone takes the time to explain undergrad-level concepts in a comment, responding with "are you an expert?" is a level of skepticism that's bordering on hostile. The person you're responding to is correct, it's rare that the theorem statement itself is particularly hard to formalize. Whatever you read likely refers to the difficulty of formalizing a proof.
5 replies →
As the sibling comment says, in my experience, the difficulty of formalizing the problem varies greatly between different areas. Probability theory is another notorious field in which modelling a problem correctly can be very subtle and difficult.
On the other hand, many problems in number theory and discrete structures tend to be rather simple to formalize. If you want to take your own look at that, I'd recommend to check out the lean games[1]. I'd say after the natural numbers game, you most likely know enough lean to write that problem down in lean with the "sufficiently small" being the trickiest part.
1: https://adam.math.hhu.de/
I think you may be confusing specification of the problem and the formalization of the proof.
Voters: please reconsider your ups and downs. I think the “Are you an expert” question triggered a lot of downvotes when it was in fact asked in good faith to judge the person’s perspective of easy and hard.
3 replies →
They can read the statement, and the definitions that the statement references. If everything it references is in a well-tread part of the Lean library, you can have pretty high confidence in a few minutes of going over the syntax.
Isn't that kind of a general problem with proofs, even when they're written by humans? There's nothing stopping someone from accidentally writing their own Lean proof that has slightly different semantics than an English version of the same proof, or even for their English proof to subtly miss something important or make an incorrect logical leap. This seems like a bit of a double standard, although maybe there's nuance here I'm missing.
One nuance you are missing is that the discussion is about formalizing the statement (the theorem), not the proof. The latter is what the article is about, but that doesn't suffice if you can't trust that the statement is also correctly formalized.
These are the two main problems:
1. Formalizing a theorem.
2. Finding a formal proof.
Part 2 is where AI could help as proof search is full of heuristics. That's also how humans find proofs and is one of the main skills of a mathematician. The formal proof can then be machine checked with well known and mature techniques not involving AI.
Part 1 is the part that's missing and will always be hard. It's also the issue with formal verification of programs for which correctness criteria are often very complex and it's easy to mess up the formalization, so that even if you trust the proof, you can't trust that it proves the right thing.
To borrow some definitions from Systems engineering for verification and validation, this question is one of validation. Verification is performed by Lean and spec syntax and logic enforcement. But Validation is a question of is if the Lean spec encodes a true representation of the problem statement (was the right thing specced). Validation at highest levels is probably an irreplaceable human activity.
Also, on the verification side - there could also be a window of failure that Lean itself has a hidden bug in it too. And with automated systems that seek correctness, it is slightly elevated that some missed crack of a bug becomes exploited in the dev-check-dev loop run by the AI.
It’s far easier for Lean because the human has to read very little compared to generating whole programs.
They probably need to be able to read and understand the lean language.
I can read and understand e.g. Python, but I have seen subtle bugs that were hard to spot in code generated by AI. At least the last time I tried coding agents (mid 2025), it was often easier to write the code myself then play "spot the bug" with whatever was generated. I don't know anything about Lean, so I was wondering if there were similar pitfalls here.
In this case the human written statement of the theorem is small. Can there be bugs? Absolutely! But it's essentially a few lines of code worth of thinking.
The lean proof checker then checks to make sure the proof actually proves the statement.
In this case an AI is generating the proof, but if it "compiles" it's correct. The only thing humans need to check is the statement to be proven.
(I don't know anything about this project but I've played around with lean and used other proof checkers more sesrisously).
1 reply →
I don't disagree with you, but on the other hand, I feel the same way about code written by other humans, and that's not because they're necessarily worse at writing code than me, but because for code I've written myself, I've already spent time thinking about it, so I don't have to start from scratch when re-reading it. It's also not like I don't think I potentially write as many bugs as my coworkers either; it's just easier for me to catch my own up front as I'm coding than it is to catch theirs in code review. The two main differences are that I can have a more meaningful conversation with my coworkers about their approach, what bugs they might think are worth looking out for, etc. compared to an LLM (which in my experience will claim completely incorrect things about the code it wrote far more often than any engineer I've worked with even junior ones; the humans I've worked with have uniformly been able to report how confident they are in what they've produced being what they were tasked with without insane exaggerations), and that an LLM can produce a much higher volume of plausible-enough looking code in a given unit of time than most humans I've worked with. It's not obvious to me that these would be particularly severe problems in generating proofs though; unless the proof is so large that it would be infeasible to read through it in a reasonable amount of time, I would expect mathematicians to be able to make up for the lower quality conversations with the "author" by devoting more time to reading and thinking, or having someone else also read through the proof and talking through it with them. If anything, it's possible that the timeline for writing up a paper about the results might be better for some mathematicians than the equivalent amount of time most engineers have to spend reviewing code before the pressure to get it merged and move on to the next thing. (I'm aware that there is certainly pressure to get things published in academia, but I don't have firsthand experience, so I've tried to be intentional in how I've worded this to clarify that I want to avoid any assumptions about what the norms would be, but given the relatively wide range of time pressure that engineers might experience across the industry as a whole, I'd expect that at least some mathematicians might have some flexibility to spend extra time reading through an LLM-written proof, especially if it might be time they'd otherwise have to spend trying to come up with the proof themselves).
If you want to check the statement, you only have to read the type. The proof itself you don’t have to read at all
As I understand it Lean is not a general purpose programming language, it is a DSL focused on formal logic verification. Bugs in a DSL are generally easier to identify and fix.
It seems one side of this argument desperately needs AI to have failed, and the other side is just saying that it probably worked but it is not as important as presented, that it is actually just a very cool working methodology going forward.
1 reply →
the same way you verify that any other program compiles? I don't understand the question tbh, it seems self evident.
Compiling isn’t sufficient because it doesn’t tell you if the program matches the specification. A program that always says the temperature is 80 F will compile but is a terrible solution to what is the temperature outside at this location right now.