Comment by aidenn0

1 month ago

How do you verify that the AI translation to Lean is a correct formalization of the problem? In other fields, generative AI is very good at making up plausible sounding lies, so I'm wondering how likely that is for this usage.

74 comments

aidenn0

sibrahim 1 month ago

That's what's covered by the "assuming you have formalized the statement correctly" parenthetical.

Given a formal statement of what you want, Lean can validate that the steps in a (tedious) machine-readable purported proof are valid and imply the result from accepted axioms. This is not AI, but a tiny, well reviewed kernel that only accepts correct formal logic arguments.

So, if you have a formal statement that you've verified to represent what you are interested in by some other means, Lean can tell you whether the proof created by genAI is correct. Basically, there is a nigh infallible checker that won't accept incorrect hallucinations.

mjevans 1 month ago
I think the question is, how can humans have verification that the problem statement was correctly encoded into that Lean specification?
- ndriscoll 1 month ago
  
  The problem statement is apparently
  > Let C>0 and ϵ>0 be sufficiently small. Are there infinitely many integers a,b,n with a≥ϵn and b≥ϵn such that a!b!∣n!(a+b−n)! and a+b>n+Clogn?
  Which seems like it's the type of thing you give as a homework problem to state formally in an intro class.
  
  20 replies →
- johncolanduoni 1 month ago
  
  They can read the statement, and the definitions that the statement references. If everything it references is in a well-tread part of the Lean library, you can have pretty high confidence in a few minutes of going over the syntax.
- saghm 1 month ago
  
  Isn't that kind of a general problem with proofs, even when they're written by humans? There's nothing stopping someone from accidentally writing their own Lean proof that has slightly different semantics than an English version of the same proof, or even for their English proof to subtly miss something important or make an incorrect logical leap. This seems like a bit of a double standard, although maybe there's nuance here I'm missing.
  
  1 reply →
- digikata 1 month ago
  
  To borrow some definitions from Systems engineering for verification and validation, this question is one of validation. Verification is performed by Lean and spec syntax and logic enforcement. But Validation is a question of is if the Lean spec encodes a true representation of the problem statement (was the right thing specced). Validation at highest levels is probably an irreplaceable human activity.
  Also, on the verification side - there could also be a window of failure that Lean itself has a hidden bug in it too. And with automated systems that seek correctness, it is slightly elevated that some missed crack of a bug becomes exploited in the dev-check-dev loop run by the AI.
- mrtesthah 1 month ago
  
  They probably need to be able to read and understand the lean language.
  
  7 replies →
- threatofrain 1 month ago
  
  It’s far easier for Lean because the human has to read very little compared to generating whole programs.
- luckydata 1 month ago
  
  the same way you verify that any other program compiles? I don't understand the question tbh, it seems self evident.
  
  1 reply →
oh_my_goodness 1 month ago
>That's what's covered by the "assuming you have formalized the statement correctly" parenthetical.
Sure. But it's fair to ask how to validate that assumption.
- fsmv 1 month ago
  
  Skilled humans must understand the problem and write the theorem statement.
f1shy 1 month ago

> This is not AI,
A little bit nitpicking, but according to books like AIMA that is indeed AI. In the first chapter even any control system is classified as AI.
Because of the reasons stated in the 1st chapter, I totally agree with the authors.
The whole system is AI. That part is a verifier in a chain of “suggestions/instict -> verifier” like used in neurosymbolic systems for automated driving, for example.
exe34 1 month ago
the argument here is that:
1. you write a proof in English that there is an infinite number of primes. 2. the llm writes 2+2=4 in lean. 3. lean confirms that this is correct and it's impossible that this proof is wrong.
- NetMageSCW 1 month ago
  
  You missed a whole section - a person creates a Lean formalization of #1 and Lean promptly says the AI proof is wrong because it doesn’t prove that formal problem statement.
  The question is in the person (or AI) creating the formal problem statement - how do you know it represents the problem the proof is supposed to be for? And the answer is for people in the field, in this case, formalizing the problem and verifying the formalization is easy. It is like generating an public key versus factoring it.
  
  1 reply →
heroprotagonist 1 month ago
Soo, it can definitively tell you that 42 is correct Answer to the Ultimate Question of Life, The Universe, and Everything. It just can't tell you if you're asking the right question.
- Someone 1 month ago
  
  No, it can tell you that 42 is the answer to (some lean statement), but not what question that lean statement encodes.
- davidwritesbugs 1 month ago
  
  I recently learnt that Douglas Adams wrote code and the ultimate answer was '*' - 42 in ascii
bee_rider 1 month ago

How hard is it to go back to English, from Lean? Just as hard as going from English to Lean?
If it is easier to convert backwards, maybe the AI can at least describe what the equations mean…
rurban 1 month ago
Lean is doing logical AI, the classical AI part.
Aristotle is doing the matching AI part, the modern LLM approach, previously called fuzzy logic.
Both are AI.
- teiferer 1 month ago
  
  Calling Lean "AI" is quite a stretch. Though I'm also in the camp that dislikes the inflationary use of "AI" for LLMs, so I have sympathies for your viewpoint.
  
  3 replies →

svat 1 month ago

It may help to look at this example concretely:

The natural-language statement of the problem is (from https://www.erdosproblems.com/728):

> Let C>0 and ϵ>0 be sufficiently small. Are there infinitely many integers a,b,n with a≥ϵn and b≥ϵn such that a!b!∣n!(a+b−n)! and a+b>n+Clogn?

The Lean-language statement of the problem (which can be done either by hand or by AI) is (from https://github.com/plby/lean-proofs/blob/f44d8c0e433ab285541...):

    ∀ᶠ ε : ℝ in [>] 0, ∀ C > (0 : ℝ), ∀ C' > C,
      ∃ a b n : ℕ,
        0 < n ∧
        ε * n < a ∧
        ε * n < b ∧
        a ! * b ! ∣ n ! * (a + b - n)! ∧
        a + b > n + C * log n ∧
        a + b < n + C' * log n

Yes on the one hand, one needs to know enough about Lean to be sure that this formulation matches what we intend, and isn't stating something trivial. But on the other hand, this is not as hard as finding an error on some obscure line of a long proof.

(There's also an older formulation at https://github.com/google-deepmind/formal-conjectures/blob/f... but the new one is more in the spirit of what was intended: see the discussion starting at https://www.erdosproblems.com/forum/thread/728#post-2196 which gives a clear picture, as of course does Tao's thread in the OP that summarizes this discussion.)

kovek 1 month ago
I'm wondering how do people come up with these mathematical challenges?
- mikhailfranco 1 month ago
  
  Live out of a suitcase, travel the world, hang out with a wide selection of excellent mathematicians, write joint papers with many of them, when you get bored or stuck, pack the suitcase and keep moving - for your whole life.

maxwells-daemon 1 month ago

For this reason, when we announce results on e.g. the IMO, we formalize the statements by hand and inspect the proofs carefully to ensure they capture the full spirit of the problem.

However, there are some good heuristics. If you expect a problem to be hard and the proof is very short, you've probably missed something!

tomjen3 1 month ago
Aren't the best proofs very short?
- j_maffe 1 month ago
  
  "Best" is subjective. Proofs can get very long especially for the harder ones mathematicians are tackling these days.
bytesandbits 1 month ago
no you don't
https://www.reddit.com/r/singularity/comments/1pv3nl3/commen...
- minitech 1 month ago
  
  That doesn’t look like a counterexample to “we formalize the statements by hand and inspect the proofs carefully to ensure they capture the full spirit of the problem”.

maccam912 1 month ago

To answer the question a different way, I think you are asking how we know the proof actually matches the description the human provided? And I'd say we can't know for sure, but the idea is that you can pretty concisely write and check yourself that the problem is accurate, i.e. "There are an infinite number of primes" or whatever, and then even if an LLM goes off and makes up a lean proof wildly different from your description, if lean says the proof is valid then you have proven the original statement. I guess in theory the actual proof could be way different than what you thought it would be, but ultimately all the logic will still check out.

rtpg 1 month ago

I feel like even outside of AI translation, formalization not capturing the spirit of what the informal description was provided is always a risk.

This is also a big risk when trying to prove code correctness: "prove this algo works" means you gotta define "works" along certain axes, and if you're very unlucky you might have a proof that exploits the uncertainty around a certain axis.

jlouis 1 month ago

The statement is something you provide. It's the search you can have the LLM do. If this works for math it will immediately make code way higher quality via the same tools.

JacobAsmuth 1 month ago

You read it yourself :)

If you and the AI agree on the translation of the problem, and lean agrees with the solution, then you're done.

roenxi 1 month ago

You're looking for the practical answer, but philosophically it isn't possible to translate an informal statement into a formal one 'correctly'. It is informal, ie, vaguely specified. The only certain questions are if the formal axioms and results are interesting which is independent of the informal formalisation and that can only be established by inspecting the the proof independently of the informal spec.

wizzwizz4 1 month ago
Philosophically, this is not true in general, but that's for trivial reasons: "how many integers greater than 7 are blue?" doesn't correspond to a formal question. It is absolutely true in many specific cases. Most problems posed by a mathematician will correspond to exactly one formal proposition, within the context of a given formal system. This problem is unusual, in that it was originally misspecified.
- red75prime 1 month ago
  
  I suppose there's no formally defined procedure that accepts a natural language statement and outputs either its formalization or "misspecified". And "absolutely true" means "the vast majority of mathematicians agree that there's only one formal proposition that corresponds to this statement".
  
  5 replies →