Comment by maxwells-daemon
2 days ago
I work at Harmonic, the company behind Aristotle.
To clear up a few misconceptions:
- Aristotle uses modern AI techniques heavily, including language modeling.
- Aristotle can be guided by an informal (English) proof. If the proof is correct, Aristotle has a good chance of translating it into Lean (which is a strong vote of confidence that your English proof is solid). I believe that's what happened here.
- Once a proof is formalized into Lean (assuming you have formalized the statement correctly), there is no doubt that the proof is correct. This is the core of our approach: you can do a lot of (AI-driven) search, and once you find the answer you are certain it's correct no matter how complex the solution is.
Happy to answer any questions!
How do you verify that the AI translation to Lean is a correct formalization of the problem? In other fields, generative AI is very good at making up plausible-sounding lies, so I'm wondering how likely that is for this usage.
That's what's covered by the "assuming you have formalized the statement correctly" parenthetical.
Given a formal statement of what you want, Lean can check that every step in a (tedious) machine-readable purported proof is valid and that together the steps imply the result from accepted axioms. This is not AI, but a tiny, well-reviewed kernel that only accepts correct formal-logic arguments.
So, if you have a formal statement that you've verified, by some other means, to represent what you are interested in, Lean can tell you whether the proof created by genAI is correct. Basically, there is a nigh-infallible checker that won't accept incorrect hallucinations.
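As a minimal sketch of what that check looks like (the theorem names below are mine, purely illustrative): the kernel accepts a proof only if it actually establishes the stated proposition, no matter who or what produced the proof.

```lean
-- Accepted: the proof term really does establish the stated proposition.
theorem example_ok (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- Rejected: the same proof term does not establish this statement,
-- so Lean refuses to accept it (uncomment to see the error).
-- theorem example_bad (a b : Nat) : a + b = a * b := Nat.add_comm a b
```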
I think the question is, how can humans verify that the problem statement was correctly encoded into that Lean specification?
36 replies →
>That's what's covered by the "assuming you have formalized the statement correctly" parenthetical.
Sure. But it's fair to ask how to validate that assumption.
1 reply →
> This is not AI,
A bit of a nitpick, but according to books like AIMA, that is indeed AI. In the first chapter, even a simple control system is classified as AI.
For the reasons stated in the first chapter, I totally agree with the authors.
The whole system is AI. That part is a verifier in a chain of "suggestion/instinct -> verifier", like the ones used in neurosymbolic systems for automated driving, for example.
How hard is it to go back to English, from Lean? Just as hard as going from English to Lean?
If it is easier to convert backwards, maybe the AI can at least describe what the equations mean…
Soo, it can definitively tell you that 42 is the correct Answer to the Ultimate Question of Life, The Universe, and Everything. It just can't tell you if you're asking the right question.
2 replies →
The argument here is that:
1. You write a proof in English that there are infinitely many primes.
2. The LLM writes 2+2=4 in Lean.
3. Lean confirms that this is correct and that it's impossible for this proof to be wrong; but what it verified is 2+2=4, not anything about primes (see the sketch below).
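A minimal Lean sketch of that mismatch (the name is mine and purely illustrative): the verification stamp attaches only to the statement that was actually written down.

```lean
-- This is what got formalized and what Lean verified. It is true and the
-- proof is unimpeachable; it just isn't the statement about primes you wanted.
theorem what_got_checked : 2 + 2 = 4 := rfl
```

Which is why the statement, not the proof, is the part a human still has to read.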
2 replies →
Lean is doing logical AI, the classical AI part.
Aristotle is doing the matching AI part, the modern LLM approach, previously called fuzzy logic.
Both are AI.
4 replies →
It may help to look at this example concretely:
The natural-language statement of the problem is (from https://www.erdosproblems.com/728):
> Let C > 0 and ϵ > 0 be sufficiently small. Are there infinitely many integers a, b, n with a ≥ ϵn and b ≥ ϵn such that a!b! ∣ n!(a+b−n)! and a + b > n + C log n?
The Lean-language statement of the problem (which can be done either by hand or by AI) is (from https://github.com/plby/lean-proofs/blob/f44d8c0e433ab285541...):
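(The exact text is at that link. Purely to show the shape of such a thing, here is a rough sketch of my own, not the repo's statement; the quantifier placement for "sufficiently small", the ℕ-to-ℝ coercions, and the truncated subtraction are exactly the kind of details that can go wrong.)

```lean
import Mathlib

-- Illustrative sketch only; not the formalization from the linked repository.
theorem erdos_728_sketch :
    ∃ C₀ ε₀ : ℝ, 0 < C₀ ∧ 0 < ε₀ ∧
      ∀ C ε : ℝ, 0 < C → C ≤ C₀ → 0 < ε → ε ≤ ε₀ →
        ∀ N : ℕ, ∃ a b n : ℕ, N ≤ n ∧
          ε * n ≤ (a : ℝ) ∧ ε * n ≤ (b : ℝ) ∧
          a.factorial * b.factorial ∣ n.factorial * (a + b - n).factorial ∧
          (n : ℝ) + C * Real.log n < (a : ℝ) + b := by
  sorry
```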
Yes, on the one hand, one needs to know enough about Lean to be sure that this formulation matches what we intend and isn't stating something trivial. But on the other hand, this is not as hard as finding an error on some obscure line of a long proof.
(There's also an older formulation at https://github.com/google-deepmind/formal-conjectures/blob/f... but the new one is more in the spirit of what was intended: see the discussion starting at https://www.erdosproblems.com/forum/thread/728#post-2196 which gives a clear picture, as of course does Tao's thread in the OP that summarizes this discussion.)
I'm wondering: how do people come up with these mathematical challenges?
For this reason, when we announce results on e.g. the IMO, we formalize the statements by hand and inspect the proofs carefully to ensure they capture the full spirit of the problem.
However, there are some good heuristics. If you expect a problem to be hard and the proof is very short, you've probably missed something!
Aren't the best proofs very short?
1 reply →
no you don't
https://www.reddit.com/r/singularity/comments/1pv3nl3/commen...
1 reply →
To answer the question a different way: I think you are asking how we know the proof actually matches the description the human provided. I'd say we can't know for sure, but the idea is that you can write the problem statement pretty concisely and check for yourself that it's accurate, e.g. "There are infinitely many primes" or whatever. Then, even if an LLM goes off and makes up a Lean proof wildly different from your description, if Lean says the proof is valid you have proven the original statement. In theory the actual proof could be way different from what you thought it would be, but ultimately all the logic will still check out.
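To make that concrete (a small sketch; the theorem name is mine, while `Nat.exists_infinite_primes` is an existing Mathlib lemma): the statement fits on one line and is easy to audit by eye, and Lean would accept any valid proof of that exact statement, however it was produced.

```lean
import Mathlib

-- "There are infinitely many primes": above any n there is a prime.
-- The statement is short enough to check by hand; the proof could be anything
-- Lean accepts, here just a call to the existing Mathlib lemma.
theorem infinitely_many_primes (n : ℕ) : ∃ p, n ≤ p ∧ p.Prime :=
  Nat.exists_infinite_primes n
```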
I feel like, even outside of AI translation, a formalization not capturing the spirit of the informal description is always a risk.
This is also a big risk when trying to prove code correctness: "prove this algo works" means you gotta define "works" along certain axes, and if you're very unlucky you might have a proof that exploits the uncertainty around a certain axis.
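A toy Lean sketch of that trap (names are mine): a spec for "sort" that only demands a sorted output, and forgets to demand that the output is a permutation of the input, is happily satisfied by a function that throws the data away.

```lean
import Mathlib

-- A deliberately broken "sort" that nevertheless satisfies an under-specified spec.
def badSort (xs : List ℕ) : List ℕ := []

-- The spec only asks for sortedness, so the proof goes through.
-- A sane spec would also demand e.g. `(badSort xs).Perm xs`.
theorem badSort_sorted (xs : List ℕ) : (badSort xs).Sorted (· ≤ ·) := by
  unfold badSort
  exact List.sorted_nil
```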
The statement is something you provide; the search is what you have the LLM do. If this works for math, it will immediately make code much higher quality via the same tools.
You're looking for the practical answer, but philosophically it isn't possible to translate an informal statement into a formal one 'correctly'. It is informal, i.e., vaguely specified. The only certain question is whether the formal axioms and results are interesting, which is independent of the informal statement, and that can only be established by inspecting the proof independently of the informal spec.
Philosophically, this is not true in general, but that's for trivial reasons: "how many integers greater than 7 are blue?" doesn't correspond to a formal question. It is absolutely true in many specific cases. Most problems posed by a mathematician will correspond to exactly one formal proposition, within the context of a given formal system. This problem is unusual, in that it was originally misspecified.
6 replies →
First congrats!
Sometimes when I'm using new LLMs I'm not sure if it’s a step forward or just benchmark hacking, but formalized math results always show that the progress is real and huge.
When do you think Harmonic will reach the point of formalizing most (even hard) human-written math?
I saw an interview with Christian Szegedy (your competitor, I guess) in which he said he believes it will happen this year.
Thank you! It depends on the topic. Some fields (algebra, number theory) are covered well by Lean's math library, and so I think we are already there; I recommend trying Aristotle for yourself to see how reliably it can formalize these theorems!
In other fields (topology, probability, linear algebra), many key definitions are not in Mathlib yet, so you will struggle to write down the theorem itself. (But in some cases, Aristotle can define the structure you are talking about on the fly!)
This is not an intrinsic limitation of Lean; it's just that nobody has taken the time to formalize much of those fields yet. We hope to dramatically accelerate this process by making it trivial to prove lemmas, which make up much of the work. For now, I still think humans should write the key definitions and statements of "central theorems" in a field, to ensure they are compatible with the rest of the library.
> ... But in some cases, Aristotle can define the structure you are talking about on the fly! ...
Do you have any plans to characterize these cases more fully, and perhaps propose your own contributions to mathlib itself on that basis?
1 reply →
Is anyone working on applying these techniques to formal verification of software?
My limited understanding of Rust is that it applies a fixed set of rules to guarantee memory safety. The rules are somewhat simple and limiting, for ease of understanding and implementation, but also because of undecidability.
Programmers run into situations where they know that their code won't cause memory errors, but it doesn't follow the rules. Wouldn't it be cool if something like Aristotle were integrated into the compiler? Any code for which a proof of correctness could be written would pass/compile, without having to add more and more rules.
We are! We very recently announced some results on formally proving the correctness of programs: https://harmonic.fun/news#blog-post-verina-bench-sota
Formal methods are cool because, in contrast to tools like the borrow checker, you can prove some very "nonlocal" properties: this system does not deadlock, or it makes progress at least every N steps, etc.
Does Aristotle produce TLA+ output?
For example can it read rust async code and prove that there are no deadlocks in TLA+, or some equivalent in Lean?
1 reply →
How is “this system doesn’t deadlock” not the same as the halting problem?
An issue with this approach is that it may not be robust. That is, you could run into a case where a minor modification of your program is suddenly not provable anymore, even though it is still correct. The heuristic (AI or otherwise) necessarily has limits, and if you are close to the "edge" of its capabilities then a minor change could push it across.
If the proof is rooted in the understanding of a programmer who can give proof hints to the prover, then any modification of the program can be accompanied by a modification of the hints, still allowing automatic proofs. But if the human has no clue, then the automatic system can get stuck without the human having a chance to help it along.
The same is true for optimization. One small change and the compiler's optimizer no longer knows how to optimize the code, and your code is now slow. And there is no way for a programmer to fix it except by rolling back their changes or by inspecting the assembly output.
Formal verification of program correctness is also (for obvious reasons) key to unlocking AI-driven synthesis (i.e. 'vibe' coding) of "correct" programs that will verifiably meet the given spec.
Not all aspects of a spec can be formally encoded. But even half-way houses are good.
E.g., you can give the vague spec 'build me a todo list app', but you can still formally prove that everything your app does finishes, or even that it finishes in reasonable time.
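As a small Lean illustration of the "everything finishes" half (the function name is mine): every definition has to come with a termination argument, here found automatically from the structurally decreasing argument, so merely getting the definition accepted is a machine-checked proof that it halts on every input.

```lean
-- Lean only accepts this definition after establishing that the recursion
-- terminates, so "sumTo always finishes" is proved as part of checking it.
def sumTo : Nat → Nat
  | 0 => 0
  | n + 1 => (n + 1) + sumTo n
```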
2 replies →
It will certainly help - but it's an extremely high bar. Almost all formal verification of software today is "does this pass the typechecker?"
Now this captures some errors, but it doesn't really capture high-level ones (whether a program is guaranteed not to deadlock is a hard one), and it doesn't capture the one that is important for business purposes (does this do what the customer wants). That requirement is more important than correctness (witness all the software that is described as "crap" but is nonetheless widely used).
I don't think this is a required key to unlocking vibe coding. That seems to be easy: does this provide business value? And there the answer seems roughly to be "yes".
> If the proof is correct, Aristotle has a good chance at translating it into Lean
How does this depend on the area of mathematics of the proof? I was under the impression that it was still difficult to formalize most research areas, even for a human. How close is Aristotle to this frontier?
>assuming you have formalized the statement correctly
That's a pretty big assumption, though, isn't it? As we saw with the Navier-Stokes psychosis episode over the New Year holiday, formalizing correctly really isn't guaranteed.
What happens when this process is reversed, i.e. translating from Lean back to informal English? And does iterating this then help research better approaches to writing proofs in human language?
I had the same thought, but unfortunately, even if that translation is accurate, it could still be hallucinating in both directions, so it would not really be sufficient evidence...
It's another reformulation rather than a true proof. Instead of wanting a proof of a theorem, we now need to prove that this proof is actually proving the theorem. The proof itself is so incomprehensible that it can't be trusted on its own, but if it can be shown to be trustworthy, then the theorem must be true.
What are the benefits of Aristotle over a general-purpose coding assistant like Claude Code?
Aristotle's output is formally verified in Lean, so you can run it for days on a hard problem and be assured that the answer, no matter how complex, is right without needing to manually check it.
Claude Code can write Lean, but we do a heck of a lot of RL on theorem proving, so Aristotle winds up being much better at writing Lean than other coding agents are.
Seeing a task-specific model be consistently better at anything is extremely surprising given the rapid innovation in foundation models.
Have you tried Aristotle on other, non-Lean tasks? Is it better at logical reasoning in general?
2 replies →
How strong is your internal informal LLM at theorem proving before the formalization stage? Or is it combined in a way that makes this not measurable?
Do you have plans to apply this broadly to the historical math literature?
Yes! I think that working with Mathlib is the best long term solution, because it's how people already collaborate on building out the formal "universe of mathematics." We want to speed that up, and hopefully we'll cover all of the common topics very soon!
> there is no doubt that the proof is correct.
Do you have any links to reading about how often Lean core has soundness bugs or Mathlib has correctness bugs?
So what did the "AI" actually do?
Translate an informal description of the proof into Lean?
TFA says ChatGPT wrote the informal description.
Any chance Harmonic accepts fully remote employees? :)
This is the first time I've heard about Aristotle, and I find it very interesting. First question first: is it available to the general public? I don't know if this is the page to try it? [1]
Second, when you say language modeling support, does that mean it can better understand code representations (ASTs), or something else? I am just an AI user, not very knowledgeable in the field. My main interest is whether it would be good for static analysis oriented toward computer security (SAST).
[1] https://aristotle.ai/
Yes it’s available! https://aristotle.harmonic.fun/
You seem to be openly contradicting your company's PR and language. Your description very clearly presents the "AI" as a tool that translates relatively informal specifications into formal proof logic but does not itself do the proving.