Comment by maxwells-daemon
2 days ago
I work at Harmonic, the company behind Aristotle.
To clear up a few misconceptions:
- Aristotle uses modern AI techniques heavily, including language modeling.
- Aristotle can be guided by an informal (English) proof. If the proof is correct, Aristotle has a good chance of translating it into Lean (which is a strong vote of confidence that your English proof is solid). I believe that's what happened here.
- Once a proof is formalized into Lean (assuming you have formalized the statement correctly), there is no doubt that the proof is correct. This is the core of our approach: you can do a lot of (AI-driven) search, and once you find the answer you are certain it's correct no matter how complex the solution is.
Happy to answer any questions!
How do you verify that the AI translation to Lean is a correct formalization of the problem? In other fields, generative AI is very good at making up plausible-sounding lies, so I'm wondering how likely that is for this usage.
That's what's covered by the "assuming you have formalized the statement correctly" parenthetical.
Given a formal statement of what you want, Lean can check that every step in a (tedious) machine-readable purported proof is valid and that together the steps imply the result from accepted axioms. This is not AI, but a tiny, well-reviewed kernel that only accepts correct formal-logic arguments.
So, if you have a formal statement that you've verified, by some other means, to represent what you are interested in, Lean can tell you whether the proof created by genAI is correct. Basically, there is a nigh-infallible checker that won't accept incorrect hallucinations.
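As a minimal sketch of what that check looks like (the theorem names below are mine, purely illustrative): the kernel accepts a proof only if it actually establishes the stated proposition, no matter who or what produced the proof.

```lean
-- Accepted: the proof term really does establish the stated proposition.
theorem example_ok (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- Rejected: the same proof term does not establish this statement,
-- so Lean refuses to accept it (uncomment to see the error).
-- theorem example_bad (a b : Nat) : a + b = a * b := Nat.add_comm a b
```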
I think the question is, how can humans verify that the problem statement was correctly encoded into that Lean specification?
36 replies →
>That's what's covered by the "assuming you have formalized the statement correctly" parenthetical.
Sure. But it's fair to ask how to validate that assumption.
1 reply →
> This is not AI,
A bit of a nitpick, but according to books like AIMA, that is indeed AI. In the first chapter, even a simple control system is classified as AI.
For the reasons stated in the first chapter, I totally agree with the authors.
The whole system is AI. That part is a verifier in a chain of "suggestion/instinct -> verifier", like the ones used in neurosymbolic systems for automated driving, for example.
How hard is it to go back to English, from Lean? Just as hard as going from English to Lean?
If it is easier to convert backwards, maybe the AI can at least describe what the equations mean…
Soo, it can definitively tell you that 42 is the correct Answer to the Ultimate Question of Life, The Universe, and Everything. It just can't tell you if you're asking the right question.
2 replies →
The argument here is that:
1. You write a proof in English that there are infinitely many primes.
2. The LLM writes 2+2=4 in Lean.
3. Lean confirms that this is correct and that it's impossible for this proof to be wrong; but what it verified is 2+2=4, not anything about primes (see the sketch below).
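A minimal Lean sketch of that mismatch (the name is mine and purely illustrative): the verification stamp attaches only to the statement that was actually written down.

```lean
-- This is what got formalized and what Lean verified. It is true and the
-- proof is unimpeachable; it just isn't the statement about primes you wanted.
theorem what_got_checked : 2 + 2 = 4 := rfl
```

Which is why the statement, not the proof, is the part a human still has to read.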
2 replies →
Lean is doing logical AI, the classical AI part.
Aristotle is doing the matching AI part, the modern LLM approach, previously called fuzzy logic.
Both are AI.
4 replies →
It may help to look at this example concretely:
The natural-language statement of the problem is (from https://www.erdosproblems.com/728):
> Let C > 0 and ϵ > 0 be sufficiently small. Are there infinitely many integers a, b, n with a ≥ ϵn and b ≥ ϵn such that a!b! ∣ n!(a+b−n)! and a + b > n + C log n?
The Lean-language statement of the problem (which can be done either by hand or by AI) is (from https://github.com/plby/lean-proofs/blob/f44d8c0e433ab285541...):
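(The exact text is at that link. Purely to show the shape of such a thing, here is a rough sketch of my own, not the repo's statement; the quantifier placement for "sufficiently small", the ℕ-to-ℝ coercions, and the truncated subtraction are exactly the kind of details that can go wrong.)

```lean
import Mathlib

-- Illustrative sketch only; not the formalization from the linked repository.
theorem erdos_728_sketch :
    ∃ C₀ ε₀ : ℝ, 0 < C₀ ∧ 0 < ε₀ ∧
      ∀ C ε : ℝ, 0 < C → C ≤ C₀ → 0 < ε → ε ≤ ε₀ →
        ∀ N : ℕ, ∃ a b n : ℕ, N ≤ n ∧
          ε * n ≤ (a : ℝ) ∧ ε * n ≤ (b : ℝ) ∧
          a.factorial * b.factorial ∣ n.factorial * (a + b - n).factorial ∧
          (n : ℝ) + C * Real.log n < (a : ℝ) + b := by
  sorry
```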
Yes, on the one hand, one needs to know enough about Lean to be sure that this formulation matches what we intend and isn't stating something trivial. But on the other hand, this is not as hard as finding an error on some obscure line of a long proof.
(There's also an older formulation at https://github.com/google-deepmind/formal-conjectures/blob/f... but the new one is more in the spirit of what was intended: see the discussion starting at https://www.erdosproblems.com/forum/thread/728#post-2196 which gives a clear picture, as of course does Tao's thread in the OP that summarizes this discussion.)
I'm wondering: how do people come up with these mathematical challenges?
For this reason, when we announce results on e.g. the IMO, we formalize the statements by hand and inspect the proofs carefully to ensure they capture the full spirit of the problem.
However, there are some good heuristics. If you expect a problem to be hard and the proof is very short, you've probably missed something!
Aren't the best proofs very short?
1 reply →
no you don't
https://www.reddit.com/r/singularity/comments/1pv3nl3/commen...
1 reply →
To answer the question a different way: I think you are asking how we know the proof actually matches the description the human provided. I'd say we can't know for sure, but the idea is that you can write the problem statement pretty concisely and check for yourself that it's accurate, e.g. "There are infinitely many primes" or whatever. Then, even if an LLM goes off and makes up a Lean proof wildly different from your description, if Lean says the proof is valid you have proven the original statement. In theory the actual proof could be way different from what you thought it would be, but ultimately all the logic will still check out.
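To make that concrete (a small sketch; the theorem name is mine, while `Nat.exists_infinite_primes` is an existing Mathlib lemma): the statement fits on one line and is easy to audit by eye, and Lean would accept any valid proof of that exact statement, however it was produced.

```lean
import Mathlib

-- "There are infinitely many primes": above any n there is a prime.
-- The statement is short enough to check by hand; the proof could be anything
-- Lean accepts, here just a call to the existing Mathlib lemma.
theorem infinitely_many_primes (n : ℕ) : ∃ p, n ≤ p ∧ p.Prime :=
  Nat.exists_infinite_primes n
```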
I feel like, even outside of AI translation, a formalization not capturing the spirit of the informal description is always a risk.
This is also a big risk when trying to prove code correctness: "prove this algo works" means you gotta define "works" along certain axes, and if you're very unlucky you might have a proof that exploits the uncertainty around a certain axis.
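A toy Lean sketch of that trap (names are mine): a spec for "sort" that only demands a sorted output, and forgets to demand that the output is a permutation of the input, is happily satisfied by a function that throws the data away.

```lean
import Mathlib

-- A deliberately broken "sort" that nevertheless satisfies an under-specified spec.
def badSort (xs : List ℕ) : List ℕ := []

-- The spec only asks for sortedness, so the proof goes through.
-- A sane spec would also demand e.g. `(badSort xs).Perm xs`.
theorem badSort_sorted (xs : List ℕ) : (badSort xs).Sorted (· ≤ ·) := by
  unfold badSort
  exact List.sorted_nil
```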
The statement is something you provide; the search is what you have the LLM do. If this works for math, it will immediately make code much higher quality via the same tools.
You're looking for the practical answer, but philosophically it isn't possible to translate an informal statement into a formal one 'correctly'. It is informal, i.e., vaguely specified. The only certain question is whether the formal axioms and results are interesting, which is independent of the informal statement, and that can only be established by inspecting the proof independently of the informal spec.
Philosophically, this is not true in general, but that's for trivial reasons: "how many integers greater than 7 are blue?" doesn't correspond to a formal question. It is absolutely true in many specific cases. Most problems posed by a mathematician will correspond to exactly one formal proposition, within the context of a given formal system. This problem is unusual, in that it was originally misspecified.
6 replies →
First congrats!
Sometimes when I'm using new LLMs I'm not sure if it’s a step forward or just benchmark hacking, but formalized math results always show that the progress is real and huge.
When do you think Harmonic will reach the point of formalizing most (even hard) human-written math?
I saw an interview with Christian Szegedy (your competitor, I guess) in which he said he believes it will happen this year.
Thank you! It depends on the topic. Some fields (algebra, number theory) are covered well by Lean's math library, and so I think we are already there; I recommend trying Aristotle for yourself to see how reliably it can formalize these theorems!
In other fields (topology, probability, linear algebra), many key definitions are not in Mathlib yet, so you will struggle to write down the theorem itself. (But in some cases, Aristotle can define the structure you are talking about on the fly!)
This is not an intrinsic limitation of Lean; it's just that nobody has taken the time to formalize much of those fields yet. We hope to dramatically accelerate this process by making it trivial to prove lemmas, which make up much of the work. For now, I still think humans should write the key definitions and statements of "central theorems" in a field, to ensure they are compatible with the rest of the library.
> ... But in some cases, Aristotle can define the structure you are talking about on the fly! ...
Do you have any plans to characterize these cases more fully, and perhaps propose your own contributions to mathlib itself on that basis?
1 reply →
Is anyone working on applying these techniques to formal verification of software?
My limited understanding of Rust is that it applies a fixed set of rules to guarantee memory safety. The rules are somewhat simple and limiting, for ease of understanding and implementation, but also because of undecidability.
Programmers run into situations where they know that their code won't cause memory errors, but it doesn't follow the rules. Wouldn't it be cool if something like Aristotle were integrated into the compiler? Any code for which a proof of correctness could be written would pass/compile, without having to add more and more rules.
We are! We very recently announced some results on formally proving the correctness of programs: https://harmonic.fun/news#blog-post-verina-bench-sota
Formal methods are cool because, in contrast to tools like the borrow checker, you can prove some very "nonlocal" properties: this system does not deadlock, or it makes progress at least every N steps, etc.
Does Aristotle produce TLA+ output?
For example can it read rust async code and prove that there are no deadlocks in TLA+, or some equivalent in Lean?
1 reply →
How is “this system doesn’t deadlock” not the same as the halting problem?
An issue with this approach is that it may not be robust. That is, you could run into a case where a minor modification of your program is suddenly not provable anymore, even though it is still correct. The heuristic (AI or otherwise) necessarily has limits, and if you are close to the "edge" of its capabilities then a minor change could push it across.
If the proof is rooted in the understanding of a programmer who can give proof hints to the prover, then any modification of the program can be accompanied by a modification of the hints, still allowing automatic proofs. But if the human has no clue, then the automatic system can get stuck without the human having a chance to help it along.
The same is true for optimization. One small change and the compiler's optimizer no longer knows how to optimize the code, and your code is now slow. And there is no way for a programmer to fix it except by rolling back their changes or by inspecting the assembly output.
Formal verification of program correctness is also (for obvious reasons) key to unlocking AI-driven synthesis (i.e. 'vibe' coding) of "correct" programs that will verifiably meet the given spec.
Not all aspects of a spec can be formally encoded. But even half-way houses are good.
E.g., you can give the vague spec 'build me a todo list app', but you can still formally prove that everything your app does finishes, or even that it finishes in reasonable time.
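As a small Lean illustration of the "everything finishes" half (the function name is mine): every definition has to come with a termination argument, here found automatically from the structurally decreasing argument, so merely getting the definition accepted is a machine-checked proof that it halts on every input.

```lean
-- Lean only accepts this definition after establishing that the recursion
-- terminates, so "sumTo always finishes" is proved as part of checking it.
def sumTo : Nat → Nat
  | 0 => 0
  | n + 1 => (n + 1) + sumTo n
```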
2 replies →
It will certainly help - but it's an extremely high bar. Almost all formal verification of software today is "does this pass the typechecker?"
Now this captures some errors, but it doesn't really capture high-level ones (whether a program is guaranteed not to deadlock is a hard one), and it doesn't capture the one that is important for business purposes (does this do what the customer wants). That requirement is more important than correctness (witness all the software that is described as "crap" but is nonetheless widely used).
I don't think this is a required key to unlocking vibe coding. That seems to be easy: does this provide business value? And there the answer seems roughly to be "yes".
> If the proof is correct, Aristotle has a good chance at translating it into Lean
How does this depend on the area of mathematics of the proof? I was under the impression that it was still difficult to formalize most research areas, even for a human. How close is Aristotle to this frontier?
>assuming you have formalized the statement correctly
That's a pretty big assumption, though, isn't it? As we saw with the Navier-Stokes psychosis episode over the New Year holiday, formalizing correctly really isn't guaranteed.
What happens when this process is reversed, i.e. translating from Lean back to informal English? And does iterating this then help research better approaches to writing proofs in human language?
I had the same thought, but unfortunately, even if that translation is accurate, it could still be hallucinating in both directions, so it would not really be sufficient evidence...
It's another reformulation rather than a true proof. Instead of wanting a proof of a theorem, we now need to prove that this proof is actually proving the theorem. The proof itself is so incomprehensible that it can't be trusted on its own, but if it can be shown to be trustworthy, then the theorem must be true.
What are the benefits of Aristotle over a general-purpose coding assistant like Claude Code?
Aristotle's output is formally verified in Lean, so you can run it for days on a hard problem and be assured that the answer, no matter how complex, is right without needing to manually check it.
Claude Code can write Lean, but we do a heck of a lot of RL on theorem proving, so Aristotle winds up being much better at writing Lean than other coding agents are.
Seeing a task-specific model be consistently better at anything is extremely surprising given the rapid innovation in foundation models.
Have you tried Aristotle on other, non-Lean tasks? Is it better at logical reasoning in general?
2 replies →
How strong is your internal informal LLM at theorem proving before the formalization stage? Or is it combined in a way that makes this not measurable?
Do you have plans to apply this broadly to the historical math literature?
Yes! I think that working with Mathlib is the best long term solution, because it's how people already collaborate on building out the formal "universe of mathematics." We want to speed that up, and hopefully we'll cover all of the common topics very soon!
> there is no doubt that the proof is correct.
Do you have any links to reading about how often Lean core has soundness bugs or Mathlib has correctness bugs?
So what did the "AI" actually do?
Translate an informal description of the proof into Lean?
TFA says ChatGPT wrote the informal description.
Any chance Harmonic accepts fully remote employees? :)
This is the first time I've heard about Aristotle, and I find it very interesting. First question first: is it available to the general public? I don't know if this is the page to try it? [1]
Second, when you say language modeling support, does that mean it can better understand code representations (ASTs), or something else? I am just an AI user, not very knowledgeable in the field. My main interest is whether it would be good for static analysis oriented toward computer security (SAST).
[1] https://aristotle.ai/
Yes it’s available! https://aristotle.harmonic.fun/
You seem to be openly contradicting your company's PR and language. Your description very clearly presents the "AI" as a tool that translates relatively informal specifications into formal proof logic but does not itself do the proving.