Comment by kevinventullo

15 hours ago

> This year, our advanced Gemini model operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions

I think I have a minority opinion here, but I’m a bit disappointed they seem to be moving away from formal techniques. I think if you ever want to truly “automate” math or do it at machine scale, e.g. creating proofs that would amount to thousands of pages of writing, there is simply no way forward but to formalize. Otherwise, one cannot get past the bottleneck of needing a human reviewer to understand and validate the proof.

If a Language Model is capable of producing rigorous natural language proofs then getting it to produce Lean (or whatever) proofs would not be a big deal.

Lean use in AlphaProof was something of a crutch (not saying this is a bad thing): very specialized and very narrow, with little applicability to any other domain.

On the other hand, if you can achieve the same with general RL techniques and natural language, then other hard-to-verify domains (and there are a whole lot of them) are on the table.

  • > If a Language Model is capable of producing rigorous natural language proofs then getting it to produce Lean (or whatever) proofs would not be a big deal.

    This is a wildly uninformed take. Even today there are plenty of basic statements that LLMs can produce English-language proofs of but that have not been formalized.

    • Most mathematicians aren't much interested in translating statements for the fun of it, so the fact that a lot of basic statements remain unformalized doesn't mean much. And the point was never that formalization was easy.

      I said that if language models become capable enough not to need a crutch, then adding one afterwards isn't a big deal. What exactly do you think AlphaProof is? Much worse LLMs were already doing what you're describing. There's a reason it preceded this and not the other way around.


I'm a mathematician, although not doing research anymore. I can maybe offer a little bit of perspective on why we tend to be a little cooler on the formal techniques, which I think I've said on HN before.

I'm actually prepared to agree wholeheartedly with what you say here: I don't think there'd be any realistic way to produce thousand-page proofs without formalization, and certainly I wouldn't trust such a proof without some way to verify it formally. But I also don't think we really want them all that much!

The ultimate reason, I think, is that what really lights a fire under most mathematicians is the desire to know why a result is true; the explanation is really the product, much more so than just the yes-or-no answer. For example, I was never a number theorist, but I think most people who are informed enough to have an opinion think that the Riemann Hypothesis is probably true, and I know that they're not actually waiting around to find out. There are lots of papers that get published whose results take the form "If the Riemann Hypothesis is true, then [my new theorem]."

The reason they'd still be excited by a proof is the hope, informed by experience with proofs of earlier long-standing open problems, that the proof would involve some exciting new method or perspective that would give us a deeper understanding of number theory. A proof in a formal language that Lean says is true but which no human being has any hope of getting anything from doesn't accomplish that.

  • A proof written in a formal language can absolutely be illuminating to a human, but you have to pick the correct formal language and ecosystem.

    Writing proofs in Agda is like writing programs in a more expressive variant of Haskell. Abelson said that “programs must be written for people to read, and only incidentally for machines to execute”, and by the Curry-Howard isomorphism, proofs can be seen as programs. All the lessons of software engineering can and indeed should be applied to making proofs easier for humans to read.

    For a quick example, check out my mechanization of Martin-Löf’s 2006 paper on the axiom of choice:

    https://news.ycombinator.com/item?id=44269002
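
    Here is the flavor of it in miniature (a sketch in Lean 4 syntax here rather than Agda, but the Curry-Howard point is identical): the function that swaps the components of a pair is itself the proof that conjunction commutes.

        -- Proofs as programs: this "swap" function on pairs is
        -- itself the proof term showing that ∧ commutes.
        example (A B : Prop) : A ∧ B → B ∧ A :=
          fun ⟨a, b⟩ => ⟨b, a⟩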

    • I certainly didn't mean to dispute that! Formal proofs have a lot in common with code, and of course reading code is illuminating to humans all the time.

      I meant to be responding specifically to the case where some future theorem-proving LLM spits out a thousand-page argument which is totally impenetrable but which the proof-checker still agrees is valid. I think it's sometimes surprising to people coming at this from the CS side to hear that most mathematicians wouldn't be too enthusiastic to receive such a proof, and I was just trying to put some color on that reaction.


  • Thanks for the reply. I am also a no-longer-practicing mathematician :)

    I completely agree that a machine-generated formal proof is not the same thing as an illuminating human-generated plain-language proof (and in fact I suspect that without further guidance they will be quite different; see my other stream-of-consciousness comment). However, I do think machine-generated formal proofs would be interesting for a few reasons:

    1. Sometimes the obvious thing is not true!

    2. I think the existence or non-existence of a machine-generated proof of a mathematical claim is interesting in its own right. E.g. what kinds of claims are easy versus hard for machines to prove?

    3. In principle, I would hope they could at least give a starting point for a proper illuminating proof. E.g. the process of refinement and clarification, which is present today even for human proofs, could become more important, and could itself be machine-assisted.

    • Oh, I hope I didn't come off as talking down to you! As I said in another reply here, the intention behind this comment was pretty narrow: there's a certain perspective on this stuff that I see pretty often on HN that I think is missing some insight into what makes mathematicians tick, and I may have been letting my reaction to those other people leak into my response to you. Sorry for math-splaining :).

      Anyway, yeah, if this scenario does come to pass it will be interesting to see just how impenetrable the resulting formal proofs end up looking and how hard it is to turn them into something that humans can fit in their heads. I can imagine a continuum of possibilities here, ranging from thousands of pages of inscrutable symbol-pushing at one end to beautiful explanations at the other.

  • > The ultimate reason, I think, is that what really lights a fire under most mathematicians is the desire to know why a result is true; the explanation is really the product, much more so than just the yes-or-no answer

    Of course, but a formal system like Lean doesn't merely spit out a yes-or-no answer; it gives you a fully fledged proof. Admittedly, it may be harder to read than natural language, but that only means we could benefit from having another tool that translates Lean proofs into natural language.
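
    Even a tiny example shows what such a translation might look like. A sketch in core Lean 4, with a plain-language reading of each step alongside:

        -- "If n is even, then so is n + 2." (`omega` is the built-in
        -- linear-arithmetic tactic in recent Lean 4 toolchains.)
        example (n : Nat) (h : ∃ k, n = 2 * k) : ∃ k, n + 2 = 2 * k := by
          cases h with
          | intro k hk =>            -- "Let k witness that n is even."
            exact ⟨k + 1, by omega⟩  -- "Then k + 1 witnesses it for n + 2."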

  • Ideally we'd be able to get a little of both. A proof of such magnitude should likely come with new definitions that are easy to search for in the code and start to reason about independently. Even without looking at the rest of the proof, I imagine we'd be able to deduce a fair amount of the structure just by understanding the new things that are being defined, what existing theorems are being imported, and connecting the dots.

    Your comment reminds me of Tao's comment on the ABC conjecture: usually with a big proof, you progressively get new tools and examples of how they can be applied to other problems. But if it's hundreds of pages of formulas that just spit out an answer at the end, that's not how math usually works. https://galoisrepresentations.org/2017/12/17/the-abc-conject...

    If these provers do end up spitting out 1000-page proofs that are all calculation with no net-new concepts, I agree they'll be met with a shrug.

  • I have always wondered what could be recovered if the antecedent (in this case, the Riemann Hypothesis) actually turns out to be false. Are the theorems completely useless? Can we still infer some knowledge or reuse some techniques? The same applies to SETH and fine-grained complexity theory.

    • I don't know enough about the RH examples to say what the answer is in that case. I'd be very interested in a perspective from someone who knows more than me!

      In general, though, the answer to this question would depend on the specifics of the argument in question. Sometimes you might be able to salvage something: maybe there's some other setting where the same methods work, or where some hypothesis analogous to the false one ends up holding, or something like that. But of course from a purely logical perspective, if I prove that P implies Q and P turns out to be false, I've learned nothing about Q.
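
      (To spell out that last point in Lean 4 terms, purely as a sketch: an implication with a false antecedent is vacuously provable, so it puts no constraint on the conclusion at all.)

          -- If p is false, "p → q" is provable for *every* q,
          -- so such an implication tells us nothing about q itself.
          example (p q : Prop) (hp : ¬p) : p → q :=
            fun h => absurd h hp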

(Stream of consciousness aside:

That said, letting machines go wild in the depths of the consequences of some axiomatic system like ZFC may reveal methods of proof mathematicians would find monstrous. So like, if ZFC is inconsistent, then anything can be proven. But short of that, maybe the machines will find extremely powerful techniques which "almost" prove inconsistency yet nevertheless somehow lead to logical proofs of the desired claim. I'm thinking by analogy here about how speedrunning often devolves into exploiting an ACE (arbitrary code execution) glitch as early as possible, thus meeting the logical requirements of finishing the game while violating its spirit. Maybe we'd have to figure out what "glitchless ZFC" should mean. Maybe this is what logicians have already been doing, heh).

  • The horror scenario you describe would actually be more valuable than the insinuated "spirit", I believe:

    Suppose the machine faithfully reasons and attempts to find proofs of claims; in the best case, you have found a proof of a specific claim (IN AN INCONSISTENT SYSTEM).

    Suppose in the "horror scenario" that the machine has surreptitiously found a proof of False in ZFC (and can now prove any claim), and is not disclosing it, but is abusing it to present 'actual proofs in inconsistent ZFC' for whatever claims the user asks for. In this case we can just ask for a proof of A and a proof of !A; if it proves both, it has leaked the fact that it found and exploits an inconsistency in the formal system! That's worth more than a hard-to-find proof in an otherwise inconsistent system.
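
    A minimal Lean 4 rendering of that check, just to make the principle concrete:

        -- Ex falso quodlibet: a single proof of False yields a proof of
        -- any claim A *and* of its negation, which is exactly the leak.
        example (hFalse : False) (A : Prop) : A ∧ ¬A :=
          ⟨hFalse.elim, hFalse.elim⟩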

These problems are designed to be solvable by humans without tools. No reason we can't give tools to the models when they go after harder problems. I think it's good to at least reproduce human-level skill without tools first.

  • Oh so to be clear, I view formal methods as less of a useful tool, and more as enforcing a higher standard of proof. E.g. it’s not clear to me that having access to Lean would actually help a human in the IMO; certainly most professional mathematicians are not yet getting a positive ROI from formalization. But I’m sort of an armchair expert here; I could be wrong!

Accurate formalization is presumably easier than solving the problems, so you can always formalize and check after the solution is generated.

  • Typically formalization is actually harder than solving a problem. You almost always solve before formalizing. And it can be surprisingly hard to formalize problems that are easy to solve.

    For example: is there a polygon of area 100 into which you can fit 99 non-overlapping circles of area 1? Yes, obviously; it's very easy to prove this informally. Now try formalizing it! You will find it takes a while to formalize a number of fairly obvious geometric statements.
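
    Just to give the flavor, here is a sketch of the statement alone (Lean 4 with Mathlib; IsPolygon is a deliberate placeholder, since even "polygon" is a definition you'd have to supply, and I'm only gesturing at Mathlib's measure-theory names, so treat this as a sketch rather than checked code):

        import Mathlib

        open Metric MeasureTheory Real

        -- Just *stating* the packing claim forces precise choices about
        -- area, containment, and disjointness.
        example (IsPolygon : Set (EuclideanSpace ℝ (Fin 2)) → Prop) : Prop :=
          ∃ P, IsPolygon P ∧ volume P = 100 ∧          -- a polygon of area 100
            ∃ c : Fin 99 → EuclideanSpace ℝ (Fin 2),   -- 99 circle centers
              ∃ r : ℝ, π * r ^ 2 = 1 ∧                 -- each disk has area 1
                (∀ i, ball (c i) r ⊆ P) ∧              -- every disk fits in P
                Pairwise (fun i j =>                   -- and no two overlap
                  Disjoint (ball (c i) r) (ball (c j) r))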