I’ve been kicking around the idea of something like Lean (maybe even just Lean?) to rewrite news and other non-fiction articles, treating statements as theorems that need to be proven. Proofs could include citations, and could be compound things like “this is a fact if three of my approved sources asserted it as a fact”
It should then be possible to get a marked-up version of any document with highlighting for “proven” claims.
Sure, it’s not perfect, but I think it would be an interesting way to apply the rigor that used to be the job of publications.
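To make that concrete, here's a minimal Lean 4 sketch of the "asserted by enough approved sources" rule. Everything here (the relation names, the sources, the claim ID) is invented for illustration:

    -- Invented encoding: each source's assertion is a constructor, and a
    -- claim counts as "established" once the approved sources assert it.
    inductive Asserted : String → String → Prop
      | reuters : Asserted "Reuters" "claim-42"
      | ap      : Asserted "AP" "claim-42"

    def Established (claim : String) : Prop :=
      Asserted "Reuters" claim ∧ Asserted "AP" claim

    -- The "proof" of the claim is just the pair of citations:
    example : Established "claim-42" :=
      ⟨Asserted.reuters, Asserted.ap⟩

The marked-up document would then highlight exactly the claims for which such a term exists.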
Formalising natural language statements is a minefield of difficulties, for (imo) essentially the same reasons that writing code which interacts with the real world is so difficult. Concepts you take for granted like identity, time, causality... all of that stuff needs to be fleshed out carefully and precisely in the formalism for facts to be relatable to each other (or even expressible in the first place).
Not to discourage you - it's a cool problem! OpenCog comes to mind as a project that tried to take this all the way, and there's a field devoted to this stuff in academia called KRR (knowledge representation and reasoning). The IJCAI journal is full of research on similar topics.
(also see all the different logics philosophers use to formalise different kinds of argument involving time/modality/probability - there are a lot, but they aren't very "modular" and can't easily be mixed unless someone has solved that since I last looked into it)
> Concepts you take for granted like identity, time, causality... all of that stuff needs to be fleshed out carefully and precisely in the formalism for facts to be relatable to each other (or even expressible in the first place).
Yes, and different formalisations of identity apply in different contexts.
E.g. remember the famous line about not being able to step in the same river twice.
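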
> logics philosophers use .. aren't very "modular" and can't easily be mixed
Not sure if the model-checking communities would agree with you there. For example CTL-star [0] mixes tree-logic and linear-temporal, then PCTL adds probability on top. Knowledge, belief, and strategy-logics are also mixed pretty freely in at least some model checkers. Using mixed combinations of different-flavored logic does seem to be going OK in practice, but I guess this works best when those diverse logics can all be reduced towards the same primitive data structures that you want to actually crunch (like binary decision diagrams, or whatever).
If no primitive/fast/generic structure can really be shared between logics, then you may be stuck with some irreconcilable continuous-vs-discrete or deterministic-vs-probabilistic disconnect, and then require multiple model-checkers for different pieces of one problem. So even if mixing different flavors of logics is already routine.. there's lots of improvements to hope for if practically everything can be directly represented in one place like lean. Just like mathematicians don't worry much about switching back and forth from geometry/algebra, less friction between representations would be great.
Speaking of CTL, shout out to Emerson [1], who won a Turing award. If he hadn't died recently, I think he'd be surprised to hear anyone suggest he was a philosopher instead of a computer scientist ;)
[0]: https://en.wikipedia.org/wiki/CTL* [1]: https://en.wikipedia.org/wiki/E._Allen_Emerson
> (also see all the different logics philosophers use to formalise different kinds of argument involving time/modality/probability - there are a lot, but they aren't very "modular" and can't easily be mixed unless someone has solved that since I last looked into it)
From a logic-as-types perspective, modalities turn out to be monads. So the problem of "mixing" modalities is quite similar to the problem of composing monads in programming languages.
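For instance, here's a toy sketch (not a serious Kripke semantics) reading the possibility modality ◇ as a nondeterminism monad:

    -- Toy model: ◇A ("possibly A") as a list of candidate outcomes.
    abbrev Possibly (α : Type) := List α

    def pureP {α : Type} (a : α) : Possibly α := [a]   -- A → ◇A

    def joinP {α : Type} : Possibly (Possibly α) → Possibly α
      | []      => []
      | m :: ms => m ++ joinP ms                       -- ◇◇A → ◇A

Composing two such monads (say, possibility and probability) needs a distributive law between them, which in general doesn't exist; hence the poor modularity.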
I think it's quite rare that the most important beliefs you should derive from the news can be justified by a collection of absolute statements.
I think you'd be better served by a tool for calculating chains of Bayesian reasoning. I've seen a tool that does this for numerical estimations.
Correct. The only way to not subject oneself to (economic) irrationality is to model your beliefs about the world using probabilities and using Bayes to update in light of new evidence.
Careful, such an approach could easily end up giving an aura of logical objectivity to arbitrarily radical and nonsensical ideas. The political views of one of the fathers of modern logic may serve as a cautionary tale [0].
[0] https://en.m.wikipedia.org/wiki/Gottlob_Frege#Political_view...
> I’ve been kicking around the idea of something like Lean (maybe even just Lean?) to rewrite news and other non-fiction articles, treating statements as theorems that need to be proven.
I found I got much better at writing non-fiction after having math at uni. I would help proof-read essays and other hand-ins by my SO and sister, and apply similar rigor as you mention. Stuff like "you show C follows from B here, but you haven't actually given an argument for why B follows A, so you can't then claim C follows from A".
It's tempting to say that with LLMs this seems like a plausible task to turn it into a program, but the hallucination issue puts a damper on that scheme.
There are rhetorical tricks which rely on this to be persuasive. You can say "Thing X is happening, so we should do Thing Y", and people will nod.
If you're sneaky about it it will read like a logical conclusion when in fact X and Y are only loosely related contextually, and there is no logical chain at all.
A standard political trick is to blame X on something emotive and irrelevant, and offer Y as a false solution which distracts from the real causes of the problem.
This is used so often it's become a core driver of policy across multiple domains.
Although it's very effective, it's a crude way to use this. There are more subtle ways - like using X to insinuate criticism of a target when Y is already self-evident.
Point being, a lot of persuasive non-fiction, especially in politics, law, religion, and marketing, uses tricks like these. And many others.
They work because they work in the domain of narrative logic - persuading through stories and parables with embedded emotional triggers and credible-sounding but fake explanations, where the bar of "That sounds plausible" is very low.
LLMs already know some of this. You can ask ChatGPT to make any text more persuasive, and it will give you some ideas. You can also ask it to read a text, pull out the rhetorical tricks, and find the logical flaws.
It won't do as good a job as someone who uses rhetoric for a living. But it will do a far better job than the average reader, who is completely unaware of rhetoric.
A mild damper at best, RAG-based pipelines are mature now.
Alas, things like this aren't logic proofs.
It bothers me to my core when I see this idea. Sometimes at FAANG. Blissfully, management learned to...not promote...them.
I think for that kind of thing, Prolog would be sufficient. Lean is more powerful, but it requires you to supply proofs - Prolog can just automatically run inferences.
(Although I wouldn't be surprised if somebody had already recreated something like Prolog in Lean with a tactic that does more or less what the Prolog interpreter does)
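In miniature, an inductive Prop already behaves like a set of Horn clauses, with constructor application playing the role of Prolog's backward chaining (all names below invented):

    inductive Ancestor : String → String → Prop
      | ab : Ancestor "alice" "bob"
      | bc : Ancestor "bob" "carol"
      | trans {a b c : String} : Ancestor a b → Ancestor b c → Ancestor a c

    -- What Prolog would find by search, written out by hand:
    example : Ancestor "alice" "carol" := .trans .ab .bc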
I think it'd be more interesting to map out entire trees of arguments about a topic. Start with something big, like "is there a God," then come up with all of the arguments for or against, then come up with all of the arguments against those arguments, then the counters to those arguments. Really explore it as a sort of debate space. Then bring in citations not as backing but for historical context: "Plato made this argument in such and such." The idea wouldn't be so much to decide a winner as to prevent us from going around in circles by making a map.
> Start with something big, like "is there a God,"
To do that you first need a definition of the concept God.
And then you realize that all the people making arguments in the past were using their own unspoken and incompatible definitions.
Kialo does this for online debates: https://www.kialo.com/
An example tree for the statement "God exists": https://www.kialo.com/god-exists-3491?path=3491.0~3491.1
As a programmer who has studied philosophy: This is the approach I take more or less in my head when reading articles, however the problem is the ambiguity of natural language.
E.g. let's say the statement is about a president's promise to end a conflict within 24 hours of coming into office. One could reach a conclusion pretty quickly if the conflict hasn't ended 24 hours after they entered office.
But what does "end the conflict" mean exactly? If the conflict ended, how long does it need to remain ended to earn the label "ended"? What if said president has a history that recontextualizes the meaning of that seemingly simple claim, because he is known to define the word "ended" a little differently than the rest of us; do you now judge by his definition or by ours? What if the conflict is ended but a small nest of conflict remains: at what size do we consider the conflict ongoing?
I know some of that has official definitions, but not everything does. In the end a lot of this will require interpretation and a choice of which definition to use. But yeah, I support your idea; just spelling it out and having a machine-readable chain of thought might help already.
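To spell out how even a charitable formalisation forces those choices into the open, here's a toy Lean rendering (every name invented):

    -- `active t` means the conflict is ongoing on day t. "Ended" is only
    -- meaningful relative to an observation day and a required quiet
    -- period, which are exactly the parameters the claim leaves vague.
    def EndedFor (active : Nat → Prop) (today quietDays : Nat) : Prop :=
      ∀ t, today - quietDays ≤ t → t ≤ today → ¬ active t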
Bah, the answer is easy: words should be used with their most common definition, everything else is a lie.
If you want to use a word 'differently' then you need to do so explicitly.
Proof != evidence. With evidence, we corroborate, collate, add more sources, weigh, and judge. Proof is a totally different process. Only in mathematics does one prove something; everywhere else we build up evidence, corroborate, etc.
Check out https://argdown.org/ for the much simpler version of that idea. No proofs, just supporting/conflicting evidence and arguments. But that's hard enough to consistently produce as it is.
News and non-fiction articles are not math and cannot be treated as math. At best you could build a tool that checks for the most glaring contradictions (like a typo changing a number), and I'm not even sure it could be consistent without building software that understands language. At worst you would build a tool that spits out bullshit that millions of people would treat as gospel.
Claimify from MS Research aims in this direction. There's a paper and video explainer from a few months ago.
https://www.microsoft.com/en-us/research/blog/claimify-extra...
For the form, you might be interested in Ethica, by Spinoza [1]. On the other hand, for fact checking, the key concept seems to be trust in sources rather than logical consistency.
[1]: https://en.wikipedia.org/wiki/Spinoza%27s_Ethics
This might be interesting to you: https://en.m.wikipedia.org/wiki/Stephen_Toulmin#The_Toulmin_...
I think you could just use lean, and progressively elaborate the inference graph.
But you might find a historical/journalistic/investigative angle more profitable.
I heard a software product pitch from a small law practice who were outside investigators for abuse/policy-violation claims in a unionized workforce. They had a solid procedure: start with the main event. Interview all witnesses, and identify other interesting events. Then cross-reference all witnesses against all alleged/interesting events. That gives you a full matrix to identify who is a reliable witness, what sort of motivations are in play, and what stories are being told, patterns of behaviour, etc. Their final report might actually focus on another event entirely that had better evidence around it.
In that sort of historical investigation, a single obviously false claim undermines all claims from that witness. Pure logic doesn't have that bi-directionality of inference. Sometimes your sources just flat out lie! That's half the work.
The key difference: an investigation wants the full matrix of claims vs sources/evidence, while good math is all about carefully plotting a sparse inference graph that traverses the space. What is called "penetrating minimalism" in math is called "cherry picking" in history or journalism.
We passed on the pitch. Didn't see how it scaled to a product worth building, but their process was appealing. Much better than an internal popularity-contest or witch-hunt. Probably better than the average police investigation too.
Look at Arguman, now defunct, but you can probably find some good videos/screenshots.
Basically crowd-sourced statements, rebuttals, agreements, etc, in a nice interface.
We need that nowadays.
There's Kialo: https://www.kialo.com/
I think a more practical method would be to trace the provenance of a claim, in a way that lets you evaluate how likely it is for the person making it to actually know the facts.
A simple scenario would be reading a news article about American politics. You see an unusual claim made, so you start tracing it and find that the journalist got it from person X, who got it from person Y, who got it from Donald Trump. Trump is famous for lying constantly, so unless there's a second source making the claim, you could disregard it with high probability.
What about proven facts that get disproven? Is there room to rethink your priors?
I don't know why you're getting downvoted, because I think this is an interesting idea.
Completely impossible but still interesting and fun to explore.
But you don't need the power of Lean to do something like this. I would recommend starting with something like Prolog or RDF triples.
Basically [Citation Needed]
Wait… so are we basically compiling a dictionary of proofs all stemming from a handful of self-evident truths? And every further proof is just some logical aggregation of previous proofs?
Can someone please turn this into a Zachtronics style game?! I badly, direly want this. There’s a game called Euclidea that’s kind of like this for trigonometry and the whole premise of building a tower of logic is deeply attractive to me.
Is this what pure math is about? Is this what people who become math profs are feeling in their soul? The sheer thrill of adding to that dictionary of proofs?
Aside: I recall some famous mathematician had made a list of base proofs that you just hold to be true. Can someone remind me who, and/or what that list is called? I’m guessing they’re considered axioms.
There is a game already, though it may not be exactly what you want (and the goal is definitely not "generate all known math"). I've played around with it, and I do think it's pretty fun.
The article mentions it, in fact: https://adam.math.hhu.de/#/g/leanprover-community/nng4
How did I miss this! Thanks for pointing it out. This is scratching the itch.
Had a blast with that game, really scratched that puzzle itch, and got to relearn some long-forgotten knowledge.
I played through a lot of it, and while it was fun, I wouldn't call it gamified in the sense that Zachtronics does it. As an FYI, here are some sticking points I ran into as well:
1) It uses the term "goal" for the current stage of the proof you're working on, which is counterintuitive. In common speech, the goal is the theorem you're trying to prove, not the current step you're on in proving it. It would be like if I were helping you get to work in the morning, and I said, "Okay, your current [sub]goal is that you're done brushing your teeth." No. That's not a goal. That's progress toward a goal.
2) It doesn't warn you early on about the left-associativeness of the operator binding, or what to do about it. So you can apply what seem like valid transformations but then parentheses pop up that you don't expect, and which you have no guidance on how to deal with. I had to ask ChatGPT what was going on to solve it.
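For anyone else who hits this: `+` associates to the left, so the parentheses were there all along, e.g.:

    -- `a + b + c` is parsed as `(a + b) + c`, which is why rewriting with
    -- associativity makes "unexpected" parentheses appear.
    example (a b c : Nat) : a + b + c = (a + b) + c := rfl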
> Can someone please turn this into a Zachtronics style game?!
That game is called math :) Partially joking, but I do think a game version would be fun.
> Is this what pure math is about?
More or less yes for an undergrad, but when you get to research it feels different.
> I badly, direly want this
Consider checking out an abstract algebra book. Maybe Dummit and Foote or something else popular. Proofs in algebra often have a satisfying game-like quality. The more popular books will have solutions you can find online if you get stuck.
> More or less yes for an undergrad, but when you get to research it feels different.
Would you mind telling more about how it feels different in research?
> Proofs in algebra often have a satisfying game-like quality.
Interesting. I find them banal and deeply unsatisfying.
You might be thinking of Euclid's axioms, which define points, lines, planes, etc., and assert that parallel lines exist. It's an interesting system because the parallel postulate is violated if your space is not flat, like if you're on a sphere.
You could also be thinking of Zermelo-Fraenkel set theory (ZF/ZFC), which most of modern mathematics is ultimately based upon.
This highlights what I think is an ultimate pitfall of something like Lean. It is something like this: when an assumption is broken or violated in mathematics, the whole field of mathematics can _grow_. Non-Euclidean geometry, imaginary numbers, etc. are all examples of this. Trying to cram maths into a programming language sounds like it would constrain it, removing the creativity and reducing all of maths to a search space.
Check out the game Bombe [1]. It's a minesweeper variant where instead of directly flagging or uncovering cells, you define rules for when cells can be flagged. As it gets more advanced you end up building lemmas that implicitly chain off each other. Then as _you_ get more advanced (and the game removes some arbitrary restrictions around your toolset) you can generalize your rules and golf down what you've already constructed.
In my mind this is literally what math is. We start with axioms, and derive conclusions. There's probably more to it than that, but that's the understanding I'm at now.
> so are we basically compiling a dictionary of proofs all stemming from a handful of self-evident truths
I would say, "from a handful of axioms".
It's certainly true that when Euclid started this whole enterprise, it was thought that axioms should be self-evident. But then, many centuries later, people discovered that there are other interesting geometries that don't satisfy the same axioms.
And when you get to reasoning about infinities, it's very unclear that anything about them can be considered self-evident (a few mathematicians even refuse to work with infinities, although it's definitely a very niche subcommunity).
Some of today's common axioms are indeed self-evident (such as "you should be able to substitute equal subterms"), but things like the axiom of choice have (at least historically) been much more controversial. I would probably say that such axioms can be considered "plausible" and that they generally allow us to be able to prove what we want to prove. But you'll definitely find mathematicians championing different axioms.
> Aside: I recall some famous mathematician had made a list of base proofs that you just hold to be true. Can someone remind me who, and/or what that list is called? I’m guessing they’re considered axioms.
That would be the ZFC axioms. It was originally the ZF axioms (named so after the mathematicians Zermelo and Fraenkel who worked in the early 20th century), and then later the Axiom of Choice (C) was added. It's generally considered to be the "standard" set of axioms for maths, although very few mathematicians actually work directly from these axioms. But in theory, you can take almost every mathematical proof (unless it's explicitly set in some other foundation) and recast it entirely in applications of the ZFC axioms.
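For flavour, the first axiom on the list, Extensionality, just says that sets with the same members are equal:

    ∀x ∀y (∀z (z ∈ x ↔ z ∈ y) → x = y)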
Thanks for this. So I have half a thought I’m trying to flesh out from what you’ve shared. Bear with me, whoever reads this:
Are there essentially two flavours:
- the maths based on axioms that are fundamentally, cosmically true such as x = x. And in doing so we’re formalizing on paper the universal truths and all subsequent rules we know to be true given these
- the maths that incorporate those plus additional axioms that aren’t necessarily fundamentally true (or maybe just not provable), but work very well at laying a foundation for further rules that build a practically useful, but not necessarily “correct” toolbox
With the latter, is it kind of a “they won’t hold up under normal conditions but if you accept these axioms, there’s interesting things you can explore and achieve?”
> Aside: I recall some famous mathematician had made a list of base proofs that you just hold to be true. Can someone remind me who, and/or what that list is called? I’m guessing they’re considered axioms.
Probably similar to how numerous maths PhDs I’ve worked with are terrible at shipping durable production code on time and on budget, or how many doctors are terrible at inserting IVs, or how an architect may likely be terrible at framing a house, or how some Grammy winning musicians can't read classical notation: your mental model for domains of expertise is probably wrong.
One problem I have with learning Lean is that tactics - like rfl in the example - are overloaded, and their full semantics are not completely explained in (or understandable from) the tutorials. Unlike, say, C programming, where one may understand what happens to the program state down to the bit, it feels too fuzzy. And the rewrite (rw) tactic syntax doesn't feel natural either.
Yeah, I've similarly found the tactics in Coq (now Rocq) difficult to internalize. E.g., I might have "A = B" and "P(A,A)" available, and I want to conclude "P(A,B)", but the rewrite will fail for some arcane reason. (Issues with the definition of some of the intermediate structures, I'd imagine.)
On the other end of the spectrum, I've recently been playing with Metamath and its set.mm database, which has no programmable tactics at all, only concrete inferences that can be used in a proof. (E.g., the modus ponens inference ax-mp says that "|- ph" and "|- ( ph -> ps )" prove "|- ps", where "ph" and "ps" are variables that can be substituted.) Alas, it's not much better, since now you have to memorize all the utility lemmas you might need!
Agreed - the Metamath base language (and its verifier) seem to be the most tractable of all I've seen, although it is probably still quite far away from the complexity of the high level language(s) that compile to it.
Derived from it, the currently best attempt to achieve an unambiguous and secure language seems to be Metamath Zero: https://github.com/digama0/mm0
This is one of the reasons I prefer Agda. It is usually written without tactics, you just write the proof term in a functional programming language via the Curry–Howard correspondence. The trade off is that you must be more disciplined with creating useful abstractions and functions, otherwise proving even routine stuff becomes really tedious.
I'm not sure you prefer Agda so much as you prefer providing proof terms functionally rather than imperatively. Then again I've never used Agda; how does it differ from Coq/Rocq minus Ltac or Lean minus tactics?
At least you can 'go to definition' on the tactics and see what they're doing. It's a lot to take in at the beginning, but it can all be inspected and understood. (At least until you get to the fundamental type theory; the reduction rules are a lot harder to get into.)
> the rewrite (rw) tactic syntax doesn't feel natural either.
Do you have any thoughts on what a natural rewrite syntax would be?
> Do you have any thoughts on what a natural rewrite syntax would be?
Not yet, but I'd probably prefer something that more explicitly indicated (in English, or some sort of more visually "pointing" indicator) which specific parts of the previous step would be replaced
It feels weird, or I'd have to get used to the fact that both what is being replaced and what it is replaced with depend on some distant context. It's very indirect, as it requires switching attention between the tactics tree and the context or previous proofs.
You can absolutely use pattern matching in Lean instead of tactics, if you prefer to write the proof that is closer to "what is going on under the hood"
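For example, a tactic-free proof by pattern matching looks like this:

    -- A term-mode proof: the proof term is written directly, no tactics.
    theorem and_swap (p q : Prop) : p ∧ q → q ∧ p :=
      fun ⟨hp, hq⟩ => ⟨hq, hp⟩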
It was interesting to me as it didn't fit my expectations. As a math theory ignoramus, I expected that reflexivity and rewriting are more fundamental than addition. But Lean seems to assume addition while requiring explicit rfl and rewrite.
Perhaps there's a Lean "prelude" that does it for you.
Yes, there is a prelude that defines natural numbers and an addition function on them. As the post notes, the reflexivity tactic "unfolds" the addition, meaning that it applies the addition function to the constants it is given, to arrive at a constant. This is not specific to addition, it unfolds other function definitions too. So addition doesn't have to come first, you are right that reflexivity is more fundamental.
It's because the addition can evaluate to only one form: `3 + 3` and `6` both evaluate to `succ (succ (succ (succ (succ (succ zero)))))`, and similarly Lean can infer `4 * 3 = 1 + 3 + 8` because both sides evaluate to the same as `12`. But an expression can be rewritten to an infinite number of forms, e.g. `x * 2` can be rewritten to `x + x`, `(x + x/2) * (4/3)`, etc. So `rfl` can automatically evaluate both sides but not rewrite them.
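Concretely (the last line uses the `omega` decision procedure just to close the goal, since `rfl` can't):

    example : 3 + 3 = 6 := rfl           -- both sides evaluate to 6
    example : 4 * 3 = 1 + 3 + 8 := rfl   -- both sides evaluate to 12

    -- Not a matter of evaluation, so `rfl` fails; a rewrite
    -- (or a decision procedure) is needed:
    example (x : Nat) : x * 2 = x + x := by omega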
The surprising thing to me was that tactics are all written in "user-level" code, outside the proof kernel. This makes sense in the sense that you want a small, thoroughly tested kernel that doesn't change. But it also implies that if you use tactics in your proof, then your proof can go from correct to failing if one of the tactics you use gets modified between releases. Is that a problem in real world use?
Yes. I've seen proofs fail after a mathlib update because the behaviour of simp changed. I've never seen it do less after an update (so far), but sometimes it'll simplify more and then the next steps may fail to work.
I've since adopted the suggestion (that is afaik used in mathlib) to never use bare simp unless it closes the current goal directly. Instead, if you write simp?, Lean will run the simplifier and tell you exactly which theorems it used (in the form of simp only [...]) which you can then insert into the proof instead.
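For example (exact lemma names may vary between versions):

    example (xs : List Nat) : (xs ++ []).length = xs.length := by
      simp   -- works, but which lemmas it used is invisible and may drift

    -- `simp?` reports the pinned form to paste in instead:
    example (xs : List Nat) : (xs ++ []).length = xs.length := by
      simp only [List.append_nil]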
Yeah the documentation is also quite fragmented because tactics are user-definable and some are from Lean vs from Mathlib etc. I've gotten quite decent at using the basic ones but I still sometimes ask on Zulip if something isn't working as I expected.
I understand what you mean, but I shudder to think at how complicated it would be to write certain proofs without tactics, e.g. simp, ring, omega, linarith and friends all really do a lot of work.
Nitpick, but it's a bit strange to say that the two_eq_two theorem looks like a function. It looks more like a constant, since it has no arguments. (Yes I know that constants are nullary functions.)
I would find the following a more convincing presentation:
theorem x_eq_x (x : Nat) : x = x := by
  rfl

theorem two_eq_two' : 2 = 2 := by
  exact (x_eq_x 2)
Here x_eq_x looks like a function, and in the proof of two_eq_two' we apply it like a function. (Note: `2_eq_2` isn't a valid Lean name, since identifiers can't start with a digit.)
Fair! I decided not to do that because the way arguments work (and dependent types in general — like being able to return a proof of `x = x` given x) is unusual for people like me, and deserves an article of its own. So I’m kicking that to one of the next articles.
One thing I didn't know until recently was that Lean doesn't solve the problem of verifying that a formal theorem actually states what we think it states rather than something else.
In some cases that's not an issue, because the formal statement of the theorem is very simple. E.g. for Fermat's Last Theorem, which can be formally expressed with "(x y z : ℕ+) (n : ℕ) (hn : n > 2) : x^n + y^n ≠ z^n". But in other cases it might be much harder to verify that a formal statement actually matches the intuitive informal statement we want.
Of course, Lean or similar languages still offer the huge potential of verifying that a formal statement is provable, even if they don't help with verifying that the formal statement matches the intended informal statement in natural language.
Are there, in general, ways to do that? Sorry if this sounds like a troll question, but I struggle to think of how one would verify that a formal statement matches the accepted understanding other than painstakingly going through the definitions with experts from both sides.
Yeah this seems like the specification/implementation problem. One can perfectly implement a bad spec, but coming up with the perfect spec is a human problem.
I don't think there is any technical solution for that, apart perhaps from asking an LLM, or multiple different LLMs, whether it thinks the formal definition matches the informal statement.
When mathematicians check normal (informal) proofs they usually verify both things at once: that the proof doesn't contain mistakes or insufficiencies, and that it proves what the author says it proves. Formal proof checkers aren't able to do the latter.
Joking of course, but only because Lojban doesn’t have formally specified semantics.
Computational semantics could, in principle, allow you to enumerate the possible logical meanings of a given informal English phrasing of the statement, and these can be either each proven or thrown out with explicit justification for why that was not what you meant.
Nope! Not a troll question. At the end of the day, we're human beings interacting with a bunch of electrified rocks. There will always be a layer of subjectivity and expertise involved in translating informal (or perhaps even unknown) desires into formal specs. It gets even worse when you are designing software that needs to take into account the underlying behavior of hardware; you wouldn't program an autopilot for a jumbo jet whose certification only exists in the context of some simplified formal model of how computers work, would you?
But I'm catastrophizing. If you think going through the definitions of a spec is painstaking and requires expert insight to get right, it is minuscule compared to e.g. auditing every line of a codebase written in an imperative language. And not only that: to even get close to the same correctness guarantees as e.g. a formally verified pipeline through CompCert, you would have to audit every line of the codebase of every compiler that touches your code (https://r6.ca/blog/20200929T023701Z.html "It Is Never a Compiler Bug Until It Is"), together with analysis of the underlying hardware models.
All this to say, technology is more complicated than traditional programmers exposed only to high-level languages might be led to believe. That complication can only be reduced, consolidated, or blackboxed so much.
>Lean doesn't solve the problem of verifying that a formal theorem actually states what we think it states rather than something else.
In all honesty, if a tool existed that did that, we would have solved maths. Or at least we would have solved human communication. In both cases it would be the most studied thing in the world, so I don't know where this belief would come from.
Is there a way to read Lean proofs noninteractively?
After playing with the Natural Number Game a bit, proofs quickly ended up being opaque sequences of "rw [x]" commands, which felt unreadable. It's nice that the editor allows interactively viewing the state at different points, but having to click on each line ruins the flow of reading. Imagine if you had to read Python code which has no indentation, braces or anything similar, and the only way to know where an if statement ends or an else block starts is by clicking on each line.
My impression might be influenced by the limited vocabulary that the early levels of the Natural Number Game provide. Does the richer toolset provided by full Lean make it easier to make proofs readable without requiring you to click on each line to get the necessary context?
Rocq used to have a "mathematical proof language". It's hard to find examples, but this shows the flavor (https://stackoverflow.com/a/40739190):
Lemma foo:
forall b: bool, b = true -> (if b then 0 else 1) = 0.
proof.
let b : bool.
per cases on b.
suppose it is true. thus thesis.
suppose it is false. thus thesis.
end cases.
end proof.
Qed.
So the idea was to make proofs read more like "manual" proofs as you would find them in math papers. Apparently nobody used this, so it was removed.
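Isabelle's structured proof language Isar is the same idea, and there it did catch on: it's still the standard style. A typical Isar proof: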
lemma "map f xs = map f ys ==> length xs = length ys"
proof (induct ys arbitrary: xs)
case Nil thus ?case by simp
next
case (Cons y ys) note Asm = Cons
show ?case
proof (cases xs)
case Nil
hence False using Asm(2) by simp
thus ?thesis ..
next
case (Cons x xs')
with Asm(2) have "map f xs' = map f ys" by simp
from Asm(1)[OF this] `xs = x # xs'` show ?thesis by simp
qed
qed
You spell out the structure and intermediate results you want, and the "by ..." blocks allow you to specify concrete tactics for parts where the details don't matter and the tactic does what you want. In this proof, "simp" (a kind of simplify/reflexivity tactic) is enough for all the intermediate steps.
I don't know if there is anything like this for Lean, but maybe this provides some keywords for a web search. Or inspiration for a post in some Lean forum.
You can structure your proofs in Lean to reason forward, but proofs in Mathlib still primarily do backward proof, because it is more compact and performant.
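For a small illustration of the two styles (assuming nothing beyond core Lean):

    -- Backward: keep transforming the goal until it closes.
    example (a b : Nat) (h : a = b) : a + 1 = b + 1 := by
      rw [h]

    -- Forward: derive new facts from the hypotheses, then finish.
    example (a b : Nat) (h : a = b) : a + 1 = b + 1 := by
      have h1 := congrArg (· + 1) h
      exact h1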
This is a great question! I’m still not very experienced so take it with a grain of salt. But here’s my take on it.
I’ve been spending time with Lean quite a bit for the past few months. When I look at a proof, I don’t “read” it the same way as I do with programming. It feels a bit more like “scanning”. What stands out is the overall structure of the argument, what tactics are used, and what lemmas are used.
In real Lean code, the accepted style is to indent any new goals, work on one goal at a time (except a few special parallel tactics), and outdent when the goal is done. That’s what I mean by the “shape” of the argument. See some examples in this PR I’m working on: https://github.com/gaearon/analysis-solutions/pull/7/files
Once you’re familiar with what tactics do, you can infer a lot more. For example `intro` steps into quantifiers and “eats” assumptions, if I see `constructor` I know it’s breaking apart the goal into multiple, and so on.
Keep in mind that in reality all tactics do is aid you in producing a tree of terms. As in, actually the proofs are all tree-shaped. It’s even possible to write them that way directly. Tactics are more like a set of macros and DSL for writing that tree very concisely. So when I look at some tactics (constructor, use, refine, intro), I really see tree manipulation (“we’re splitting pieces, we’re filling in this part before that part, etc”).
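A tiny example of what I mean:

    -- A tactic proof and the tree it elaborates to:
    theorem both (p q : Prop) (hp : p) (hq : q) : p ∧ q := by
      constructor   -- split the ∧ goal into two subgoals
      · exact hp
      · exact hq

    -- The same term, written directly:
    theorem both' (p q : Prop) (hp : p) (hq : q) : p ∧ q := ⟨hp, hq⟩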
However, it’s still different from reading code in the sense that I’d need to click into the middle to know for sure what assertion a line is dealing with. How much this is a problem I’m not sure.
In a well-written proof with a good idea, you can often retrace the shape of the argument by reading, because the flow of thought is similar to a paper proof. So someone who wants to communicate what they're doing is generally able to, by choosing reasonable names, a clear flow of the argument, appropriate tactics and — importantly! — extracting smaller lemmas for non-obvious results. Or even inline expressions that state hypotheses clearly before supplying a few lines of proof.
On the other hand, there are cases where the proof is obvious to a human but the machine struggles and you have to write some boilerplate to grind through it. Those are sometimes solved by more powerful tactics, but sometimes you just have to write the whole thing. And in that case I don't necessarily aim for it being "understandable" but for it being short. Lean users call that golfing. Golfed code has a specific flavor to it. Again, in my experience golfing is used when a part of the proof would be obvious on paper (so a mathematician wouldn't want to see 30 lines of code dedicated to that part), or when Lean itself is uniquely able to cut through some tedious argument.
So to summarize, I feel like a lot of it is implicit, there are techniques to make it more explicit when the author wants, it doesn’t matter as much as I expected it would, and generally as your mental model of tactics improves, you’ll be able to read Lean code more fluidly without clicking. Also, often all you need to understand the argument is to overview the names of the lemmas it depends on. Whereas the specific order doesn’t matter because there’s a dozen way to restructure it without changing the substance.
I love the article. Few can cross the bridge and describe these sorts of things in an easy to digest way. The secret is showing all the tiny steps experts might not see as it is too obvious. Thank you!
I wonder: suppose you are not relying on any tricks or funky shenanigans, is it possible to just throw random stuff at Lean and find interesting observations based on whether it approves? Like use an automated system or an LLM that tries all types of wild proofs/theories and sees if they work? Maybe I'm asking the wrong questions or not stating it well... this is above my understanding, I could barely get my head around Prolog.
As someone who does certified programming professionally, I believe generative AI and formal methods are a match made in heaven. I might even go so far as to wager that the notion that human programmers will or will not be replaced by LLM-based AI is entirely dependent on whether these AI can get good at certified programming + compositional reasoning.
> is it possible to just throw random stuff at Lean and find interesting observations based if it approves?
Traditional AI has an easy time with checkers, because the search space is small. Chess is a little harder. Go still cannot be reasonably tackled by non-machine learning AI. Meanwhile, the search space (number of available moves together with the diversity of explorable states) for a sophisticated formal language becomes unfathomably large.
When the problem is known to be of a certain nature, often you can time-efficiently brute force it via SMT solvers. Traditionally SMT solvers and proof assistants have been separate branches of formal methods, but they're finally learning to play off each other's strengths (cf. Sledgehammer, Lean-SMT).
> like use an automated system or an llm that tries all types of wild proofs/theories? and sees if it works?
Research in this vein needs to be made more mainstream. I'm excited, though, that there have been big funders behind these ideas for years now, even before LLMs became big. Cf. "Learning to Find Proofs and Theorems by Learning to Refine Search Strategies" for earlier work, or DeepSeek-Prover for more recent attempts. I'm no machine learning expert, but it seems it's still a very open question how to best train these things and what their future potential is.
All in all mainstream LLMs are still rather mediocre at languages like Rocq and Lean. And when they're wrong, their proof scripts are extremely tedious to try to troubleshoot and correct. But I have hope AI tooling in formal methods will mature greatly over time.
Does Lean have some sort of verification mode for untrusted proofs that guarantees that a given proof certainly does not use any "sorry" (however indirectly), and does not add to the "proving power" of some separately given fixed set of axioms with further axioms or definitions?
Does `#print axioms some_theorem` mentioned at the end of the article qualify? This would show if it depends on `sorry`, even transitively, or on some axioms you haven't vetted.
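For a quick illustration of what that catches:

    theorem cheats : 1 = 2 := by sorry   -- elaborates, with a warning

    #print axioms cheats
    -- output: 'cheats' depends on axioms: [sorryAx]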
Yes, you can `print axioms` to make sure no axioms were added, make sure it compiles with no warnings or errors. There’s also a SafeVerify utility that checks more thoroughly and catches some tricks that RL systems have found
That's Lean 3, from eight years ago, and it's from before 'sorry' really existed in the way we know it now.
---
To answer the GP's question: Not only is there a verification mode, but Lean generates object files with the fully elaborated definitions and theorems. These can be rechecked by the kernel, or by external verifiers. There's no need to trust the Lean system itself, except to make sure that the theorem statements actually correspond to what we think they're supposed to be.
Perhaps this thread is a good place to ask: could anyone contribute their own opinion about the relative future of Lean vs. Idris/Coq/Agda? I want to dedicate some time to this from a knowledge representation point of view, but I'm unsure which of them carries less risk of ending up as yet another esoteric language... I sank a lot of time into Clojure core.logic for a related project and got burnt by the low-interest/small-community issue already, so I've been hesitant to start with any of them for some time.
IME Lean and Coq/Rocq are used more in practice, and have bigger libraries and communities, than Idris and Agda.
Rocq is the most common for program verification, but I suspect mainly due to it being older, and it has weird quirks due to its age, so Lean may catch up. Lean is the most common for proving mathematical theorems.
Big projects verified in Rocq include CompCert, CertiCoq, and seL4. Additionally, some large companies use Rocq to verify critical software, like in airplanes (there's a list at https://github.com/ligurio/practical-fm although it may not be accurate). Big projects in Lean include mathlib (a collection of various mathematical proofs) and the ongoing work to prove Fermat's Last Theorem (https://imperialcollegelondon.github.io/FLT/) and PFR (https://teorth.github.io/pfr/). I'm not aware of "real-world" projects in Idris and Agda, but may be wrong.
That said, they're all small communities compared to something like C++ or JavaScript. Moreover, verifying programs is very slow and tedious (relative to writing them), so I wouldn't be surprised if we see a big breakthrough (perhaps with AI) that fundamentally changes the landscape. But remember that even with a breakthrough your skills may be transferable.
I wouldn't put much money on any of them. IME most mathematicians aren't that interested in formalization, and the gulf between a handwritten proof and computer-verified syntax is pretty huge.
They're interesting to learn and play with for their own sake, but I'd be reluctant to make any bets on the future of any of them.
If I had to choose, Lean seems to have the most momentum, though the others have been around longer and each has some loyal users.
Most mathematicians aren't doing formalization themselves, but my impression is that a lot of them are watching with interest. I get asked "is my job secure?" quite a lot nowadays. Answer is "currently yes".
Maybe before we have AGI we should get an AI that can translate Andrew Wiles's proof into Lean for us. Easily tractable, checkable, useful, and build-upon-able.
Kevin Buzzard is reportedly working on formalizing a modern proof of FLT. This effort has already managed to surface some unsound arguments in one of the prereqs for the proof (namely crystalline cohomology) https://xenaproject.wordpress.com/2024/12/11/fermats-last-th... though the issue has since been fixed.
I understand that the Mizar community is rather closed and primarily focused on extending the Mizar Mathematical Library. The Mizar proof checker is closed source.
Lean is gaining traction, which can be seen from the fact that at the moment 81 [1] of the 100 theorems on the 'Formalizing 100 Theorems' list [2] have been proven in Lean, while Mizar stands at 71 [3].
None at all. Hit the natural numbers game that was referenced and you can start proving basic things like 2+2=4 and n+0=n, on up to associativity, commutativity, etc. of the basic operations.
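The flavour of those early levels, in plain Lean:

    example : 2 + 2 = 4 := rfl            -- evaluation is enough here

    example (n : Nat) : n + 0 = n := rfl  -- true by definition of `+`

    -- but `0 + n = n` already needs induction:
    example (n : Nat) : 0 + n = n := by
      induction n with
      | zero => rfl
      | succ k ih => rw [Nat.add_succ, ih]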
As the article explains (and the title alludes to), if your axioms are inconsistent, so you can prove a contradiction, you can prove anything. Literally any statement that can be formulated can be proven to be true (as can its negation!). So you'd be trying to model the universe with a maths in which 1 = 2. And also 1 = 3. And also 1 =/= 3.
Any useful set of axioms has to be consistent, because any inconsistency is catastrophic. It wrecks the entire thing. Of course that also means, thanks to Gödel's theorem, that any useful set of axioms is incomplete, in that there are true statements that cannot be proved from them.
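In Lean terms:

    -- Ex falso quodlibet: from falsehood, anything follows.
    example (P : Prop) (h : False) : P := h.elim

    -- So a single bad axiom, say 1 = 2, proves whatever you like:
    example (P : Prop) (h : (1 : Nat) = 2) : P := absurd h (by decide)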
I guess let me rephrase the question. Does there exist a system that is inconsistent, but where the inconsistency is localized, so that we could actually miss it by not knowing about or noticing that localized inconsistency? And is there a proof of what you said, that an inconsistent system with the inconsistency in one spot can have any statement about the system be both true and false?
I’m adjusting the parameters here so we can ask the deeper question of whether one exists in the real world.
I possess a proof of FLT, and will publish the metamath formalization, in due time (first I need to build a secure display, for reasons that will become clear).
People jest, but they will jest less once they think a little more critically: how can cryptography experts expound the security of such-and-such cryptographic primitives involving discrete numbers, exponentiation, ... but be unable to come up with alternative proofs of FLT? ... but be unable to individually formalize the existing supposed proof of FLT? ... but be unable to find a more succinct proof of FLT? Who really knows their way around the block, if the consensus is that Fermat did not possess a proof but Wiles did finally find one, and nobody has produced a proof much more succinct than Wiles's long-winded and to-this-date-not-formally-verified one?
I’ve been kicking around the idea of something like Lean (maybe even just Lean?) to rewrite news and other non-fiction articles, treating statements as theorems that need to be proven. Proofs could include citations, and could be compound things like “this is a fact if three of my approved sources asserted it as a fact”
It should then be possible to get a marked-up version of any document with highlighting for “proven” claims.
Sure, it’s not perfect, but I think it would be an interesting way to apply the rigor that used to be the job of publications.
Formalising natural language statements is a minefield of difficulties, for (imo) essentially the same reasons that writing code which interacts with the real world is so difficult. Concepts you take for granted like identity, time, causality... all of that stuff needs to be fleshed out carefully and precisely in the formalism for facts to be relatable to each other (or even expressible in the first place).
Not to discourage you - it's a cool problem! OpenCog comes to mind as a project that tried to take this all the way, and there's a field devoted to this stuff in academia called KRR (knowledge representation and reasoning). The IJCAI journal is full of research on similar topics.
(also see all the different logics philosophers use to formalise different kinds of argument involving time/modality/probability - there are a lot, but they aren't very "modular" and can't easily be mixed unless someone has solved that since I last looked into it)
> Concepts you take for granted like identity, time, causality... all of that stuff needs to be fleshed out carefully and precisely in the formalism for facts to be relatable to each other (or even expressible in the first place).
Yes, and different formalisations of identity apply in different contexts.
Eg remember the famous line about not being able to step in the same river twice.
> logics philosophers use .. aren't very "modular" and can't easily be mixed
Not sure if the model-checking communities would agree with you there. For example CTL-star [0] mixes tree-logic and linear-temporal, then PCTL adds probability on top. Knowledge, belief, and strategy-logics are also mixed pretty freely in at least some model checkers. Using mixed combinations of different-flavored logic does seem to be going OK in practice, but I guess this works best when those diverse logics can all be reduced towards the same primitive data structures that you want to actually crunch (like binary decision diagrams, or whatever).
If no primitive/fast/generic structure can really be shared between logics, then you may be stuck with some irreconcilable continuous-vs-discrete or deterministic-vs-probabilistic disconnect, and then require multiple model-checkers for different pieces of one problem. So even if mixing different flavors of logics is already routine.. there's lots of improvements to hope for if practically everything can be directly represented in one place like lean. Just like mathematicians don't worry much about switching back and forth from geometry/algebra, less friction between representations would be great.
Speaking of CTL, shout out to Emerson[1], who won a Turing award. If he hadn't died recently, I think he'd be surprised to hear anyone suggest he was a philosopher instead of a computer scientist ;)
[0]: https://en.wikipedia.org/wiki/CTL* [1]: https://en.wikipedia.org/wiki/E._Allen_Emerson
1 reply →
> (also see all the different logics philosophers use to formalise different kinds of argument involving time/modality/probability - there are a lot, but they aren't very "modular" and can't easily be mixed unless someone has solved that since I last looked into it)
From a logic-as-types perspective, modalities turn out to be monads. So the problem of "mixing" modalities is quite similar to the problem of composing monads in programming languages.
1 reply →
I think it's quite rare that the most important beliefs you should derive from the news can be justified by a collection of absolute statements.
I think you'd be better served by a tool for calculating chains of Bayesian reasoning. I've seen a tool that does this for numerical estimations.
Correct. The only way to not subject oneself to (economic) irrationality is to model your beliefs about the world using probabilities and using Bayes to update in light of new evidence.
Careful, such an approach could easily end up giving an aura of logical objectivity to arbitrarily radical and nonsensical ideas. The political views of one of the fathers of modern logic may serve as a cautionary tale [0].
[0] https://en.m.wikipedia.org/wiki/Gottlob_Frege#Political_view...
> I’ve been kicking around the idea of something like Lean (maybe even just Lean?) to rewrite news and other non-fiction articles, treating statements as theorems that need to be proven.
I found I got much better at writing non-fiction after having math at uni. I would help proof-read essays and other hand-ins by my SO and sister, and apply similar rigor as you mention. Stuff like "you show C follows from B here, but you haven't actually given an argument for why B follows A, so you can't then claim C follows from A".
It's tempting to say that with LLMs this seems like a plausible task to turn it into a program, but the hallucination issue puts a damper on that scheme.
There are rhetorical tricks which rely on this to be persuasive. You can say "Thing X is happening, so we should do Thing Y", and people will nod.
If you're sneaky about it it will read like a logical conclusion when in fact X and Y are only loosely related contextually, and there is no logical chain at all.
A standard political trick is to blame X on something emotive and irrelevant, and offer Y as a false solution which distracts from the real causes of the problem.
This is used so often it's become a core driver of policy across multiple domains.
Although it's very effective, it's a crude way to use this. There are more subtle ways - like using X to insinuate criticism of a target when Y is already self-evident.
Point being, a lot of persuasive non-fiction, especially in politics, law, religion, and marketing, uses tricks like these. And many others.
They work because they work in the domain of narrative logic - persuading through stories and parables with embedded emotional triggers and credible-sounding but fake explanations, where the bar of "That sounds plausible" is very low.
LLMs already know some of this. You can ask ChatGPT to make any text more persuasive, and it will give you some ideas. You can also ask it to read a text, pull out the rhetorical tricks, and find the logical flaws.
It won't do as good a job as someone who uses rhetoric for a living. But it will do a far better job than the average reader, who is completely unaware of rhetoric.
A mild damper at best, RAG-based pipelines are mature now.
Alas, things like this aren't logic proofs.
It bothers me to my core when I see this idea. Sometimes at FAANG. Blissfully management learned to...not promote...them.
1 reply →
I think for that kind of thing, Prolog would be sufficient. Lean is more powerful, but it requires you to supply proofs - Prolog can just automatically run inferences.
(Although I wouldn't be surprised if somebody had already recreated something like Prolog in Lean with a tactic that does more or less what the Prolog interpreter does)
I think it'd be more interesting to map out entire trees of arguments about a topic. Start with something big, like "is there a God," then come up with all of the arguments for or against, then come up with all of the arguments against those arguments, then the counters to those arguments. Really explore it as a sort of debate space. Then bring in citations not as backing but for historical context: "Plato made this argument in such and such." The idea wouldn't be so much to decide a winner as to prevent us from going around in circles by making a map.
> Start with something big, like "is there a God,"
To do that you first need a definition of the concept God.
And then you realize that all the people making arguments in the past were using their own unspoken and incompatible definitions.
Kialo does this for online debates: https://www.kialo.com/
An example tree for the statement "God exists": https://www.kialo.com/god-exists-3491?path=3491.0~3491.1
As a programmer who has studied philosophy: This is the approach I take more or less in my head when reading articles, however the problem is the ambiguity of natural language.
E.g. lets say the statement is about a presidents promise to end a conflict within 24 hours after coming into office. One could get to a conclusion pretty quickly when the conflict hasn't been ended after 24 hours when they entered the office.
But what does "end the conflict" mean exactly? If the conflict ended how long does it need to remain ended to achieve the label "ended"? What if said president has a history that recontextualizes the meaning of that seemingly simple claim because he is known to define the word "ended" a little different than the rest of us, do you now judge by his or by our definition? What if the conflict is ended but there is a small nest of conflict remaining, after which size do we consider the conflict going on?
I know some of that has official definitions, but not everything has. In the end a lot of that will require interpretation and which definition to chose. But yeah I support your idea, just spelling it out and having a machine-readable chain of thought might help already.
Bah, the answer is easy: words should be used with their most common definition, everything else is a lie. If you want to use a word 'differently' then you need to do so explicitly.
Proof != evidence. In evidence, we corroborate, collate, add more sources, weigh evidence, judge. Proof is a totally different process. Only in the mathematical do one prove something, everywhere else we build up evidence, corroborate, etc.
Check out https://argdown.org/ for the much simpler version of that idea. No proofs, just supporting/conflicting evidence and arguments. But that's hard enough to consistently produce as it is.
News and non-fiction articles are not math and cannot be treated as math. At best you could build a tool that check the most glaring contradictions (like a typo changing a number), and I'm not even sure it can be consistent without building a software that understand language. At worse you would build a tool that spit bullshit that millions of people would treat as gospel
Claimify from MS research aims in this direction. There's a paper and video explainer from a few months ago.
https://www.microsoft.com/en-us/research/blog/claimify-extra...
For the form, you might be interested in Ethica, by Spinoza [1]. On the other hand, for fact checking, the key concept seems to be trust in sources rather than logical consistency.
[1]: https://en.wikipedia.org/wiki/Spinoza%27s_Ethics
This might be interesting to you: https://en.m.wikipedia.org/wiki/Stephen_Toulmin#The_Toulmin_...
I think you could just use lean, and progressively elaborate the inference graph.
But you might find a historical/journalistic/investigative angle more profitable.
I heard a software product pitch from a small law practice who were outside investigators for abuse/policy-violation claims in a unionized workforce. They had a solid procedure: start with the main event. Interview all witnesses, and identify other interesting events. Then cross-reference all witnesses against all alleged/interesting events. That gives you a full matrix to identify who is a reliable witness, what sort of motivations are in play, and what stories are being told, patterns of behaviour, etc. Their final report might actually focus on another event entirely that had better evidence around it.
In that sort of historical investigation, a single obviously false claim undermines all claims from that witness. Pure logic doesn't have that bi-directionality of inference. Sometimes your sources just flat out lie! That's half the work.
The key difference is between a full matrix of claims vs sources/evidence, while good math is all about carefully plotting a sparse inference graph that traverses the space. What is called "penetrating minimalism" in math is called "cherry picking" in history or journalism.
We passed on the pitch. Didn't see how it scaled to a product worth building, but their process was appealing. Much better than the an internal popularity-contest or witch-hunt. Probably better than the average police investigation too.
Look at Arguman, now defunct, but you can probably find some good videos/screenshots.
Basically crowd-sourced statements, rebuttals, agreements, etc, in a nice interface.
We need that nowadays.
There's Kialo: https://www.kialo.com/
I think a more practical method would be to trace the provenance of a claim, in a way that lets you evaluate how likely it is for the person making it to actually know the facts.
A simple scenario would be reading a news article about american politics. You see an unusual claim made, so you start tracing it and find that the journalist got it from person X who got it from person Y who got it from donald trump. Trump is famous for lying constantly, so unless there's a second source making the claim, you could disregard it with a high probability.
What about proven facts that get disproven? Is there room to rethink your priors?
I don't know why you're getting downvoted because I think this is an interesting idea.
Completely impossible but still interesting and fun to explore.
But you don't need the power of Lean to do something like this. I would recommend starting with something like Prolog or RDF triples.
Basically [Citation Needed]
Wait… so are we basically compiling a dictionary of proofs all stemming from a handful of self-evident truths? And every further proof is just some logical aggregation of previous proofs?
Can someone please turn this into a Zachtronics style game?! I badly, direly want this. There’s a game called Euclidea that’s kind of like this for geometry, and the whole premise of building a tower of logic is deeply attractive to me.
Is this what pure math is about? Is this what people who become math profs are feeling in their soul? The sheer thrill of adding to that dictionary of proofs?
Aside: I recall some famous mathematician had made a list of base proofs that you just hold to be true. Can someone remind me who, and/or what that list is called? I’m guessing they’re considered axioms.
There is a game already, though it may not be exactly what you want (and the goal is definitely not "generate all known math"). I've played around with it, and I do think it's pretty fun.
The article mentions it, in fact: https://adam.math.hhu.de/#/g/leanprover-community/nng4
How did I miss this! Thanks for pointing it out. This is scratching the itch.
Had a blast with that game, really scratched that puzzle itch, and got to relearn some long-forgotten knowledge.
I played through a lot of it, and while it was fun, I wouldn't call it gamified in the sense that Zachtronics does it. As an FYI, here are some sticking points I ran into as well:
1) It uses the term "goal" for the current stage of the proof you're working on, which is counterintuitive. In common speech, the goal is the theorem you're trying to prove, not the current step you're on in proving it. It would be like if I were helping you get to work in the morning, and I said, "Okay, your current [sub]goal is that you're done brushing your teeth." No. That's not a goal. That's progress toward a goal.
2) It doesn't warn you early on about the left-associativity of the operator binding, or what to do about it. So you can apply what seem like valid transformations, but then parentheses pop up that you don't expect, with no guidance on how to deal with them. I had to ask ChatGPT what was going on to solve it.
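For anyone else who hits this, the underlying rule, shown in plain Lean rather than the game's dialect:

```lean
-- `+` associates to the left, so `a + b + c` parses as `(a + b) + c`,
-- which is why parentheses can suddenly appear after a rewrite.
example (a b c : Nat) : a + b + c = (a + b) + c := rfl
```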
> Can someone please turn this into a Zachtronics style game?!
That game is called math :) Partially joking, but I do think a game version would be fun.
> Is this what pure math is about?
More or less yes for an undergrad, but when you get to research it feels different.
> I badly, direly want this
Consider checking out an abstract algebra book. Maybe Dummit and Foote or something else popular. Proofs in algebra often have a satisfying game-like quality. The more popular books will have solutions you can find online if you get stuck.
>More or less yes for an undergrad, but when you get to research it feels different.
Would you mind telling more about how it feels different in research?
> Proofs in algebra often have a satisfying game-like quality.
Interesting. I find them banal and deeply unsatisfying.
You might be thinking of Euclid's axioms, which define points, lines, planes, etc, and that lines can be parallel. This is an interesting one because the system is violated if your space is not flat, like if you're on a sphere.
You could also be thinking of Zermelo-Fraenkel set theory (ZF/ZFC), which most of modern mathematics is ultimately based upon.
I think this highlights an ultimate pitfall of something like Lean. It is something like this: when an assumption is broken or violated in mathematics, the whole field of mathematics can _grow_. Non-Euclidean geometry, imaginary numbers, etc. are all examples of this. Trying to cram maths into a programming language sounds like it would constrain it, removing the creativity and reducing all of maths to a search space.
Check out the game Bombe [1]. It's a minesweeper variant where instead of directly flagging or uncovering cells, you define rules for when cells can be flagged. As it gets more advanced you end up building lemmas that implicitly chain off each other. Then as _you_ get more advanced (and the game removes some arbitrary restrictions around your toolset) you can generalize your rules and golf down what you've already constructed.
[1] https://steamcommunity.com/app/2262930
In my mind this is literally what math is. We start with axioms, and derive conclusions. There's probably more to it than that, but that's the understanding I'm at now.
But shouldn't it also be part of the axioms what are the rules that allow you to derive new theorems from them?
So then you could self-apply it and start ... deriving new rules of how you can derive new theorems and thus also new rules, from axioms?
I'm just confused a bit about "axioms" and "rules". What's the difference?
Choosing the axioms is difficult.
> so are we basically compiling a dictionary of proofs all stemming from a handful of self-evident truths
I would say, "from a handful of axioms".
It's certainly true that when Euclid started this whole enterprise, it was thought that axioms should be self-evident. But then, many centuries later, people discovered that there are other interesting geometries that don't satisfy the same axioms.
And when you get to reasoning about infinities, it's very unclear that anything about them can be considered self-evident (a few mathematicians even refuse to work with infinities, although it's definitely a very niche subcommunity).
Some of today's common axioms are indeed self-evident (such as "you should be able to substitute equal subterms"), but things like the axiom of choice have (at least historically) been much more controversial. I would probably say that such axioms can be considered "plausible" and that they generally allow us to be able to prove what we want to prove. But you'll definitely find mathematicians championing different axioms.
> Aside: I recall some famous mathematician had made a list of base proofs that you just hold to be true. Can someone remind me who, and/or what that list is called? I’m guessing they’re considered axioms.
That would be the ZFC axioms. It was originally the ZF axioms (named so after the mathematicians Zermelo and Fraenkel who worked in the early 20th century), and then later the Axiom of Choice (C) was added. It's generally considered to be the "standard" set of axioms for maths, although very few mathematicians actually work directly from these axioms. But in theory, you can take almost every mathematical proof (unless it's explicitly set in some other foundation) and recast it entirely in applications of the ZFC axioms.
Thanks for this. So I have half a thought I’m trying to flesh out from what you’ve shared. Bear with me, whoever reads this:
Are there essentially two flavours:
- the maths based on axioms that are fundamentally, cosmically true such as x = x. And in doing so we’re formalizing on paper the universal truths and all subsequent rules we know to be true given these
- the maths that incorporate those plus additional axioms that aren’t necessarily fundamentally true (or maybe just not provable), but work very well at laying a foundation for further rules that build a practically useful, but not necessarily “correct” toolbox
With the latter, is it kind of a “they won’t hold up under normal conditions but if you accept these axioms, there’s interesting things you can explore and achieve?”
> Aside: I recall some famous mathematician had made a list of base proofs that you just hold to be true. Can someone remind me who, and/or what that list is called? I’m guessing they’re considered axioms.
Take a look at zeroth-order logic.
Some good mentions elsewhere in the thread, another to check out is The Incredible Proof Machine https://incredible.pm/
Fun fact: the author of the The Incredible Proof Machine (Joachim Breitner) also works on Lean :-)
Check out Terence Tao’s book called Analysis. It is sometimes challenging but it opened that world for me.
You are probably vaguely referencing the Principia Mathematica.
That's a very outdated text that is mostly irrelevant for modern maths except for its historical importance.
How are you asking questions like this when you're a roboticist and a principal software engineer in the autonomous mobile robotics industry?
From your questions I marked you as a high schooler.
Probably similar to how numerous maths PhDs I’ve worked with are terrible at shipping durable production code on time and on budget, or how many doctors are terrible at inserting IVs, or how an architect may likely be terrible at framing a house, or how some Grammy winning musicians can't read classical notation: your mental model for domains of expertise is probably wrong.
One problem I have with learning Lean is that tactics - like rfl in the example - are overloaded, and their full semantics are not completely explained or understandable from the tutorials. Unlike, say, C programming, where one may understand what happens to the program state down to the bit, it feels too fuzzy. And the rewrite (rw) tactic syntax doesn't feel natural either.
Yeah, I've similarly found the tactics in Coq (now Rocq) difficult to internalize. E.g., I might have "A = B" and "P(A,A)" available, and I want to conclude "P(A,B)", but the rewrite will fail for some arcane reason. (Issues with the definition of some of the intermediate structures, I'd imagine.)
On the other end of the spectrum, I've recently been playing with Metamath and its set.mm database, which has no programmable tactics at all, only concrete inferences that can be used in a proof. (E.g., the modus ponens inference ax-mp says that "|- ph" and "|- ( ph -> ps )" prove "|- ps", where "ph" and "ps" are variables that can be substituted.) Alas, it's not much better, since now you have to memorize all the utility lemmas you might need!
Agreed - the Metamath base language (and its verifier) seem to be the most tractable of all I've seen, although it is probably still quite far away from the complexity of the high level language(s) that compile to it.
Derived from it, the currently best attempt to achieve an unambiguous and secure language seems to be Metamath Zero: https://github.com/digama0/mm0
This is one of the reasons I prefer Agda. It is usually written without tactics, you just write the proof term in a functional programming language via the Curry–Howard correspondence. The trade off is that you must be more disciplined with creating useful abstractions and functions, otherwise proving even routine stuff becomes really tedious.
I'm not sure you prefer Agda so much as you prefer providing proof terms functionally rather than imperatively. Then again I've never used Agda; how does it differ from Coq/Rocq minus Ltac or Lean minus tactics?
At least you can 'go to definition' on the tactics and see what they're doing. It's a lot to take in at the beginning, but it can all be inspected and understood. (At least until you get to the fundamental type theory; the reduction rules are a lot harder to get into.)
> the rewrite (rw) tactic syntax doesn't feel natural either.
Do you have any thoughts on what a natural rewrite syntax would be?
> Do you have any thoughts on what a natural rewrite syntax would be?
Not yet, but I'd probably prefer something that more explicitly indicated (in English, or with some more visually "pointing" indicator) which specific parts of the previous step would be replaced.
It feels weird, or I'd have to get used to it: both what is being replaced and what it is replaced with depend on some distant context. It's very indirect, as it requires switching attention between the tactic tree and the context or previous proofs.
This is why I prefer Agda, where everything comes down to pattern matching.
You can absolutely use pattern matching in Lean instead of tactics, if you prefer to write proofs closer to "what is going on under the hood".
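For example, a small one (the name and lemma are arbitrary):

```lean
-- A proof written by pattern matching on the hypothesis, no tactics:
theorem or_swap {p q : Prop} : p ∨ q → q ∨ p
  | .inl hp => .inr hp
  | .inr hq => .inl hq
```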
It was interesting to me as it didn't fit my expectations. As a math theory ignoramus, I expected that reflexivity and rewriting are more fundamental than addition. But Lean seems to assume addition while requiring explicit rfl and rewrite.
Perhaps there's a Lean "prelude" that does it for you.
Yes, there is a prelude that defines natural numbers and an addition function on them. As the post notes, the reflexivity tactic "unfolds" the addition, meaning that it applies the addition function to the constants it is given, to arrive at a constant. This is not specific to addition, it unfolds other function definitions too. So addition doesn't have to come first, you are right that reflexivity is more fundamental.
It's because the addition can evaluate to only one form: `3 + 3` and `6` both evaluate to `succ (succ (succ (succ (succ (succ zero)))))`; similarly Lean can infer `4 * 3 = 1 + 3 + 8` because both sides evaluate to the same thing as `12`. But an expression can be rewritten into an infinite number of forms, e.g. `x * 2` can be rewritten to `x + x`, `(x + x/2) * (4/3)`, etc. So `rfl` can automatically evaluate both sides but not rewrite them.
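For instance, both of these go through by evaluation alone:

```lean
-- Both sides reduce to the same closed normal form, so rfl closes them:
example : 3 + 3 = 6 := rfl
example : 4 * 3 = 1 + 3 + 8 := rfl
```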
The surprising thing to me was that tactics are all written in "user-level" code, outside the proof kernel. This makes sense in the sense that you want a small, thoroughly tested kernel that doesn't change. But it also implies that if you use tactics in your proof, then your proof can go from correct to failing if one of the tactics you use gets modified between releases. Is that a problem in real world use?
Yes. I've seen proofs fail after a mathlib update because the behaviour of simp changed. I've never seen it do less after an update (so far), but sometimes it'll simplify more and then the next steps may fail to work.
I've since adopted the suggestion (that is afaik used in mathlib) to never use bare simp unless it closes the current goal directly. Instead, if you write simp?, Lean will run the simplifier and tell you exactly which theorems it used (in the form of simp only [...]) which you can then insert into the proof instead.
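A toy illustration (the lemma names here are from core Lean; what simp? actually suggests will vary by goal and version):

```lean
-- A bare simp works, but its behaviour can drift across releases:
example (n : Nat) : n + 0 = n * 1 := by
  simp

-- Running `simp?` here reports the lemmas used, which you can pin down
-- so the proof keeps working even if the default simp set changes:
example (n : Nat) : n + 0 = n * 1 := by
  simp only [Nat.add_zero, Nat.mul_one]
```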
One thing Lean inherits from mathematics is the use of opaque notation. It would be nice to inherit some readability from programming.
Paperproof is an effort in one direction (visualization of the logic tree):
https://paperproof.brick.do/lean-coq-isabel-and-their-proof-...
Yeah the documentation is also quite fragmented because tactics are user-definable and some are from Lean vs from Mathlib etc. I've gotten quite decent at using the basic ones but I still sometimes ask on Zulip if something isn't working as I expected.
You may like Agda. I prefer Lean even though you are right about this.
I understand what you mean, but I shudder to think at how complicated it would be to write certain proofs without tactics, e.g. simp, ring, omega, linarith and friends all really do a lot of work.
Nitpick, but it's a bit strange to say that the two_eq_two theorem looks like a function. It looks more like a constant, since it has no arguments. (Yes I know that constants are nullary functions.)
I would find the following a more convincing presentation:
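Something along these lines (a sketch, using Nat for concreteness):

```lean
-- x_eq_x takes an argument, so it visibly looks like a function:
theorem x_eq_x (x : Nat) : x = x := rfl

-- ...and two_eq_two applies it like one:
theorem two_eq_two : 2 = 2 := x_eq_x 2
```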
Here x_eq_x looks like a function, and in two_eq_two's proof we apply it like a function.
Fair! I decided not to do that because the way arguments work (and dependent types in general — like being able to return a proof of `x = x` given x) is unusual for people like me, and deserves an article of its own. So I’m kicking that to one of the next articles.
One thing I didn't know until recently was that Lean doesn't solve the problem of verifying that a formal theorem actually states what we think it states rather than something else.
In some cases that's not an issue, because the formal statement of the theorem is very simple. E.g. for Fermat's Last Theorem, which can be formally expressed with "(x y z : ℕ+) (n : ℕ) (hn : n > 2) : x^n + y^n ≠ z^n". But in other cases it might be much harder to verify that a formal statement actually matches the intuitive informal statement we want.
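Written out as a declaration, it's short enough to audit by eye (a sketch; `ℕ+` comes from mathlib, and the theorem name is arbitrary):

```lean
import Mathlib

-- The statement is easy to check against the informal one;
-- the (absent) proof is the hard part.
theorem fermat_last (x y z : ℕ+) (n : ℕ) (hn : n > 2) :
    x ^ n + y ^ n ≠ z ^ n := by
  sorry
```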
Of course, Lean or similar languages still offer the huge potential of verifying that a formal statement is provable, even if they don't help with verifying that the formal statement matches the intended informal statement in natural language.
Are there, in general, ways to do that? Sorry if this sounds like a troll question, but I struggle to think of how one would verify that a formal statement matches the accepted understanding other than painstakingly going through the definitions with experts from both sides.
Yeah this seems like the specification/implementation problem. One can perfectly implement a bad spec, but coming up with the perfect spec is a human problem.
I don't think there is any technical solution for that, apart perhaps from asking an LLM, or multiple different LLMs, whether it thinks the formal definition matches the informal statement.
When mathematicians check normal (informal) proofs they usually verify both things at once: that the proof doesn't contain mistakes or insufficiencies, and that it proves what the author says it proves. Formal proof checkers aren't able to do the latter.
Yeah, by adopting Lojban as our common language.
Joking of course, but only because Lojban doesn’t have formally specified semantics.
Computational semantics could, in principle, let you enumerate the possible logical meanings of a given informal English phrasing of the statement; each of these could then be either proven or thrown out with explicit justification for why that was not what you meant.
Nope! Not a troll question. At the end of the day, we're human beings interacting with a bunch of electrified rocks. There will always be a layer of subjectivity and expertise involved in translating informal (or perhaps even unknown) desires into formal specs. It gets even worse when you are designing software that needs to take into account the underlying behavior of hardware; you wouldn't program an autopilot for a jumbo jet whose certification only exists in the context of some simplified formal model of how computers work, would you?
But I'm catastrophizing. If you think going through the definitions of a spec is painstaking and requires expert insight to get right, it is minuscule compared to e.g. auditing every line of a codebase written in an imperative language. And not only that: to even get close to the same correctness guarantees of e.g. a formally verified pipeline through CompCert, you would have to audit every line of the codebases of every compiler that touches your code (https://r6.ca/blog/20200929T023701Z.html "It Is Never a Compiler Bug Until It Is"), together with analysis of the underlying hardware models.
All this to say, technology is more complicated than traditional programmers exposed only to high-level languages might be led to believe. That complication can only be reduced, consolidated, or blackboxed so much.
>Lean doesn't solve the problem of verifying that a formal theorem actually states what we think it states rather than something else.
In all honesty, if a tool existed that did that, we would have solved maths. Or at least we would have solved human communication. In either case it would be the most studied thing in the world, so I don't know where this belief comes from.
Is there a way to read Lean proofs noninteractively?
After playing with the natural number game a bit, proofs quickly ended up being opaque sequences of "rw [x]" commands, which felt unreadable. It's nice that the editor allows interactively viewing the state at different points, but having to click on each line ruins the flow of reading. Imagine if you had to read Python code which has no indentation, braces, or anything similar, and the only way to know where an if statement ends or an else block starts is by clicking on each line. My impression might be influenced by the limited vocabulary that the early levels of the natural number game provide. Does the richer toolset provided by full Lean make it easier to write proofs that are readable without requiring you to click on each line for the necessary context?
> Is there a way to read lean proofs noninteractively.
I'd like to see an answer to this question. I was looking into it the other day.
I found this: https://xenaproject.wordpress.com/2019/02/11/lean-in-latex/ Which gives a way to do the clicking-through outside the editor. And maybe gives some insight into how Lean people see things.
Rocq used to have a "mathematical proof language". It's hard to find examples, but this answer shows the flavor: https://stackoverflow.com/a/40739190
So the idea was to make proofs read more like "manual" proofs as you would find them in math papers. Apparently nobody used this, so it was removed.
Isabelle's Isar proof language is similar, and AFAIK it's the standard way of proving in Isabelle (see the example at https://courses.grainger.illinois.edu/cs576/sp2015/doc/isar-...).
You spell out the structure and the intermediate results you want, and the "by ..." blocks let you specify concrete tactics for the parts where the details don't matter and the tactic does what you want. In that example, "simp" (a kind of simplify/reflexivity tactic) is enough for all the intermediate steps.
I don't know if there is anything like this for Lean, but maybe this provides some keywords for a web search. Or inspiration for a post in some Lean forum.
You can structure your proofs in Lean to reason forward, but proofs in Mathlib still primarily do backward proof, because it is more compact and performant.
This is a great question! I’m still not very experienced so take it with a grain of salt. But here’s my take on it.
I’ve been spending time with Lean quite a bit for the past few months. When I look at a proof, I don’t “read” it the same way as I do with programming. It feels a bit more like “scanning”. What stands out is the overall structure of the argument, what tactics are used, and what lemmas are used.
In real Lean code, the accepted style is to indent any new goals, work on one goal at a time (except a few special parallel tactics), and outdent when the goal is done. That’s what I mean by the “shape” of the argument. See some examples in this PR I’m working on: https://github.com/gaearon/analysis-solutions/pull/7/files
Once you’re familiar with what tactics do, you can infer a lot more. For example, `intro` steps into quantifiers and “eats” assumptions; if I see `constructor` I know it’s breaking apart the goal into multiple goals; and so on.
Keep in mind that in reality all tactics do is aid you in producing a tree of terms. As in, actually the proofs are all tree-shaped. It’s even possible to write them that way directly. Tactics are more like a set of macros and DSL for writing that tree very concisely. So when I look at some tactics (constructor, use, refine, intro), I really see tree manipulation (“we’re splitting pieces, we’re filling in this part before that part, etc”).
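A tiny made-up example of the same lemma both ways:

```lean
-- A tactic script; each tactic grows the proof term behind the scenes:
theorem and_swap (p q : Prop) : p ∧ q → q ∧ p := by
  intro h
  constructor
  · exact h.right
  · exact h.left

-- The tree it builds, written out directly as a term:
theorem and_swap' (p q : Prop) : p ∧ q → q ∧ p :=
  fun h => ⟨h.right, h.left⟩
```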
However, it’s still different from reading code in the sense that I’d need to click into the middle to know for sure what assertion a line is dealing with. How much this is a problem I’m not sure.
In a well-written proof with a good idea, you can often retrace the shape of the argument by reading because the flow of thought is similar to a paper proof. So someone who wants to communicate what they’re doing is generally able to by choosing reasonable names, clear flow of the argument, appropriate tactics and — importantly! — extracting smaller lemmas for non-obvious results. Or even inline expressions that state hypotheses clearly before supplying a few lines of proof.

On the other hand, there are cases where the proof is obvious to a human but the machine struggles and you have to write some boilerplate to grind through it. Those are sometimes solved by more powerful tactics, but sometimes you just have to write the whole thing. And in that case I don’t necessarily aim for it being “understandable” but for it being short. Lean users call that golfing. Golfed code has a specific flavor to it. Again in my experience golfing is used when a part of the proof would be obvious on paper (so a mathematician wouldn’t want to see 30 lines of code dedicated to that part), or when Lean itself is uniquely able to cut through some tedious argument.
So to summarize, I feel like a lot of it is implicit, there are techniques to make it more explicit when the author wants, it doesn’t matter as much as I expected it would, and generally as your mental model of tactics improves, you’ll be able to read Lean code more fluidly without clicking. Also, often all you need to understand the argument is to overview the names of the lemmas it depends on. Whereas the specific order doesn’t matter because there’s a dozen way to restructure it without changing the substance.
I love the article. Few can cross the bridge and describe these sorts of things in an easy to digest way. The secret is showing all the tiny steps experts might not see as it is too obvious. Thank you!
Thanks!
I wonder: suppose you are not relying on any tricks or funky shenanigans. Is it possible to just throw random stuff at Lean and find interesting observations based on whether it approves? Like using an automated system or an LLM that tries all types of wild proofs/theories and sees if they work? Maybe I'm asking the wrong questions, though, or not stating it well... this is above my understanding; I could barely get my head around Prolog.
As someone who does certified programming professionally, I believe generative AI and formal methods are a match made in heaven. I might even go so far as to wager that the notion that human programmers will or will not be replaced by LLM-based AI is entirely dependent on whether these AI can get good at certified programming + compositional reasoning.
> is it possible to just throw random stuff at Lean and find interesting observations based if it approves?
Traditional AI has an easy time with checkers, because the search space is small. Chess is a little harder. Go still cannot be reasonably tackled by non-machine learning AI. Meanwhile, the search space (number of available moves together with the diversity of explorable states) for a sophisticated formal language becomes unfathomably large.
When the problem is known to be of a certain nature, often you can time-efficiently brute force it via SMT solvers. Traditionally SMT solvers and proof assistants have been separate branches of formal methods, but they're finally learning to play off each other's strengths (cf. Sledgehammer, Lean-SMT).
> like use an automated system or an llm that tries all types of wild proofs/theories? and sees if it works?
Research in this vein needs to be made more mainstream. I'm excited, though, that there have been big funders behind these ideas for years now, even before LLMs became big. Cf. "Learning to Find Proofs and Theorems by Learning to Refine Search Strategies" for earlier work, or DeepSeek-Prover for more recent attempts. I'm no machine learning expert, but it seems it's still a very open question how to best train these things and what their future potential is.
All in all mainstream LLMs are still rather mediocre at languages like Rocq and Lean. And when they're wrong, their proof scripts are extremely tedious to try to troubleshoot and correct. But I have hope AI tooling in formal methods will mature greatly over time.
I'm already doing almost all my LLM-assisted programming in Haskell, all the devops in check-heavy Nix, and starting to move everything into Dhall.
There's no way that Python written by LLMs will survive contact with Haskell written by LLMs in a standup competition.
One imagines the scope for this sort of thing only grows the higher you go in formality.
This is an active area of research and experimentation!
Much of the Lean community is on Zulip (which is kind of like a forum?) and you can see many relevant threads here: https://leanprover.zulipchat.com/#narrow/channel/219941-Mach...
As someone who hasn't seen Lean before but got curious after AlphaProof, I love the intro! Curious if you can mention what you're working on in Lean?
For now I'm just learning math with it!
Currently I'm going through https://github.com/teorth/analysis (Tao's Lean companion to his textbook) and filling in the `sorry`s in the exercises (my solutions are in https://github.com/gaearon/analysis-solutions).
Very cool. BTW, I also love that "sorry" is the "any" equivalent in Lean.
Does Lean have some sort of verification mode for untrusted proofs that guarantees that a given proof certainly does not use any "sorry" (however indirectly), and does not add to the "proving power" of some separately given fixed set of axioms with further axioms or definitions?
Does `#print axioms some_theorem` mentioned at the end of the article qualify? This would show if it depends on `sorry`, even transitively, or on some axioms you haven't vetted.
Oh, I missed that, thanks! It would be cool to use this to visualize the current state, progress, and depth of the "proof dependency graph"!
Yes, you can `#print axioms` to make sure no axioms were added, and make sure it compiles with no warnings or errors. There’s also a SafeVerify utility that checks more thoroughly and catches some tricks that RL systems have found.
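A sketch of what that looks like (`cheat` is a made-up name; the exact wording printed varies by Lean version):

```lean
theorem cheat : 1 = 2 := sorry

#print axioms cheat
-- 'cheat' depends on axioms: [sorryAx]

#print axioms Nat.add_comm
-- 'Nat.add_comm' does not depend on any axioms
```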
Apparently this is possible with macros? I dunno: https://github.com/leanprover/lean3/issues/1355
That's Lean 3, from eight years ago, and it's from before 'sorry' really existed in the way we know it now.
---
To answer the GP's question: Not only is there a verification mode, but Lean generates object files with the fully elaborated definitions and theorems. These can be rechecked by the kernel, or by external verifiers. There's no need to trust the Lean system itself, except to make sure that the theorem statements actually correspond to what we think they're supposed to be.
Perhaps this thread is a good place to ask: could anyone contribute their own opinion on the relative future of Lean vs. Idris/Coq/Agda? I want to dedicate some time to this from a knowledge-representation point of view, but I'm unsure which of them has the least risk of ending up like so many esoteric languages... I sank a lot of time into Clojure core.logic for a related project and already got burnt by the low-interest/small-community issue, so I've been hesitant to start with any of them for some time.
IME Lean and Coq/Rocq are used more in practice, and have bigger libraries and communities, than Idris and Agda.
Rocq is the most common for program verification, though I suspect that's mainly due to it being older, and it has weird quirks due to its age, so Lean may catch up. Lean is the most common for proving mathematical theorems.
Big projects verified in Rocq include CompCert, CertiCoq, and seL4. Additionally, some large companies use Rocq to verify critical software like in airplanes (there's a list at https://github.com/ligurio/practical-fm although it may not be accurate). Big projects in Lean include mathlib (collection of various mathematical proofs), and the ongoing work to prove Fermat's Last Theorem (https://imperialcollegelondon.github.io/FLT/) and PFR (https://teorth.github.io/pfr/). I'm not aware of "real-world" projects in Idris and Agda but may be wrong.
That said, they're all small communities compared to something like C++ or JavaScript. Moreover, verifying programs is very slow and tedious (relative to writing them), so I wouldn't be surprised if we see a big breakthrough (perhaps with AI) that fundamentally changes the landscape. But remember that even with a breakthrough your skills may be transferable.
I wouldn't put much money on any of them. IME most mathematicians aren't that interested in formalization, and the gulf between a hand-written proof and computer-verified syntax is pretty huge.
They're interesting to learn and play with for their own sake, but I'd be reluctant to make any bets on the future of any of them.
If I had to choose, Lean seems to have the most momentum, though the others have been around longer and each has some loyal users.
Most mathematicians aren't doing formalization themselves, but my impression is that a lot of them are watching with interest. I get asked "is my job secure?" quite a lot nowadays. Answer is "currently yes".
Maybe before we have AGI we should get an AI that can translate Andrew Wiles's proof into Lean for us. Easily tractable, checkable, useful, and build-upon-able.
Work is already underway in that direction:
https://github.com/deepseek-ai/DeepSeek-Prover-V2
but also
https://deepmind.google/discover/blog/alphaevolve-a-gemini-p...
Terence Tao is doing a lot of work in this direction.
Kevin Buzzard is reportedly working on formalizing a modern proof of FLT. This effort has already managed to surface some unsound arguments in one of the prereqs for the proof (namely crystalline cohomology) https://xenaproject.wordpress.com/2024/12/11/fermats-last-th... though the issue has since been fixed.
Archive:
https://web.archive.org/web/20250731115617/https://overreact...
Better: https://en.wikipedia.org/wiki/Mizar_system. Many books created, many proofs.
I understand that the Mizar community is rather closed and primarily focused on extending the Mizar Mathematical Library. The Mizar proof checker is closed source.
Lean is gaining traction, which can be seen from the fact that at the moment 81 [1] of the theorems on the 'Formalizing 100 Theorems' list [2] have been proven in Lean, while Mizar stands at 71 [3].
[1] https://leanprover-community.github.io/100.html
[2] https://www.cs.ru.nl/%7Efreek/100/
[3] https://mizar.uwb.edu.pl/100/
Mizar is actually available under a GPL 3 license (thus FLOSS) from https://github.com/MizarProject/system . There's an experimental Rust reimplementation that's at least 5x faster than the original: https://github.com/digama0/mizar-rs
As an abstract rule of thumb, how much would one have to beef up on logic before attempting to screw around with analysis and something like Lean?
None at all. Hit the natural numbers game that was referenced and you can start proving basic things like 2+2=4 and n+0=n, on up to associativity, commutativity, etc. of the basic operations.
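For a taste, the first few levels amount to things like this in plain Lean (in the game you rebuild these from scratch; here the commutativity proof just points at the library lemma):

```lean
-- These close by evaluation alone:
example : 2 + 2 = 4 := rfl
example (n : Nat) : n + 0 = n := rfl

-- Commutativity needs real induction, done once in the library:
example (m n : Nat) : m + n = n + m := Nat.add_comm m n
```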
Appreciate it!
I wonder if there are any physical phenomena that can be modeled with inconsistent math.
Like we assume everything is consistent but maybe it’s not.
As the article explains (and the title alludes to), if your axioms are inconsistent, so you can prove a contradiction, you can prove anything. Literally any statement that can be formulated can be proven to be true (as can its negation!). So you'd be trying to model the universe with a maths in which 1 = 2. And also 1 = 3. And also 1 =/= 3.
Any useful set of axioms has to be consistent, because any inconsistency is catastrophic. It wrecks the entire thing. Of course that also means, thanks to Gödel's theorem, that any useful set of axioms is incomplete, in that there are true statements that cannot be proved from them.
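You can see this "principle of explosion" directly in Lean:

```lean
-- From False, anything follows:
example : False → 1 = 2 := fun h => h.elim

-- From P and ¬P together, any Q follows:
example (P Q : Prop) (hp : P) (hnp : ¬P) : Q := absurd hp hnp
```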
I guess let me rephrase the question. Does there exist a system that is inconsistent, but where the inconsistency is localized, so that we could actually miss it by not knowing or noticing the localized inconsistency? And is there a proof of what you said as well, that an inconsistent system with the inconsistency in one spot can have any statement about the system be both true and false? I'm adjusting the parameters here so we can ask the deeper question of whether such a system exists in the real world.
Is there a standard library/repository of all existing mathematical proofs one can add to?
There are a few.
The Lean one is at https://github.com/leanprover-community/mathlib4
Paging Terrence Howard
I possess a proof of FLT, and will publish the metamath formalization, in due time (first I need to build a secure display, for reasons that will become clear).
Fermat's Last Theorem is easily solved with CSS. Just do:
and you're sorted.
People jest, but they will jest less once they think a little more critically: how can cryptography experts expound the security of such-and-such cryptographic primitives involving discrete numbers, exponentiation, ... but be unable to come up with alternative proofs of FLT? ...but be unable to individually formalize the existing supposed proof of FLT? ...but be unable to find a more succinct proof of FLT? Who knows their way around the block, if the consensus is that Fermat did not possess a proof but Wiles finally found one? Who knows their way around the block, if there were a much more succinct proof than Wiles's long-winded and to-this-date-not-formally-verified one?
Would you jest less?