Comment by ctmnt
12 hours ago
This article’s framing and title are odd. The author, in fact, found no bugs or errors in the proven code. She says so at the end of the article:
> The two bugs that were found both sat outside the boundary of what the proofs cover. The denial-of-service was a missing specification. The heap overflow was a deeper issue in the trusted computing base, the C++ runtime that the entire proof edifice assumes is correct.
Still an interesting and useful result to find a bug in the Lean runtime, but I’d argue that doesn’t justify the title. Or the claim that “the entire proof edifice” is somehow shaky.
It’s important to note that this is the Lean runtime that has a bug, not the Lean kernel, which is the part that actually does the verification (aka proving). [1] So it’s not even immediately clear what this bug would really apply to, since obviously no one’s running any compiled Lean code in any kind of production hot path.
[1] https://lean-lang.org/doc/reference/latest/Elaboration-and-C...
Repeating myself, when we speak of bugs in a verified software system, I think it's fair to consider the entire binary a fair target.
If a buffer overflow causes the system to be exploited and all your bitcoins to be stolen, I don't think the fact that the bug being in the language runtime is going to be much consolation. Especially if the software you were running was advertised as formally verified as free of bugs.
Second, there was a bug in the code. Maybe not a functional correctness bug, but I, along with many and most end users, would consider a crashing program buggy. Maybe we just have different tastes or different standards on what we consider an acceptable level of software quality.
W.r.t people running Lean in production, you'd be surprised...
We're not speaking about bugs in a verified system so much as writing articles making specific claims about that. Surely if we're at the level of precision of formal verification, it's incumbent upon us to be precise about the nature of a problem with it, no? "Lean proved this program correct and then I found a bug" heavily implies a flaw in the proof, not a flaw in the runtime (which to my mind would also be a compelling statement, for the reasons you describe).
You're technically right, but what things are versus how they're promoted or understood by most people (rightfully or not) often diverges, and therefore such "grounding" articles are useful, even if the wording addresses the perceived rather than the factual reality.
By way of analogy, if there was an article saying "I bought a 1Tb drive and it only came with 0.91 terabits", I think if you started explaining that technically the problem is the confusion between SI vs binary units and the title should really be "I bought a 0.91 terabit drive and was disappointed it didn't have more capacity", technically you'd be right, but people would rightfully eyeroll at you.
1 reply →
Sorry, I'm not sure I follow. We are talking about bugs in a verified system, that is, in this case, a verified implementation of a zlib-based compression tool. Did it have bugs? yes. Several in fact. I'd recommend reading the article for a detailed listing of the bugs in the tool.
9 replies →
> when we speak of bugs in a verified software system, I think it's fair to consider the entire binary a fair target.
I agree, but it’s not fair to imply that the verification was incorrect if the problem lies elsewhere.
This is a nice example of how careful you have to be to build a truly verified system.
But is fair to state that the verification was *incomplete*, which is what the article does.
The problem was in Lean though, so it seems fair.
> Repeating myself, when we speak of bugs in a verified software system, I think it's fair to consider the entire binary a fair target.
Yes, and that would be relevant if this was a verified software system. But it wasn't: the system consisted of a verified X and unverified Y, and there were issues in the unverified Y.
The article explicitly acknowledges this: "The two bugs that were found both sat outside the boundary of what the proofs cover."
the good news I guess are
1/ lean-zip is open source so it's much easier to have more Claude's eyes looking at it
2/ I don't think Claude could prove anything substantial about the zip algorithm. That's what lean is for. On the other side, lean could not prove much about what's around the zip algorithm but Claude can be useful there.
So in the end lean-zip is now stronger!
> I don't think the fact that the bug being in the language runtime is going to be much consolation. Especially if the software you were running was advertised as formally verified as free of bugs.
Reminds of what some people in the Rust community do: they fight how safe this is or not. I always challenge that the code is composed of layers, from which unsafe is going to be one. So yes, you are righ to say that. Unsafe means unsafe, safe means safe and we should respect the meaning of those words instead of twisting the meaning for marketing (though I must say I heard this from people in the community, not from the authors themselves, ever).
Totally agreed. For instance that's why sel4 just throws the whole binary at the proof engine. That takes any runtime (minimal in sel4's case) and compiler (not nearly as minimal) out of the TCB.
> if the software you were running was advertised as formally verified as free of bugs.
Nobody should be advertising that. Even ignoring the possibility of bugs in the runtime, there could also be bugs in the verifier and bugs or omissions in the specification. Formally verified never means guaranteed to be free of bugs.
Well then formally verify the language system. I’m not sure what the confusion is. They didn’t say the whole system is formally verified.
Hi Kiran, thanks for following up. FWIW, I enjoy your blog and your work. And I do think it was a valuable bug you found; also nice to see how quickly Henrik fixed it.
Say more about people running Lean in production. I haven’t run into any. I know of examples of people using Lean to help verify other code (Cedar and Aeneas being the most prominent examples), but not the actual runtime being employed.
I took a quick scan of lean-lang.org just now, and, other than the two examples I mentioned, didn’t see a single reference to anything other than proving math.
I’m sure you’re in the Lean Zulup, based on what you’ve been up to. Are you seeing people talk about anything other than math? I’m not, but maybe I’m missing it.
Yes, here's a concrete example: https://github.com/leanprover/SampCert This is an implementation of a verified sampler, in lean. Not an embedding in some other language. The implementation itself is in lean, and a python ffi is used to call into the verified implementation. I don't know if AWS is big enough for your standards, but here is at least one example. Besides that, I'm more reporting on the general vibe I have observed from numerous talks at AI4maths workshops at Neurips, at the DARPA AI4Math ExpMath kickoff, etc. People are considering Lean as a serious programming language. Maybe that's surprising to the mathematicians, but as a PL person, I find the language really nicely designed and I can understand why people want to write in it.
A missing specification in the proof of lean-zip, a lean component, is a real problem to the philosophy and practice of software verification.
To illustrate, let's say you want to verify a "Hello world" program. You'd think a verification involves checking that it outputs "Hello, world!".
However, if a contractor or AI hands you a binary, what do you need to verify? You will need to verify that it does exactly print "Hello, world!", no more, no less. It should write to stdout not stderr. It shouldn't somehow hold a lock on a system resource that it can't clean up. It cannot secretly install a root-kit. It cannot try to read your credentials and send it somewhere. So you will need to specify the proof to a sufficient level of detail to capture those potential deviations.
Broadly, with both bugs, you need to ask a question: does this bug actually invalidate my belief that the program is "good"? And here you are pulling up a fact that the bug isn't found in the Lean kernel, which makes an assumption that there's no side-effect that bleeds over the abstraction boundary between the runtime and the kernel that affects the correctness of the proof; that safety guarantee is probably true 99.99% of the time - but if the bug causes a memory corruption, you'd be much less confident in that guarantee.
If you're really serious about verifying an unknown program, you will really think hard "what is missing from my spec"? And the answer will depend on things that are fuzzier than the Lean proof.
Now, pragmatically, there many ways a proof of correctness adds a lot of value. If you have the source code of a program, and you control the compiler, you can check the source code doesn't have weird imports ("why do I need kernel networking headers in this dumb calculator program?"), so the scope of the proof will be smaller, and you can write a small specification to prove it and the proof will be pretty convincing.
All in all, this is a toy problem that tells you : you can't verify what you don't know you should verify, and what you need to verify depends on the prior distribution of what the program is that you need to verify, so that conditional on the proof you have, the probability of correctness is sufficiently close to 1. There's a lesson to learn here, even if we deem Lean is still a good thing to use.
> A missing specification in the proof of lean-zip, a lean component, is a real problem to the philosophy and practice of software verification.
Every time someone makes this point, I feel obliged to point out that all alternatives to software verification have this exact same problem, AND many, many more.
Yeah, the title made me think the author found a bug in the Lean kernel, thus making an invalid proof pass Lean's checks. The article instead uncovers bugs in the Lean runtime and lean-zip, but these bugs are less damning than e.g. the kernel, which must be trusted to be correct, or else you can't trust any proof in Lean.
When the Lean runtime has bugs, all Lean applications using the Lean runtime also have those bugs. I can’t understand people trying to make a distinction here. Is your intent to have a bug free application or to just show the Lean proof kernel is solid?? The latter is only useful to Lean developers, end users should only care about the former!
There are no Lean applications other than Lean. This is an important point most of the comments are missing. Lean is for proving math. Yes, you can use it for other things; but no, no one is.
Still good to have found, but drawing conclusions past “someone could cheat at proving the continuum hypothesis” isn’t really warranted.
The intent is to have a proof of some proposition. The Lean runtime crashing doesn't stop the lean-zip developers from formally modelling zlib and proving certain correctness statements under this model. On the other hand, the Lean kernel having a bug would mean we may discover entire classes of proofs that were just wrong; if those statements were used as corollaries/lemmas/etc. for other proofs, then we'd be in a lot of trouble.
When I see a title transitioning from "Lean said this proof is okay" to "I found a bug in Lean", I'm intuitively going to think the author just found a soundness (or consistency) issue in Lean.
I found the list of articles on this site amusing:
https://kirancodes.me/posts/log-who-watches-the-watchers.htm...
You can see the clickbaitiness increases over time.
Looks like a normal distribution about the chaos mean to me. I appreciate its not everyones cup of tea, but I like this style of writing.
> obviously no one’s running any compiled Lean code in any kind of production hot path
Ignorant question: why not? Is there an unacceptable performance penalty? And what's the recommended way in that case to make use of proven Lean code in production that keeps the same guarantees?
Yes, it isn’t performant. Lean isn’t a language for writing software, though you technically can; it’s a language for proving math.
> It’s important to note that this is the Lean runtime that has a bug, not the Lean kernel, which is the part that actually does the verification (aka proving). [1] So it’s not even immediately clear what this bug would really apply to
Well, Lean is written in Lean, so I am pretty sure a runtime bug like this could be exploited to prove `False`. Yes, since the kernel is written in C++, technically it's not the part affected by this bug, but since you cannot use the Lean kernel without the Lean compiler, this difference does not matter.
I think this must be wrong. If the kernel is free of bugs then it's not going to pass a proof of false no matter what the Lean compiler gets up to.
Sure, but are you worried about someone cheating on their arXiv submission by exploiting a buffer overflow? It’s a real bug, it’s just not very important.
To me, saying that there is a bug in the lean runtime means lean-zip has a bug is like saying a bug in JRE means a Java app that uses the runtime has a bug, even though the Java app code itself does have a bug. It seems like the author is being intentionally misleading about his findings.
No. It would be like finding a memory unsafe caused bug in a Java application that is due to a bug in the JRE. That would absolutely warrant a title like “I found memory unsafe bug in my Java code” when everyone expects Java code to be memory safe, which is analogous to the article in question.
I think it's ambiguous and fair game for the idea of answering the question "if we write programs in this manner, will there be exploitable bugs?
“‘Proven’ code turns out buggy, because of a bug outside the proven specification” is relatively common. It happened in CompCert: https://lobste.rs/s/qlrh1u/runtime_error_formally_verified
> This article’s framing and title are odd.
It's called clickbait.