Comment by danparsonson

11 hours ago

We're not speaking about bugs in a verified system so much as writing articles making specific claims about that. Surely if we're at the level of precision of formal verification, it's incumbent upon us to be precise about the nature of a problem with it, no? "Lean proved this program correct and then I found a bug" heavily implies a flaw in the proof, not a flaw in the runtime (which to my mind would also be a compelling statement, for the reasons you describe).

You're technically right, but what things are versus how they're promoted or understood by most people (rightfully or not) often diverges, and therefore such "grounding" articles are useful, even if the wording addresses the perceived rather than the factual reality.

By way of analogy, if there was an article saying "I bought a 1Tb drive and it only came with 0.91 terabits", I think if you started explaining that technically the problem is the confusion between SI vs binary units and the title should really be "I bought a 0.91 terabit drive and was disappointed it didn't have more capacity", technically you'd be right, but people would rightfully eyeroll at you.

  • To be clear, I think the article was fine and the author did some useful work (finding a bug in the runtime of a supposedly provably correct system is indeed a valuable contribution!). I don't agree that it's pedantic to explain why the title feels like a bait-and-switch (and thus does a disservice to the article itself). It's just a bit of feedback for future reference.

    I take some comfort from being technically correct though; it's widely accepted by all of us pedants that that is the best kind of correct ;-)

Sorry, I'm not sure I follow. We are talking about bugs in a verified system, that is, in this case, a verified implementation of a zlib-based compression tool. Did it have bugs? yes. Several in fact. I'd recommend reading the article for a detailed listing of the bugs in the tool.

  • You give the answers to this in the artcle:

    > The most substantial finding was a heap buffer overflow! but, not in lean-zip's code, but in the Lean runtime itself. (emphasis mine)

    > The OOM denial-of-service is straightforward: the archive parser was never verified. (me again)

    "Lean proved this program correct" vs "I found bugs in the parts that Lean didn't prove was correct". The failure was not in Lean's proof (which as I said is heavily implied by the title), but in ancillary or unverified code.

    Do I misunderstand how Lean works? I am by no means an expert (or even an amateur) on Lean or formal systems in general. Surely the first class of bug could be found in any code that uses the framework, and the second class of bug could happen to any system that isn't proven? Does that make sense? Otherwise where's the boundary? If you find a bug in the OS, does that mean Lean failed somehow? How about the hardware? If your definition of a 'formally verified system' goes beyond the code being verified and the most important thing is whether or not you can make it crash, then the OS and the hardware are also part of the system.

    Of course bugs are important to users regardless of the cause, but the audience for your article seems to be software engineers and as a software engineer, your article was interesting and you found something useful, but the title was misleading; that's all I'm saying.

  • Further to my earlier reply - a more succinct way to look at it might be:

    - When they fix the run time, bug A goes away. So the proof still holds and the zlib code is still correct.

    - When they add a system of proofs for the parser and modify that, then bug B goes away. So the proof still holds and the zlib code is still correct; and now more of the library is proven correct.

    The formulation of the title is "I was told X but that's not true"... but that's not true. You were told X, and X is true, but you found Y and Z which are also important.

    • What is the program?

      There are two different answers to this question, and which one is "correct" depends entirely on the context of who is asking it.

      1. It's the code that is specific to this program that sits above the run-time layer (internal view, that most programmers would take).

      2. It's the code in the binary that is executed (external view, that most users would take).

      The key question does not seem to be "was the proof correct", rather "did the proof cover everything in the program". The answer depends on whether you are looking at it from the perspective of a programmer, or a user. Given the overly strong framing that the article is responding to - highlighting the difference in this way does seem to be useful. The title is correct from the perspective that most users would take.

      3 replies →

  • Say you study some piece of software. And it happens that it has an automated suite of tests. And it happens that some files aren't covered by the test suite. And you happen to find a bug in one of those files that were not covered.

    Would you publish a blog post titled "the XXX test suite proved there was no bug. And then I found one"?

    It would be a bit silly, right?

    • A test suite never proves anything except that the code works for the test cases in it.

      But yes, it would be equivalent to stating "No tests failed but I found a bug" and one would correctly deduce that test coverage is insufficient.

      1 reply →