
Comment by QuadrupleA

8 days ago

I don't think formal verification really addresses most day-to-day programming problems:

    * A user interface is confusing, or the English around it is unclear
    * An API you rely on changes, is deprecated, etc.
    * Users use something in unexpected ways
    * Updates forced by vendors or open source projects cause things to break
    * The customer isn't clear what they want
    * Complex behavior between interconnected systems, out of the purview of the formal language (OS + database + network + developer + VM + browser + user + web server)

For some mathematically pure task, sure, it's great. Or a low-level library like a regular expression parser or a compression codec. But I don't think that represents a lot of what most of us are tasked with, and those low-level "mathematically pure" libraries are generally pretty well handled by now.

In fact, automated regression tests done by AI with visual capabilities may have a bigger impact than formal verification. You can have an army of testers now, painfully going through every corner of your software.

  • In practice it ends up being a bit like static analysis, though, in that you get a ton of false positives.

    All said, I’m now running all commits through Codex (which is the only thing it’s any good at), and it’s really pretty good at code reviews.

  • This will only somewhat work, and only when customers expect features to behave in a standard way. When customers spec things to work in non-standard ways, you'll just end up with a bunch of false positives.

    • This. When the bugs come streaming in you better have some other AI ready to triage them and more AI to work them, because no human will be able to keep up with it all.

      Bug reporting is already about signal vs noise. Imagine how it will be when we hand the megaphone to bots.

TBH most day-to-day programming problems are barely worth having tests for. But if we had formal specs, and even just hand-wavy correspondences between the specs and the implementation, for the low-level things everybody depends on, that would be a huge improvement for the reliability of the whole ecosystem.

A limited form of formal verification is already mainstream. It is called type systems. The industry in general has been slowly moving to encode more invariants into the type system, because every invariant that is in the type system is something you can stop thinking about until the type checker yells at you.

A lot of libraries document invariants that are either not checked at all, checked only at runtime, or somewhere in between. For instance, the requirement that a collection not be modified during iteration. Or that two regions of memory do not overlap, or that a variable is not modified without owning a lock. These are all things that, in principle, can be formally verified.
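
To make one of those concrete, here is a minimal Rust sketch (Rust is just one example of such a type system; Mutex and lock() are standard library items, and counter is an invented name). The "don't touch this without owning the lock" rule is encoded in the types, because the protected data is only reachable through the guard that lock() returns.

    use std::sync::Mutex;

    fn main() {
        // The data lives inside the Mutex, so the only way to reach it is
        // through lock(); "must hold the lock" is a fact the compiler checks,
        // not a convention someone has to remember.
        let counter = Mutex::new(0_u32);

        {
            let mut guard = counter.lock().unwrap(); // acquire the lock
            *guard += 1;                             // access only via the guard
        } // guard dropped here, lock released automatically

        println!("{}", *counter.lock().unwrap());
    }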

No one claims that good type systems prevent buggy software. But, they do seem to improve programmer productivity.

For LLMs, there is an added benefit. If you can formally specify what you want, you can make that specification your entire program. Then have an LLM driven compiler produce a provably correct implementation. This is a novel programming paradigm that has never before been possible; although every "declarative" language is an attempt to approximate it.

  • > No one claims that good type systems prevent buggy software.

    That's exactly what languages with advanced type systems claim. To be more precise, they claim to eliminate entire classes of bugs. So they reduce bugs; they don't eliminate them completely.

    • No nulls, no nullability bombs.

      Forcing devs to fix or avoid bugs up front, before the compiler will allow the app to build, means the programs are more correct as a group.

      Wrong, incomplete, insufficient, unhelpful, unimpressive, and dumb are all still very possible. But the result is still more correct than it would likely be in looser systems.
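
      A minimal sketch of what "no nulls" buys you, in Rust (Option and match are standard library features; find_user is a made-up example):

          // There is no null: a value that might be absent says so in its type.
          fn find_user(id: u32) -> Option<String> {
              if id == 42 { Some("alice".to_string()) } else { None }
          }

          fn main() {
              // The compiler forces the caller to handle the None case before
              // using the value, so "forgot to check for null" does not compile.
              match find_user(7) {
                  Some(name) => println!("found {name}"),
                  None => println!("no such user"),
              }
          }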

      2 replies →

    • If you eliminate the odd integers from consideration, you've eliminated an entire class of integers. Yet the set of remaining integers is the same size as the original.

      6 replies →

  • > For LLMs, there is an added benefit. If you can formally specify what you want, you can make that specification your entire program. Then have an LLM driven compiler produce a provably correct implementation. This is a novel programming paradigm that has never before been possible; although every "declarative" language is an attempt to approximate it.

    The problem is there is always some chance a coding agent will get stuck and be unable to produce a conforming implementation in a reasonable amount of time. And then you are back in a similar place to where you were with those pre-LLM solutions - needing a human expert to work out how to make further progress.

    • With the added issue that now the expert is working with code they didn't write, which could in general be harder to understand than human-written code. So they could find it easier to just throw it away and start from scratch.

  • Some type systems (e.g., Haskell's) are closing in on becoming formal verification languages themselves.

    • Piggybacking off your comment, I just completed a detailed research paper where I compared Haskell to C# with an automated trading strategy. I have many years of OOP and automated trading experience, but struggled a bit at first implementing in Haskell syntax. I attempted to stay away from LLMs, but ended up using them here and there to get the syntax right.

      Haskell is actually a pretty fun language, although it doesn't fly off my fingers like C# or C++ does. I think a really great example of the differences is displayed in the recursive Fibonacci sequence.

      In C#:

          public int Fib(int n)
          {
              if (n <= 1)
                  return n;
              else
                  return Fib(n - 1) + Fib(n - 2);
          }
      

      In Haskell:

          fib :: Integer -> Integer
          fib n
            | n <= 1    = n
            | otherwise = fib (n - 1) + fib (n - 2)
      

      As you might know, this isn't even scratching the surface of the Haskell language, but it does a good job highlighting the syntax differences.

      1 reply →

  • > No one claims that good type systems prevent buggy software. But, they do seem to improve programmer productivity.

    To me it seems they reduce productivity. In fact, for Rust, which seems to match the examples you gave about locks or regions of memory, the common wisdom is that it takes longer to start a project, but one reaps the benefits later thanks to more confidence when refactoring or adding code.

    However, even that weaker claim hasn’t been proven.

    In my experience, the more information is encoded in the type system, the more effort is required to change code. My initial enthusiasm for the idea of Ada and Spark evaporated when I saw how much ceremony the code required.

    • > In my experience, the more information is encoded in the type system, the more effort is required to change code.

      I would tend to disagree. All that information encoded in the type system makes explicit what is needed in any case, and is otherwise only carried informally in people's heads by convention, or maybe in some poorly updated doc or code comment where nobody finds it. Making it explicit and compiler-enforced is a good thing. It might feel like a burden at first, but otherwise you're just closing your eyes and ignoring what can turn out to be important. Changed assumptions are immediately visible. Formal verification just pushes the boundary of that.

      6 replies →

    • Capturing invariants in the type system is a two-edged sword.

      At one end of the spectrum, the weakest type systems limit the ability of an IDE to do basic maintenance tasks (e.g. refactoring).

      At the other end of the spectrum, dependent types and especially sigma types capture arbitrary properties that can be expressed in the logic. But then constructing values in such types requires providing proofs of these properties, and the code and proofs are inextricably mixed in an unmaintainable mess. This does not scale well: you cannot easily add a new proof on top of existing self-sufficient code without temporarily breaking it.
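
      A tiny illustration of that proof obligation, sketched in Lean (EvenNat and four are invented names for the example):

          -- A subtype (a sigma type): a natural number paired with a proof that it is even.
          abbrev EvenNat := { n : Nat // n % 2 = 0 }

          -- Constructing a value means supplying the data *and* the proof;
          -- omit the proof, or pass an odd number, and it does not type-check.
          def four : EvenNat := ⟨4, by decide⟩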

      Like other engineering domains, proof engineering has tradeoffs that require expertise to navigate.

    • > but one reaps the benefits later thanks to more confidence when refactoring or adding code.

      To be honest, I believe it makes refactoring/maintenance take longer. Sure, safer, but this is not a one-time only price.

      E.g. you decide to optimize this part of the code and return only a reference, or change a lifetime - this is an API-breaking change and you potentially have to fix callers recursively. Meanwhile GC languages can mostly get away with a local-only change.
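
      A rough sketch of that ripple effect in Rust (Item and label are invented names): switching a return value from an owned String to a borrowed &str ties the result to the argument's lifetime, so callers that held onto the old value may have to change too.

          struct Item { name: String }

          // Before (hypothetical): fn label(item: &Item) -> String { item.name.clone() }
          // After the "optimization", the signature borrows from its argument:
          fn label(item: &Item) -> &str {
              &item.name
          }

          fn main() {
              let item = Item { name: "widget".to_string() };
              let l = label(&item); // `l` is only valid while `item` is alive,
              println!("{l}");      // so any caller that stored or returned the
          }                         // old String now has to be reworked as well.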

      Don't get me wrong, in many cases this is more than worthwhile, but I would probably not choose rust for the n+1th backend crud app for this and similar reasons.

      2 replies →

    • In my opinion, programming languages with a loose type system or no explicit type system only appear to foster productivity, because it is way easier to end up with undetected mistakes that can bite later, sometimes much later. Maybe some people argue that then it is someone else's problem, but even in that case we can agree that the overall quality suffers.

    • "In my experience, the more information is encoded in the type system, the more effort is required to change code."

      Have you seen large JS codebases? Good luck changing anything in them, unless they are really, really well written, which is very rare. (My own JS code is often a mess.)

      When you can change types on the fly, hidden somewhere in the code ... then this leads to the opposite of clarity for me. And so lots of effort is required to change something in a proper way that does not lead to more mess.

      3 replies →

    • Soon a lot of people will go out of their way to try to convince you that Rust is the most productive language, that functions having longer signatures than their bodies is actually a virtue, and that putting .clone(), Rc<> or Arc<> everywhere to avoid borrow-checker complaints makes Rust easier and faster to write than languages that don't force you to do so.

      Of course this is hyperbole, but sadly not by much.

  • > For LLMs, there is an added benefit. If you can formally specify what you want, you can make that specification your entire program. Then have an LLM driven compiler produce a provably correct implementation. This is a novel programming paradigm that has never before been possible; although every "declarative" language is an attempt to approximate it.

    That is not novel and every declarative language precisely embodies it.

    • I think most existing declarative languages still require the programmer to specify too many details to get something usable. For instance, Prolog often requires the use of 'cut' to get reasonable performance for some problems.

  • > No one claims that good type systems prevent buggy software. But, they do seem to improve programmer productivity.

    They really don’t. How did you arrive at such a conclusion?

    • Not that I can answer for OP, but as a personal anecdote: I've never been more productive than when writing Rust; it's a goddamn delight. Every codebase feels like it could have been my own, and you can get up to speed from 0 to 100 in no time.

      1 reply →

> Complex behavior between interconnected systems, out of the purview of the formal language (OS + database + network + developer + VM + browser + user + web server)

Isn't this what TLA+ was meant to deal with?

  • Not really; some components have a lot of properties that are very difficult to model. Take network latency, or storage performance in the OS.

Actually, formal verification could help massively with four of those problems — all but the first (UI/UX) and fifth (requirements will always be hard).

A change in the API of a dependency should be detected immediately and handled silently.

Reliance on unspecified behavior shouldn't happen in the first place; the client's verification would fail.

Detecting breakage caused by library changes should be where verification really shines; when you get the update, you try to re-run your verification, and if that fails, it tells you what the problem is.

As for interconnected systems, again, that's pretty much the whole point. Obviously, achieving this dream will require formalizing pretty much everything, which is well beyond our capabilities now. But eventually, with advances in AI, I think it will be possible. It will take something fundamentally better than today's LLMs, though.

That has always been the problem with unit and integration tests, especially for systems that tend to be distributed.

AI makes creating mock objects much easier in some cases, but it still creates a lot of busy work and makes configuration more difficult. And at this point it is often difficult configuration management that causes the issues in the first place. Putting everything in some container doesn't help either; on the contrary.

> But I don't think that represents a lot of what most of us are tasked with

Give me a list of all the libraries you work with that don't have some sort of "okay, but not that bit" rule in the business logic, or "all of those functions are f(src, dst) but the one you use most is f(dst, src) and we can't change it now".

I bet it's a very short list.

Really we need to scrap every piece of software ever written and start again from scratch with all these weirdities written down so we don't do it again, but we never will.

  • Scrapping everything wouldn't help. 15 years ago the project I'm on did that - for a billion dollars. We fixed the old mistakes but made plenty of new ones along the way. We are trying to fix those now, and I can't help but wonder what new mistakes we are making now that we will regret in 15 years.

    • Computers are terrible and software is terrible and we should just go back to tilling the fields with horses and drinking beer.

Yeah, in like the past few weeks there were about 5 or 10 videos about this "complexity" and the unpredictability of 3rd parties and all the moving parts that AI doesn't control and even forgets - small context window. I am sure you have seen at least one of them ;)

But it's true. AI is still super narrow and dumb. It doesn't even understand basic prompts.

Look at computer games now - they still don't look real despite almost 30 years since Half-Life 1 started the revolution, I would claim. Damn, I think I ran it on a 166 MHz computer, on the lowest details even.

Yes, it keeps getting better and better but still looks super uncanny - at least to me. And it's been basically 30 years of constant improvements. Heck, Roomba is going bankrupt.

I am not saying things don't improve, but the hype and the AI bubble are insane, and the reality doesn't match the expectations and predictions at all.

> An API you rely on changes, is deprecated, etc

Formal verification will eventually lead to good, stable API design.

> Users use something in unexpected ways

> Complex behavior between interconnected systems

It happens when there's no formal verification during the design stage.

Formal verification literally means covering 100% of state changes; for every possible input/output, every execution branch should be checked.

  • Formal verification has nothing to do with the quality of the API.

    Given the spec, formal verification can tell you if your implementation follows the spec. It cannot tell you if the spec is good.

    • That is one of many problems solved; isn't that good?

      Whether the spec solves the problem is called validation in my domain and is treated explicitly, with different methods.

      We use formal validation to check for invariants, but also "it must return a value xor an error, but never just hang".

  • > Formal verification will eventually lead to good, stable API design.

    Why? Has it ever happened like this? Because to me it would seem that if the system is verified to work, then it works no matter how the API is shaped, so there is no incentive to change it to something better.

    • > if the system is verified to work, then it works no matter how the API is shaped

      That's the case for one-off integrations, but the messy part always comes when the system's goal changes.

      Let's say formal verification could help to avoid some anti-patterns.

      5 replies →

  • Covering 100% of state changes in business software is unknowable on a long horizon, and relies on thoroughly understanding business logic that is often fuzzy, not discrete and certain.

    • Formal verification does not guarantee that business logic works as everybody expected, nor that it is future-proof. However, it does provide a workable path towards:

      Things can only happen if you allow them to happen.

      In other words, your software may come to a stage where it's no longer applicable, but it never crashes.

      Formal verification has had little adoption only because it costs 23x your original code, plus "PhD-level training".

      1 reply →