Comment by gizmo686

8 days ago

A limited form of formal verification is already mainstream. It is called type systems. The industry in general has been slowly moving to encode more invariants into the type system, because every invariant that is in the type system is something you can stop thinking about until the type checker yells at you.

A lot of libraries document invariants that are not checked at all, checked only at runtime, or somewhere in between. For instance, the requirement that a collection not be modified during iteration. Or that two regions of memory do not overlap, or that a variable is not modified without owning a lock. These are all things that, in principle, can be formally verified.
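
For concreteness, here is a minimal Rust sketch (names made up for illustration) of two of those invariants being carried by the type system: lock ownership and no modification during iteration.

    use std::sync::Mutex;

    // The lock invariant lives in the type: the counter can only be
    // reached through the guard returned by lock(), so "modified
    // without owning the lock" is simply not expressible.
    fn bump(counter: &Mutex<i64>) {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
    } // guard dropped here, lock released

    fn main() {
        let counter = Mutex::new(0);
        bump(&counter);

        // "Don't modify a collection while iterating over it" is also
        // checked statically: uncommenting the push below is a compile
        // error, because the iterator holds a shared borrow of v.
        let v = vec![1, 2, 3];
        for x in &v {
            // v.push(*x); // error[E0502]: cannot borrow `v` as mutable
            let _ = x;
        }
        println!("{}", *counter.lock().unwrap());
    }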

No one claims that good type systems prevent buggy software. But, they do seem to improve programmer productivity.

For LLMs, there is an added benefit. If you can formally specify what you want, you can make that specification your entire program. Then have an LLM driven compiler produce a provably correct implementation. This is a novel programming paradigm that has never before been possible; although every "declarative" language is an attempt to approximate it.

> No one claims that good type systems prevent buggy software.

That's exactly what languages with advanced type systems claim. To be more precise, they claim to eliminate entire classes of bugs. So they reduce bugs; they don't eliminate them completely.

  • No nulls, no nullability bombs.

    Forcing devs to pre-fix or avoid bugs before the compiler will accept the program means the programs are more correct as a group.

    Wrong, incomplete, insufficient, unhelpful, unimpressive, and dumb are all still very possible. But more correct than likely in looser systems.

    • > No nulls, no nullability bombs.

      I hate this meme. Null indicates something. If you disallow null, that same state gets encoded in some other way. And if you don't properly check for that state, you get the exact same class of bug. The desirable type system feature here is the ability to statically verify that such a check has occurred every time a variable is accessed (sketched below).

      Another example is bounds checking. Languages that stash the array length somewhere and verify against it on access eliminate yet another class of bug without introducing any programmer overhead (although there generally is some runtime overhead).
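
      A minimal Rust sketch of both points (function names are made up): the "absent" state is still representable, but the compiler statically insists on the check before the value can be used, and indexing can hand back the same kind of checked result.

          // Option<T> still encodes the "might be absent" state that null
          // did, but there is no way to reach the String without going
          // through a check; forgetting the None arm is a compile error.
          fn greet(name: Option<String>) -> String {
              match name {
                  Some(n) => format!("Hello, {n}"),
                  None => String::from("Hello, stranger"),
              }
          }

          // Checked indexing works the same way: get() returns an Option
          // instead of trusting the caller's bounds.
          fn third(v: &[i32]) -> Option<i32> {
              v.get(2).copied()
          }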

      1 reply →

> For LLMs, there is an added benefit. If you can formally specify what you want, you can make that specification your entire program. Then have an LLM driven compiler produce a provably correct implementation. This is a novel programming paradigm that has never before been possible; although every "declarative" language is an attempt to approximate it.

The problem is there is always some chance a coding agent will get stuck and be unable to produce a conforming implementation in a reasonable amount of time. And then you are back in a similar place to what you were with those pre-LLM solutions - needing a human expert to work out how to make further progress.

  • With the added issue that now the expert is working with code they didn't write, which could in general be harder to understand than human-written code. So they could find it easier to just throw it away and start from scratch.

Some type systems (e.g., Haskell) are closing in on becoming formal verification languages themselves.

  • Piggybacking off your comment, I just completed a detailed research paper where I compared Haskell to C# with an automated trading strategy. I have many years of OOP and automated trading experience, but struggled a bit at first implementing in Haskell syntax. I attempted to stay away from LLMs, but ended up using them here and there to get the syntax right.

    Haskell is actually a pretty fun language, although it doesn't fly off my fingers like C# or C++ does. I think a really great example of the differences is the recursive Fibonacci function.

    In C#:

        public int Fib(int n)
        {
            if (n <= 1)
                return n;
            else
                return Fib(n - 1) + Fib(n - 2);
        }
    

    In Haskell:

        fib :: Integer -> Integer
        fib n
          | n <= 1    = n
          | otherwise = fib (n - 1) + fib (n - 2)
    

    As you might know, this isn't even scratching the surface of the Haskell language, but it does a good job highlighting the syntax differences.

    • When using a switch expression in C#, the two are a lot more similar:

          public int Fib(int n) => n switch
          {
              <= 1 => n,
              _    => Fib(n - 1) + Fib(n - 2)
          };

  • And one can see how quickly they became mainstream...

    • Given that it's the AI doing the coding, it would happen pretty quickly, so long as it's decent at Haskell. Which it already is, surprisingly so for such a niche language. It doesn't necessarily write great code, but it's good enough, and the straitjacket type system makes it very hard for the model to sneak in creative hacks like using globals, or to trip itself up with mutable state.

> No one claims that good type systems prevent buggy software. But, they do seem to improve programmer productivity.

To me it seems they reduce productivity. In fact, for Rust, which seems to match the examples you gave about locks or regions of memory, the common wisdom is that it takes longer to start a project, but that one reaps the benefits later thanks to more confidence when refactoring or adding code.

However, even that weaker claim hasn’t been proven.

In my experience, the more information is encoded in the type system, the more effort is required to change code. My initial enthusiasm for the idea of Ada and Spark evaporated when I saw how much ceremony the code required.

  • > In my experience, the more information is encoded in the type system, the more effort is required to change code.

    I would tend to disagree. All that information encoded in the type system makes explicit what is needed in any case, and is otherwise only carried informally in people's heads by convention, or maybe in some poorly updated doc or code comment where nobody finds it. Making it explicit and compiler-enforced is a good thing. It might feel like a burden at first, but otherwise you're just closing your eyes and ignoring something that can end up being important. Changed assumptions are immediately visible. Formal verification just pushes that boundary further.
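
    As a minimal sketch (with a made-up type), the informal convention "a quantity is never zero" becomes explicit and compiler-enforced, so the assumption no longer lives only in people's heads:

        // The only way to obtain a Quantity is through the checked
        // constructor, so every function that accepts one can rely on
        // the invariant instead of re-checking it or just hoping.
        pub struct Quantity(u32);

        impl Quantity {
            pub fn new(n: u32) -> Option<Quantity> {
                if n > 0 { Some(Quantity(n)) } else { None }
            }

            pub fn get(&self) -> u32 {
                self.0
            }
        }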

    • In practice it would be encoded in comments, automated tests and docs, with varying levels of success.

      It’s actually similar to tests in a way: they provide additional confidence in the code, but at the same time ossify it and make some changes potentially more difficult. Interestingly, they also make some changes easier, as long as not too many types/tests have to be adapted.

      3 replies →

    • > All that information encoded in the type system makes explicit what is needed in any case and is otherwise only carried informally in peoples' heads by convention

      This is, in fact, better for LLMs: they are better at carrying information and convention in their KV cache than at figuring out the actual types by jumping between files and burning tokens in context, risking losing it on compaction (or getting it wrong and having to do a compilation cycle).

      If a typed language lets a developer fearlessly build a semantically inconsistent or confusing private API, then LLMs will perform worse on it, even though correctness is more guaranteed.

    • It is definitely harder to refactor Haskell than it is Typescript. Both are "safe" but one is slightly safer, and much harder to work with.

  • Capturing invariants in the type system is a two-edged sword.

    At one end of the spectrum, the weakest type systems limit the ability of an IDE to do basic maintenance tasks (e.g. refactoring).

    At the other end of the spectrum, dependent types, and especially sigma types, capture arbitrary properties that can be expressed in the logic. But then constructing values of such types requires providing proofs of these properties, and the code and proofs end up inextricably mixed in an unmaintainable mess. This does not scale well: you cannot easily add a new proof on top of existing self-sufficient code without temporarily breaking it.

    Like other engineering domains, proof engineering has tradeoffs that require expertise to navigate.

  • > but one reaps the benefits later thanks to more confidence when refactoring or adding code.

    To be honest, I believe it makes refactoring/maintenance take longer. Sure, safer, but this is not a one-time only price.

    E.g. you decide to optimize one part of the code and now return only a reference, or change a lifetime - this is an API-breaking change, and you potentially have to fix it recursively up the call chain (a sketch follows below). Meanwhile, GC languages can mostly get away with a local-only change.

    Don't get me wrong, in many cases this is more than worthwhile, but I would probably not choose rust for the n+1th backend crud app for this and similar reasons.
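
    As a hypothetical sketch of the kind of change I mean: switching an accessor from returning an owned value to returning a borrow is an API-breaking change, and every caller that kept the result around now needs a clone or a lifetime of its own.

        struct Config {
            name: String,
        }

        impl Config {
            // Before the "optimization": fn name(&self) -> String { self.name.clone() }
            // After it, callers that stored the returned String must either
            // clone it back or thread a lifetime through their own APIs.
            fn name(&self) -> &str {
                &self.name
            }
        }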

    • The choice of whether to use GC is completely orthogonal to that of a type system. On the contrary, being pointed at all the places that need to be recursively fixed during a refactoring is a huge saving in time and effort.

      1 reply →

  • In my opinion, programming languages with a loose type system or no explicit type system only appear to foster productivity, because it is way easier to end up with undetected mistakes that can bite later, sometimes much later. Maybe some people argue that then it is someone else's problem, but even in that case we can agree that the overall quality suffers.

  • "In my experience, the more information is encoded in the type system, the more effort is required to change code."

    Have you seen large js codebases? Good luck changing anything in them, unless they are really, really well written, which is very rare. (My own js code is often a mess.)

    When you can change types on the fly, somewhere hidden in the code, that leads to the opposite of clarity for me. And so it takes a lot of effort to change something in a proper way, one that does not lead to more mess.

    • There are two types of slowdown at play:

      a) It's fast to change the code, but now I have failures in some apparently unrelated part of the code base (JavaScript), and fixing those slows me down.

      b) It’s slow to change the code because I have to re-encode all the relationships and semantic content in the type system (Rust), but once that’s done it will likely function as expected.

      Depending on project, one or the other is preferable.

      2 replies →

  • Soon a lot of people will go out of their way to try to convince you that Rust is the most productive language, that functions having longer signatures than their bodies is actually a virtue, and that putting .clone(), Rc<> or Arc<> everywhere to avoid borrow-checker complaints makes Rust easier and faster to write than languages that don't force you to do so.

    Of course that is hyperbole, but sadly not by much.

> For LLMs, there is an added benefit. If you can formally specify what you want, you can make that specification your entire program. Then have an LLM driven compiler produce a provably correct implementation. This is a novel programming paradigm that has never before been possible; although every "declarative" language is an attempt to approximate it.

That is not novel, and every declarative language precisely embodies it.

  • I think most existing declarative languages still require the programmer to specify too many details to get something usable. For instance, Prolog often requires the use of 'cut' to get reasonable performance for some problems.

> No one claims that good type systems prevent buggy software. But, they do seem to improve programmer productivity.

They really don’t. How did you arrive at such a conclusion?

  • Not that I can answer for OP, but as a personal anecdote: I've never been more productive than when writing Rust; it's a goddamn delight. Every codebase feels like it could have been my own, and you can get up to speed from 0 to 100 in no time.

    • Yeah, I’ve been working mainly in rust for the last few years. The compile time checks are so effective that run time bugs are rare. Like you can refactor half the codebase and not run the app for a week, and when you do it just works. I’ve never had that experience in other languages.

  • Through empirical evidence? Do you think that the vast majority of software devs moved to typing for no reason?

    • > Do you think that the vast majority of software devs moved to typing for no reason?

      It is quite clear that this industry is mostly driven by hype and fads, not by empirical studies.

      Empirical evidence in favor of the claim that static typing and complex type systems reduce bugs or improve productivity is inconclusive at best.

    • It's a bad reason. A lot of best practices are temporary blindnesses, comparable, in some sense, to the supposed love of BASIC before or despite Dijkstra. So, yes, it's possible there is no good reason, though I don't think that's the case here.