Comment by 20k

1 day ago

What we really we need is some kind of more detailed spec language that doesn't have edge cases, where we describe exactly what we expect the generated code to do, and then formally verify that the now generated code matches the input spec requirement. It'd be super helpful to have something more formal with no ambiguity, especially because the english language tends to be pretty ambiguous in general which can result in spec problems

I also tend to find especially that there's a lot of cruft in human written spec languages - which makes them overly verbose once you really get into the details of how all of this works, so you could chop a lot of that out with a good spec language

I nominate that we call this completely novel, evolving discipline: 'programming'

There are languages like Dafny that permit you to declare pre- and post-conditions for functions. Dafny in particular tries to automatically verify or disprove these claims with an SMT solver. It would be neat if LLMs could read a human-written contract and iterate on the implementation until it's provably correct. I imagine you'd have much higher confidence in the results using this technique, but I doubt that available models are trained appropriately for this use case.

> What we really we need is some kind of more detailed spec language that doesn't have edge cases, where we describe exactly what we expect the generated code to do, and then formally verify that the now generated code matches the input spec requirement.

That's theorem provers and they're awful for anything of any reasonable complexity.

  • That's any programming language, really [1]. Any website contains millions of "proofs", not all of them are useful. Choosing what needs to be proven is hard. And the spectrum of languages/type systems and their usability as either is more explored nowadays than it used to be. If you don't likue coq, you can look for agda. If agda is too far for you, you can look for Haskell. If that's still impractical, there's rust or f#, etc... The tradeoff you have between "convenient for expressing proofs" and "convenient for programming" has many options.

    [1]: https://www.youtube.com/watch?v=IOiZatlZtGU

Hate it all you want, but XML is genuinely a good fit there, and Claude is apparently insanely good at working with XML prompts.

  • I don’t know why, but I get this feeling whenever someone uses “insanely” or “shockingly” along with AI, I think they’re bot or are writing based on a guideline! No offense, btw, I’m not saying you’re a bot.

    • I'm prepared to excise the word "genuinely" from my vocabulary after working with Claude.

      One of my biggest fears with using AI at work is that I will subconsciously start talking and writing like a bot, despite making conscious efforts to do the opposite. Just like how when you read a lot of books by one author, their style infects your own writing style.

  • Is Claude good at working with XML prompts, or is XML good at convincing users to write more Claude-able specs? I am intensely skeptical that you could write an XML document describing a nontrivial web application in full detail, but I could easily imagine someone who thinks they have to stripping out important details because they don't really map to XML.

yes, am familiar with the "code is spec" trope.

Shame us all for moving away from something so perfect, precise, and that "doesn't have edge cases."

Hey - if you invent a programming language that can be used in such a way and create guaranteed deterministic behavior based on expressed desires as simple as natural language - ill pay a $200/m subscription for it.

  • As people are discovering, natural language is insufficiently precise to be able to specify edge cases. Any language precise enough to be formally verified against is a programming language

    • One agent generates : Spec -> Code then

      Another agent: Code -> Inverted Spec

      then compare Spec and Inverted Spec.

      If there is a Gap, a Human fixes and clarifies the Gap.

      This is like Generator and Discriminator aspects of GAN models or Autoencoder models.

    • we're going to end up speaking past each other - but generally I do agree with you and am not denouncing the importance of formal verification methods. I do think abstractions are going to dominate the human ux above them

> where we describe exactly what we expect the generated code to do, and then formally verify that the now generated code matches the input spec requirement.

In ancient times we had tech to do exactly that: Programming languages and tests.