
Comment by latenightcoding

3 days ago

I never hijack non-AI threads to talk about AI, but can anybody share their experience using LLMs to code in Coq, Lean, etc.?

I’ve never used them first-hand, but crackpots sure do love claiming to solve the Riemann hypothesis, P vs NP, the Collatz conjecture, etc., and then peddling out huge piles of slop. My experience has come solely from curiously following what the LLMs have been generating.

You have to be very, VERY careful. Given how predisposed they are to being helpful, they’ll turn to “dishonesty” rather than just shut down and refuse. What I tend to see is that they get backed into a corner and then prove something different under the guise of the original statement:

They’ll create long pattern-matching chains, building labyrinths of state machines.

They’ll name functions, values, and comments to seem plausible, but you have to follow each one to make sure it is what it says it is. A sneaky little trick is to drop important parameters in functions: they appear in the signature and at call sites, but the body never uses them.

They’ll do something like taking a Complex value but only working with its real projection, rounding a number, creatively making negatives disappear with abs, etc.
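
For concreteness, here’s a hypothetical Lean sketch of those last two tricks; boundedComplex and hz are invented names for illustration, not from any real submission:

    import Mathlib

    -- The hypothesis hz appears in the signature (and at every call site)
    -- but the body never consults it, and the name promises a claim about
    -- a complex value while the body silently projects to the real part
    -- and hides signs behind abs.
    def boundedComplex (z : ℂ) (hz : z ≠ 0) (limit : ℝ) : Prop :=
      |z.re| ≤ limit  -- z.im and hz are quietly ignored

Lean’s unused-variable linter can flag hz here, but warnings like that are easy to scroll past in a wall of generated code.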

So even when it compiles, you’ve got the burden of verifying everything is above board, which is a pretty huge task.

And when it doesn’t work, introducing an error or two in formal proof systems often means you’re getting exponentially further away from solving your problem.

I’ve not seen a convincing use that the proof assistant’s own tactics and goals don’t already provide.
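
For a sense of what I mean by what the assistant itself already provides, a minimal sketch (assuming Mathlib):

    import Mathlib

    -- exact? asks Lean to search the library for a lemma closing the goal;
    -- here it finds Nat.add_comm on its own, no LLM involved.
    example (a b : ℕ) : a + b = b + a := by
      exact?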

  • > So even when it compiles, you’ve got the burden of verifying everything is above board, which is a pretty huge task.

    Is this true?

    e.g., the statement of the Riemann hypothesis is in mathlib:

      def RiemannHypothesis : Prop :=
        ∀ (s : ℂ) (_ : riemannZeta s = 0) (_ : ¬∃ n : ℕ, s = -2 * (n + 1)) (_ : s ≠ 1), s.re = 1 / 2
    

    If I construct a term of this type without going via one of the (fairly obvious) soundness holes or a compiler bug, it's very likely proved, no? No matter how inscrutable the proof is from a mathematical perspective. (Translating it into something mathematicians understand is a separate question, but that's not really what I'm asking.)
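
    One mechanical check along those lines, as a sketch: Lean’s #print axioms command lists every axiom a term ultimately depends on (rh_proof below is a hypothetical stand-in for whatever term you were handed):

      import Mathlib

      -- Suppose someone hands you a term of the type above:
      theorem rh_proof : RiemannHypothesis := sorry  -- stand-in for their proof

      -- A clean result lists only propext, Classical.choice, and Quot.sound;
      -- seeing sorryAx or an unexpected custom axiom is a red flag.
      #print axioms rh_proof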

  • > introducing an error or two in formal proof systems often means you’re getting exponentially further away from solving your problem

    I wish people understood that this is pretty much true of software building as well.