Comment by nowittyusername

6 months ago

I wonder suppose you are not relying on any tricks or funky shenanigans. is it possible to just throw random stuff at Lean and find interesting observations based if it approves? like use an automated system or an llm that tries all types of wild proofs/theories? and sees if it works? maybe im asking wrong questions though or not stating it well.. this is above my understanding, i could barely get my head around prolog.

8 comments

nowittyusername

Paracompact 6 months ago

As someone who does certified programming professionally, I believe generative AI and formal methods are a match made in heaven. I might even go so far as to wager that the notion that human programmers will or will not be replaced by LLM-based AI is entirely dependent on whether these AI can get good at certified programming + compositional reasoning.

> is it possible to just throw random stuff at Lean and find interesting observations based if it approves?

Traditional AI has an easy time with checkers, because the search space is small. Chess is a little harder. Go still cannot be reasonably tackled by non-machine learning AI. Meanwhile, the search space (number of available moves together with the diversity of explorable states) for a sophisticated formal language becomes unfathomably large.

When the problem is known to be of a certain nature, often you can time-efficiently brute force it via SMT solvers. Traditionally SMT solvers and proof assistants have been separate branches of formal methods, but they're finally learning to play off each other's strengths (cf. Sledgehammer, Lean-SMT).

> like use an automated system or an llm that tries all types of wild proofs/theories? and sees if it works?

Research in this vein needs to be made more mainstream. I'm excited, though, that there have been big funders behind these ideas for years now, even before LLMs became big. Cf. "Learning to Find Proofs and Theorems by Learning to Refine Search Strategies" for earlier work, or DeepSeek-Prover for more recent attempts. I'm no machine learning expert, but it seems it's still a very open question how to best train these things and what their future potential is.

All in all mainstream LLMs are still rather mediocre at languages like Rocq and Lean. And when they're wrong, their proof scripts are extremely tedious to try to troubleshoot and correct. But I have hope AI tooling in formal methods will mature greatly over time.

benreesman 6 months ago
I'm already doing almost all my LLM-assisted programming in Haskell, all the devops in check-heavy Nix, and starting to move everything into Dhall.
There's no way that Python written by LLMs will survive contact with Haskell written by LLMs in a standup competition.
One imagines the scope for this sort of thing goes very high in formality.
- yncytt 6 months ago
  
  > There's no way that Python written by LLMs will survive contact with Haskell written by LLMs in a standup competition.
  That's likely true for formal correctness.
  But what about the disparity between python's large and mature ecosystem compared the skimpiness of Haskell's libraries? How is an LLM going to compensate for Haskell's lack of math, AI/ML, graphics, web, and other key infrastructures?
  An LLM writing Haskell must produce new code where production-grade libraries don't exist, and that Haskell app will be bigger than python's. That Haskell app will be relatively unverified compared to an LLM-written python app, simply because all that missing library code is brand new and hasn't proven itself the real world.
  What about the disparity in LLM training data across many application domains? Given little Haskell training data, an LLM will be more likely to hallucinate and will require more human iterations before producing a production-ready app.
  > One imagines the scope for this sort of thing goes very high in formality.
  Haskell's type system is great, but only as far as it goes.
  
  1 reply →
- Paracompact 6 months ago
  
  Yeah, it makes me sad that the skill of LLMs at a language is directly proportional to the popularity of a language, which is itself inversely proportional to the formal guardrails placed on the language by a type system.
- fragmede 6 months ago
  
  Serious question: Why not have it generate assembly?
  
  1 reply →

danabramov 6 months ago

This is an active area of research and experimentation!

Much of Lean community are on Zulip (which is kind of like a forum?) and you can see many relevant threads here: https://leanprover.zulipchat.com/#narrow/channel/219941-Mach...