Comment by zaxioms

1 day ago

It's cool, but I genuinely cannot fathom why they are targeting natural language proofs instead of a proof assistant.

The "obvious" thing to try, which presumably some people are trying pretty hard right now[1], is to (1) use a mathematically-tuned LLM like this one to propose informal Next Things To Try, (2) use an LLM (possibly the same LLM) to convert those into proof assistant formalism, (3) use the proof assistant to check whether what the LLM has suggested is valid, and (4) hook the whole thing together to make a proof-finding-and-verifying machine that never falsely claims to have proved something (because everything goes through that proof assistant) and therefore can tolerate confabulations from LLM #1 and errors from LLM #2 because all those do is waste some work.

[1] IIRC, AlphaProof is a bit like this. But I bet that either there's a whole lot of effort on this sort of thing in the major AI labs, or else there's some good reason to expect it not to work that I haven't thought of. (Maybe just the "bitter lesson", I guess.)
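A minimal sketch of that loop, in Python. Everything here is hypothetical: propose_step and formalize stand in for the two LLM calls, and lean_check shells out to a `lean` binary (a real system would talk to the Lean server instead). But the key property survives even in the sketch: nothing reaches `proved` without passing the checker.

    import subprocess
    import tempfile

    def propose_step(goal: str, proved_so_far: list[str]) -> str:
        """LLM #1: propose an informal Next Thing To Try (stub)."""
        raise NotImplementedError("call the mathematically-tuned LLM here")

    def formalize(informal_step: str) -> str:
        """LLM #2: translate the informal step into Lean source (stub)."""
        raise NotImplementedError("call the autoformalization LLM here")

    def lean_check(lean_source: str) -> bool:
        """Ask the proof assistant whether the candidate actually checks."""
        with tempfile.NamedTemporaryFile(mode="w", suffix=".lean", delete=False) as f:
            f.write(lean_source)
        result = subprocess.run(["lean", f.name], capture_output=True)
        return result.returncode == 0

    def search(goal: str, budget: int) -> list[str]:
        """Confabulations from LLM #1 and errors from LLM #2 only waste
        budget; a step is kept only if Lean accepts it, so the machine
        never falsely claims to have proved something."""
        proved: list[str] = []
        for _ in range(budget):
            candidate = formalize(propose_step(goal, proved))
            if lean_check(candidate):
                proved.append(candidate)  # trusted: Lean checked it
        return proved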

It would doubtless be challenging to get such a system to find large difficult proofs, because it's not so easy to tell what's making progress and what isn't. Maybe you need LLM #3, which again might or might not be the same as the other two LLMs, to assess what parts of the attempt so far seem like they're useful, and scrub the rest from the context or at least stash it somewhere less visible.
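One way that third component could slot in, again with hypothetical names: critic stands in for the scoring LLM call, and the 0.5 threshold is arbitrary.

    def prune_context(goal, proved, transcript, critic):
        """LLM #3 (possibly the same model as the others): score each piece
        of the attempt so far for usefulness, keep the promising parts in
        the visible context, and stash the rest somewhere less visible."""
        keep, stash = [], []
        for entry in transcript:
            score = critic(goal, proved, entry)  # hypothetical LLM call, returns a score in [0, 1]
            (keep if score >= 0.5 else stash).append(entry)
        return keep, stash  # stash is retained, just not shown to LLM #1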

It is, of course, also challenging for human mathematicians to find large difficult proofs, and one of the reasons is that it's not so easy to tell what's making progress and what isn't. Another major reason, though, is that sometimes you need a genuinely new idea, and so far LLMs aren't particularly good at coming up with those. But a lot of new-enough-ideas[2] are things like "try a version of this technique that worked well in an apparently unrelated field", which is the kind of thing LLMs aren't so bad at.

[2] Also a lot of the new-enough-ideas that mathematicians get really happy about. One of the cool things about mathematics is the way that superficially-unrelated things can turn out to share some of their structure. If LLMs get good at finding that sort of thing but never manage any deeper creativity than that, it could still be enough to produce things that human mathematicians find beautiful.

Natural language is a lot more, well, readable than, say, Lean. With Lean you get a lot less intuition about, and understanding of, what the model is attempting to do in the first place.

There's more training data on advanced math in natural language. Lean is cool, but it's mostly about formalizing stuff we already know.

  • Ok I guess I could have told you that. What I really meant is that in the future where LLMs are doing new math (which I'm skeptical of, but I digress) I would not trust any of it unless it was formally verified.

    • If you read the paper, that is the intention: to guide stuff like Lean.

      I don't think an LLM makes for a great pure RLVR (reinforcement learning with verifiable rewards) setup.

I think there's a lot of baggage in doing it in Lean: where the libraries are at currently, how things are implemented, which things are not implemented, etc. But it still remains to be seen which approach wins (my money would be on informal).