Comment by futureshock

4 days ago

Google’s AlphaProof, which got a silver last year, has been using a neural symbolic approach. This gold from OpenAI was pure LLM. We’ll have to see what Google announces, but the LLM approach is interesting because it will likely generalize to all kinds of reasoning problems, not just mathematical proofs.

OpenAI’s systems haven’t been pure language models since the o models though, right? Their RL approach may very well still generalize, but it’s not just a big pre-trained model that is one-shotting these problems.

The key difference is that they claim to have not used any verifiers.

  • What do you mean by “pure language model”? The reasoning step is still just the LLM emitting tokens, which was confirmed when DeepSeek replicated the o models. According to the OpenAI researchers, there isn’t a proof verifier or anything similar running alongside it.

    If you mean pure as in no additional training beyond pretraining, I don’t think any model has been pure since GPT-3.5.

    • For local models you can get just the pretrained versions, with no RLHF. IIRC both Llama and Gemma make base versions available.

> it will likely generalize to all kinds of reasoning problems, not just mathematical proofs

Big if true. Setting up an RL loop for training on math problems seems significantly easier than many other reasoning domains. Much easier to verify correctness of a proof than to verify correctness (what would this even mean?) for a short story.
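To make the asymmetry concrete, here is a minimal sketch of what a verifiable reward for math looks like. The function name and the answer-extraction heuristic are illustrative, not any lab’s actual RL code; the point is only that a ground-truth check can be a few lines, whereas no analogous check exists for a short story.

```python
import re

def math_reward(model_output: str, ground_truth: int) -> float:
    """Return 1.0 if the last integer in the output matches the known answer.

    Toy reward signal: extract the final integer the model wrote and
    compare it to ground truth. For open-ended text there is no
    comparable mechanical check.
    """
    matches = re.findall(r"-?\d+", model_output)
    if not matches:
        return 0.0
    return 1.0 if int(matches[-1]) == ground_truth else 0.0

print(math_reward("The answer is 42.", 42))  # 1.0
print(math_reward("I think it's 41.", 42))   # 0.0
```

Full proofs (as opposed to final answers) need a proof checker rather than string matching, but the reward is still mechanically computable, which is what makes the RL loop tractable.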

I’m much more excited about the formalized approach, as LLMs are susceptible to making things up. With formalization, we can be mathematically certain that a proof is correct. This could plausibly lead to machines surpassing humans in all areas of math. With a “pure English” approach, you still need a human to verify correctness.
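A toy illustration of what “mathematically certain” means here, in Lean 4 (example theorem chosen for illustration, not from AlphaProof’s output): the kernel either accepts the proof term or rejects the file, so no human has to review the argument itself.

```lean
-- The kernel type-checks this proof; if it were wrong, compilation fails.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A bogus "proof" is simply rejected, e.g. this would not compile:
-- theorem bogus (a : Nat) : a + 1 = a := rfl  -- error: type mismatch
```

An English-language proof, by contrast, is only as trustworthy as the humans who read it.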

Neither Gemini nor OpenAI has open models, so we don’t know for sure what’s happening underneath.