← Back to context

Comment by vatsachak

3 hours ago

Yeah, but I wouldn't be surprised if they train the model on verification assisted by Lean.

Arguing similarly to how stockfish, the chess engine, trains I would not be surprised if this is more common in the future. I don't know if they use any proof verification tools during their reinforcement learning procedure right now, as far as I know they've been focusing more on COT based strategies (w/o Lean). But I'm hardly an LLM expert, I don't know.