Comment by janalsncm

1 year ago

I think a lot of people in the ML community were excited for Noam Brown to lead the O series at OpenAI because, intuitively, a lot of reasoning problems are highly nonlinear, i.e. they have a tree-like structure, so some kind of MCTS (Monte Carlo tree search) should work well. O1/O3 don’t seem to use this, and DeepSeek explicitly mentioned difficulties training such a model.

However, I think this is coming. DeepSeek mentioned it was hard to learn a value model for MCTS from scratch, but this doesn’t mean we couldn’t seed it with some annotated data.
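To make the idea concrete, here is a minimal sketch of tree search guided by a value model instead of random rollouts. Everything in it is an assumption for illustration: the toy task (reach `TARGET` from 1 using `+1` or `*2`), the names `value_model`, `Node`, `uct_score` and `search`, and the hand-written heuristic that stands in for a learned value model. It is not how O1/O3 or DeepSeek work; in a real system the value function would be a trained network, possibly seeded with annotated data.

```python
import math

# Hypothetical toy task: reach TARGET from 1 using the steps "+1" or "*2".
TARGET, MAX_DEPTH = 10, 6
ACTIONS = {"+1": lambda x: x + 1, "*2": lambda x: x * 2}


def value_model(state):
    """Stand-in for a learned value model: states closer to TARGET score higher."""
    return 1.0 / (1.0 + abs(TARGET - state))


class Node:
    def __init__(self, state, depth, parent=None):
        self.state, self.depth, self.parent = state, depth, parent
        self.children = {}  # action name -> Node
        self.visits = 0
        self.total = 0.0    # sum of values backed up through this node

    def is_terminal(self):
        return self.state == TARGET or self.depth >= MAX_DEPTH


def uct_score(parent, child, c=1.4):
    """Mean value plus an exploration bonus; unvisited children go first."""
    if child.visits == 0:
        return math.inf
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def search(root_state=1, n_simulations=500):
    root = Node(root_state, 0)
    for _ in range(n_simulations):
        # Selection: descend by UCT until reaching an unexpanded node.
        node = root
        while node.children:
            node = max(node.children.values(), key=lambda ch: uct_score(node, ch))
        # Expansion: add one child per action unless the node is terminal.
        if not node.is_terminal():
            for name, step in ACTIONS.items():
                node.children[name] = Node(step(node.state), node.depth + 1, node)
        # Evaluation: ask the value model instead of running a random rollout.
        value = value_model(node.state)
        # Backup: propagate the value to the root (single agent, no sign flip).
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    # Read out a path by following the most-visited child at each level.
    path, node = [], root
    while node.children:
        name, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        path.append(name)
    return path, node.state


if __name__ == "__main__":
    print(search())
```

The quality of the search depends almost entirely on `value_model`: with a poor value estimate, the visit counts concentrate on the wrong branches, which is one way to read the difficulty DeepSeek describes.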

2 comments


insane-c0der 1 year ago

Do you have a reference for us to check? - "DeepSeek explicitly mentioned difficulties training such a model."

janalsncm 1 year ago

Section 4.2: Unsuccessful attempts
https://arxiv.org/pdf/2501.12948