Comment by janalsncm

2 months ago

I think a lot of people in the ML community were excited for Noam Brown to lead the o-series at OpenAI because, intuitively, many reasoning problems are highly nonlinear, i.e., they have a tree-like structure, so some kind of MCTS (Monte Carlo tree search) should work well. o1/o3 don’t seem to use this, and DeepSeek explicitly mentioned difficulties training such a model.

However, I think this is coming. DeepSeek mentioned it was hard to learn a value model for MCTS from scratch, but this doesn’t mean we couldn’t seed it with some annotated data.
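
To make the idea concrete, here is a minimal sketch of MCTS where leaf nodes are scored by a value model instead of random rollouts. Everything in it is an assumption for illustration: the toy task (reach `TARGET` from a start number using `+1` and `*2`), the names `Node`, `search`, and `best_path`, and especially `value_model`, which is a hand-written heuristic standing in for a learned model that could be seeded from annotated data. It is not how o1/o3 or DeepSeek work.

```python
import math

TARGET = 10  # hypothetical toy goal: reach this number
ACTIONS = {"+1": lambda x: x + 1, "*2": lambda x: x * 2}


class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}  # action name -> Node
        self.visits = 0
        self.value_sum = 0.0

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def value_model(state):
    # Stand-in for a learned value model (assumption): 1.0 at the target,
    # decaying with distance from it.
    return 1.0 / (1.0 + abs(TARGET - state))


def is_terminal(state):
    return state >= TARGET


def select_child(node, c=1.4):
    # UCT: trade off the mean value against an exploration bonus.
    def uct(child):
        if child.visits == 0:
            return float("inf")
        bonus = c * math.sqrt(math.log(node.visits) / child.visits)
        return child.mean_value() + bonus

    return max(node.children.values(), key=uct)


def search(root_state, n_sim=200):
    root = Node(root_state)
    for _ in range(n_sim):
        node = root
        # Selection: walk down the tree until reaching a leaf.
        while node.children and not is_terminal(node.state):
            node = select_child(node)
        # Expansion: add one child per action at a non-terminal leaf.
        if not is_terminal(node.state):
            for name, fn in ACTIONS.items():
                node.children[name] = Node(fn(node.state), parent=node)
        # Evaluation: score the leaf with the value model, no rollout.
        v = value_model(node.state)
        # Backup: propagate the score up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += v
            node = node.parent
    return root


def best_path(root):
    # Follow the most-visited child at each level.
    path, node = [], root
    while node.children:
        name, child = max(node.children.items(), key=lambda kv: kv[1].visits)
        if child.visits == 0:
            break
        path.append(name)
        node = child
    return path


root = search(1)
print(best_path(root))
```

The quality of the search depends almost entirely on `value_model`: with a poor one, the visit counts concentrate on the wrong branches, which is consistent with the difficulty DeepSeek described.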