Comment by at2005, 9 days ago

Ah, I meant that MCTS uses more inference-time compute than GRPO does to produce a training sample.