Comment by ineedasername
3 hours ago
That linked article says its about RLVR but then goes on to conflate other RL with it, and doesn't address much in the way of the core thinking that was in the paper they were partially responding to that had been published a month earlier[0] which laid out findings and theory reasonably well, including work that runs counter to the main criticism in the article you cited, ie, performance at or above base models only being observed with low K examples.
That said, reachability and novel strategies are somewhat overlapping areas of consideration, and I don't see many ways in which RL in general, as mainly practiced, improves upon models' reachability. And even when it isn't clipping weights it's just too much of a black box approach.
But none of this takes away from the question of raw model capability on novel strategies, only such with respect to RL.
No comments yet
Contribute on Hacker News ↗