Comment by natolambert
1 year ago
As the other commenter said, R1 required very standard RLHF techniques too. But a fun way to think about it is that reasoning models are going to be bigger and uplift the RLHF boat.
But we need a few years to establish basics before I can write a cumulative RL for LLMs book ;)
This is a GREAT book. If you decide to write it in a rolling fashion, you'd have at least one reader from the start :)