Comment by natolambert

1 year ago

As the other commenter said, R1 required very standard RLHF techniques too. But a fun way to think about it is that reasoning models are going to get bigger and lift the whole RLHF boat.

But we need a few years to establish basics before I can write a cumulative RL for LLMs book ;)

This is a GREAT book. If you decide to write it in a rolling fashion, you'd have at least one reader from the start :)