Comment by whimsicalism

6 months ago

they’re quite possibly plateauing on pretraining returns (if rumors are to be trusted)… but they are getting more sophisticated at complex real-world RL, which is still similar to throwing more tokens at the problem and is creating large returns.

i feel that the current artifact is already quite close to something that can operate competently, provided the downstream RL matches the task of interest well enough.