Comment by whimsicalism

10 months ago

they’re quite possibly plateauing on pretraining returns (if rumors are to be trusted)… but they are getting more sophisticated at complex real-world RL - which is still similar to throwing more tokens at the problem, and is producing large returns.

i feel that the current artifact is already quite close to something that can operate competently, provided the downstream RL matches the task of interest well enough.