Comment by whimsicalism
1 month ago
They’re plateauing on pretraining returns, quite possibly (if the rumors are to be trusted)… but they are getting more sophisticated at complex real-world RL, which is still similar to throwing more tokens at the problem and is producing large returns.
I feel that the current artifact is already quite close to something that can operate competently, provided the downstream RL matches the task of interest well enough.