Comment by lossolo
9 months ago
It's because there is nothing novel here from an architectural point of view. Again, the secret sauce is only in the training data.
o1 seems like a variant of RLRF: https://arxiv.org/abs/2403.14238
Soon you will see similar models from competitors.