Comment by ACCount37
6 hours ago
It's a useful exercise. A lot of the good ML work is first validated at small scale.
And this new example goes even further: it adds instruction-following and tool-use SFT, as well as RLVR. That makes for a more useful baseline.