Comment by ACCount37
9 hours ago
It's a useful exercise. A lot of the good ML work is first validated at small scale.
And this new example goes even further: it adds instruction-following and tool-use SFT, as well as RLVR. Makes for a more useful baseline.
Absolutely, it's wildly fun to read the outputs of even a tiny 0.8M model trained on CPU. And now I've actually got a much better understanding of the transformer architecture after playing around with it for a day. This repo is probably going to get some new folks trying out ideas, and some of them will turn into new researchers in the field, no doubt.