Comment by faxmeyourcode
4 hours ago
Absolutely, it's wildly fun to read the outputs of even a tiny 0.8M-parameter model trained on CPU. And I've now got a much better understanding of the transformer architecture after playing around with it for a day. This repo is probably going to inspire some new folks to try out ideas, and some of them will turn into new researchers in the field, no doubt.