Comment by tuned
20 days ago
Thanks for linking.
Yes, the paper compares the new architecture (which is also a fork of my implementation of nanoGPT) against Karpathy's nanoGPT. There are also links to the code and the benchmark used.
Note that I didn't say Karpathy's nanoGPT; I said to use the speedrun.
Transformers are universal function approximators. When well-tuned, they often start to approximate other innovations. Not always, thank god, but often enough that you have to be careful.
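For reference, that first sentence has a formal counterpart in Yun et al. (ICLR 2020, "Are Transformers universal approximators of sequence-to-sequence functions?"); a rough paraphrase (notation mine, not the thread's) is:

\[
\forall\, \epsilon > 0 \;\; \exists\, g \in \mathcal{T} : \quad d_p(f, g) \;=\; \Big( \int_D \lVert f(X) - g(X) \rVert_p^p \, dX \Big)^{1/p} \;<\; \epsilon,
\]

where \(f\) is any continuous, permutation-equivariant sequence-to-sequence function on a compact domain \(D \subset \mathbb{R}^{d \times n}\) and \(\mathcal{T}\) is a family of Transformer networks. In other words, with enough capacity and tuning the architecture class can approximate almost any target behavior, which is the caution being raised: a strong, well-tuned baseline may already absorb the effect of an architectural "innovation".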
OK, thanks. I am taking it slow, then.