Comment by Herring
1 month ago
Note that I didn't say Karpathy's nanoGPT; I said to use the speedrun.
Transformers are universal function approximators. When well-tuned, they often start to approximate what other innovations do. Not always, thank god, but often enough that you have to be careful.
OK, thanks. I'm taking it slow, then.