Comment by Herring

7 hours ago

I'd say try the NanoGPT speedrun. The model is much easier to train, and it gives you a better comparison against optimized systems.

https://github.com/KellerJordan/modded-nanogpt