Comment by dabockster
3 hours ago
The title is extremely misleading - you have to rent time on an H100 cluster to get it to work. It is not on-device, and thus not truly $100.
I was really excited, too, until I looked through the readme files and the code.
It's about training a model from scratch for $100.
I feel the same. The title made it sound like I could have an on-device ChatGPT for $100, forever. I didn't imagine it was about training the model myself.
Since the resulting model is only ~561M parameters, you could run it on a Raspberry Pi that costs less than $100.
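For what it's worth, here's a minimal sketch of what running a ~561M-parameter checkpoint locally on a Pi could look like, assuming it's exported to a Hugging Face-style checkpoint (the model path and prompt below are placeholders, not from the project itself):

    # Hypothetical example: load a ~561M-parameter causal LM on CPU.
    # At fp32 the weights are roughly 2.2 GB, so it fits in a 4-8 GB Pi's RAM.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_dir = "./my-100-dollar-model"  # placeholder path
    tok = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)

    prompt = "Hello, what can you do?"
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=50)
    print(tok.decode(out[0], skip_special_tokens=True))

Inference at that size is slow on a Pi but workable; the $100 is the one-time training cost, not a recurring fee to run it.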
What's misleading about that? You rent $100 of time on an H100 to train the model.