Comment by dabockster
6 hours ago
The title is extremely misleading - you have to rent time on an H100 cluster to get it to work. It is not on-device, and thus not truly $100.
I was really excited, too, until I looked through the readme files and the code.
The title is saying you can train your own model for $100. That part is true: the $100 goes to the cloud provider to rent you $250k of hardware for four hours. Then you can run that model on whatever hardware you have lying around, because it's really small.
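To sanity-check the arithmetic above, here is a rough sketch. The $3/GPU-hour rate is an assumption (typical cloud rental pricing; exact figures vary by provider), not something stated in the thread:

```python
# Back-of-envelope check on "four hours of rented H100s for ~$100".
# Assumed: an 8xH100 node at roughly $3 per GPU-hour (varies by provider).
gpus = 8
rate_per_gpu_hour = 3.00  # USD, assumed
hours = 4

cost = gpus * rate_per_gpu_hour * hours
print(f"~${cost:.0f} for {hours} hours on {gpus} GPUs")
```

At those assumed rates the rental lands right around the $100 figure in the title.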
It's about training a model from scratch for $100.
What's misleading about that? You rent $100 of time on an H100 to train the model.
I feel the same. The title made it sound like I could have an on-device ChatGPT for $100, forever. I didn't realize it was about training the model myself.
Since the resulting model is only ~561M parameters you could run it on a Raspberry Pi that costs less than $100.
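A quick estimate of why a ~561M-parameter model fits on a Pi (weights only; activations and KV cache add overhead on top, and the precision options here are illustrative):

```python
# Rough weight-memory footprint for a ~561M-parameter model
# at common precisions. Ignores activations and KV cache.
params = 561_000_000
for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    gib = params * bytes_per_param / 2**30
    print(f"{name}: {gib:.2f} GiB")
```

At fp16 the weights are only about 1 GiB, comfortably inside the RAM of a modern Raspberry Pi.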