Comment by dabockster

6 hours ago

The title is extremely misleading - you have to rent time on an H100 cluster to get it to work. It is not on-device, and thus not truly $100.

I was really excited, too, until I looked through the readme files and the code.

The title is saying you can train your own model for $100. That part is true: the $100 goes to the cloud provider to rent you $250k of hardware for four hours. Then you can run that model on whatever hardware you have lying around, because it's really small.

What's misleading about that? You rent $100 of time on an H100 to train the model.
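The $100 figure checks out as a back-of-envelope estimate. A sketch, assuming a typical single 8xH100 node and an illustrative marketplace rate of about $3 per GPU-hour (the rate is my assumption, not from the project):

```python
# Illustrative rental-cost arithmetic; the hourly rate is an assumption,
# not actual vendor pricing.
GPU_HOURLY_RATE = 3.00   # assumed $/H100-hour on a cloud marketplace
GPUS = 8                 # a typical single 8xH100 node
HOURS = 4                # training run length mentioned above

total = GPU_HOURLY_RATE * GPUS * HOURS
print(f"~${total:.0f} for the run")  # ~$96, roughly the quoted $100
```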

I feel the same. The title made it look like I could have an on-device ChatGPT for $100, forever. I didn't realize it was about training the model myself.

  • Since the resulting model is only ~561M parameters, you could run it on a Raspberry Pi that costs less than $100.
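A quick memory estimate supports the Raspberry Pi point. A sketch, assuming plain fp16 or int8 weight storage (the byte-per-parameter figures are standard, but the actual checkpoint layout may differ):

```python
# Back-of-envelope memory footprint for a ~561M-parameter model.
PARAMS = 561_000_000

def model_size_gb(params: int, bytes_per_param: float) -> float:
    """Approximate weight-storage size in gigabytes."""
    return params * bytes_per_param / 1e9

fp16 = model_size_gb(PARAMS, 2)  # 16-bit floats
int8 = model_size_gb(PARAMS, 1)  # 8-bit quantized

print(f"fp16 weights: ~{fp16:.2f} GB")  # ~1.12 GB
print(f"int8 weights: ~{int8:.2f} GB")  # ~0.56 GB
# Either fits comfortably in the 4-8 GB RAM of a recent Raspberry Pi,
# leaving room for the KV cache and the OS.
```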