← Back to context

Comment by bdbdbdb

6 hours ago

Dumb question - and I'm not trying diminish the achievement here, I just genuinely don't understand:

Why would people want to spend $200 to train a coding model when there are free coding models?

This is a great question. You definitely aren't training this to use it, you're training it to understand how things work. It's an educational project, if you're interested in experimenting with things like distributed training techniques in JAX, or preference optimisation, this gives you a minimal and hackable library to build on.

  • It's also a great base for experimentation. If you have an idea for an architecture improvement you can try it for $36 on the 20 layer nanocode setting, then for another $200 see how it holds up on the "full scale" nanocode

    Kaparthy's notes on improving nanochat [1] are one of my favorite blog-like things to read. Really neat to see which features have how much influence, and how the scaling laws evolve as you improve the architecture

    There's also modded-nanogpt which turns the same kind of experimentation into a training speedrun (and maybe loses some rigor on the way) [2]

    1 https://github.com/karpathy/nanochat/blob/master/dev/LOG.md

    2 https://github.com/kellerjordan/modded-nanogpt