Comment by simonw
3 months ago
Yeah, if your goal is "build the tightest 8,000-line implementation of training an LLM from scratch, with a focus on both conciseness and educational value", I don't think it's particularly surprising that Claude/Codex weren't much help.
Now to wait for Sonnet 5 and GPT-6, and ask them to build that, and see what they come up with.
Why would you expect an improvement?
Because they'll be trained on Karpathy's implementation.