Comment by sammyd56
6 hours ago
I've uploaded the model here: https://huggingface.co/sdobson/nanochat
I didn't get as good results as Karpathy (unlucky seed?)
It's fun to play with though...
User: How many legs does a dog have? Assistant: That's a great question that has been debated by dog enthusiasts for centuries. There's no one "right" answer (...)
I got your model working on CPU on macOS by having Claude Code hack away furiously for a while. Here's a script that should work for anyone: https://gist.github.com/simonw/912623bf00d6c13cc0211508969a1...
You can run it like this:
This is a much easier way to run the model. I'm going to update the huggingface README to point to this. The one thing that could be improved is the turn-taking between user and assistant, which it sometimes gets confused about. I fixed that in my fork of your gist here: https://gist.github.com/samdobson/975c8b095a71bbdf1488987eac...
Simon, I had to run "brew install git-lfs && cd nano-chat && git lfs install && git lfs pull" and then it worked. before then, the model weights didn't get cloned by default for me on macOS.
% uv run https://gist.githubusercontent.com/simonw/912623bf00d6c13cc0... \ --model-dir nanochat/ --prompt "who is simonw on hacker news?" Using device: cpu Loading model from nanochat/model_000650.pt Loading metadata from nanochat/meta_000650.json Model config: {'sequence_len': 2048, 'vocab_size': 65536, 'n_layer': 20, 'n_head': 10, 'n_kv_head': 10, 'n_embd': 1280} Loading model weights (this may take a minute for a 2GB model)... Converting model to float32 for CPU... Model loaded successfully! Loading tokenizer... Tokenizer loaded successfully!
Prompt: who is simonw on hacker news? Encoded to 9 tokens
Generating... -------------------------------------------------- who is simonw on hacker news?<|user_end|><|assistant_start|>A hacker news reporter, I'd say a few things. First, I'm a bit of a hothead, always pushing the boundaries of what's acceptable in the world of hacking. I've got a reputation for being merciless and relentless in my pursuit of the truth.
In many ways, I've developed a sixth sense for this type of thing. I've spent years honing my skills, learning the language of hacking and the tactics it takes. I know how to think like the hacker --------------------------------------------------
Adding on: Claude also gave me the following line which was necessary to get the model weights to download from HF. This might be obvious for anyone familiar with HF but it helped me so sharing here!
git lfs install
For anyone curious this is the error when running uv sync on macos,
> uv sync Resolved 88 packages in 3ms error: Distribution `torch==2.8.0+cu128 @ registry+https://download.pytorch.org/whl/cu128` can't be installed because it doesn't have a source distribution or wheel for the current platform
hint: You're on macOS (`macosx_15_0_arm64`), but `torch` (v2.8.0+cu128) only has wheels for the following platforms: `manylinux_2_28_x86_64`, `win_amd64`; consider adding your platform to `tool.uv.required-environments` to ensure uv resolves to a version with compatible wheels
Also, tmp/nanochat expects all contents from tokenizer and chatsft_checkpoints folder.