Comment by antinomicus 11 hours ago Isn’t the whole point to run your model locally?
theptip 11 hours ago No, that’s clearly not a goal of this project. This is a learning tool. If you want a local model, you are almost certainly better off using something trained on far more compute (DeepSeek, Qwen, etc.).
yorwba 11 hours ago The 80 GB are for training with a batch size of 32 sequences of 2048 tokens each. Since the model has only about 560M parameters, you could probably run it on CPU, if a bit slowly.
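(A rough back-of-the-envelope sketch of where the memory goes, assuming mixed-precision AdamW training and fp16 inference; the exact figures depend on the project's actual training config:)

    # Rough memory estimate for a ~560M-parameter model (assumed setup, not the project's exact config).
    params = 560e6

    # Mixed-precision AdamW (assumption): fp32 master weights + fp32 gradients
    # + two fp32 optimizer moments ~= 16 bytes per parameter, before activations.
    train_state_gb = params * 16 / 1e9   # ~9 GB of parameter/optimizer state
    # Activations for 32 x 2048 tokens per step account for much of the rest of the 80 GB.

    # Inference in fp16/bf16: ~2 bytes per parameter.
    infer_weights_gb = params * 2 / 1e9  # ~1.1 GB, fits comfortably in CPU RAM
    print(f"training state ~{train_state_gb:.1f} GB, inference weights ~{infer_weights_gb:.1f} GB")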
simonw 10 hours ago You can run a model locally on much less expensive hardware. It's training that requires the really big GPUs.
jsight 10 hours ago I'd guess that this will output faster than the average reader can read, even using only CPU inference on a modern-ish CPU. The param count is small enough that even cheap (<$500) GPUs would work too.
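(For reference, a minimal sketch of CPU inference with Hugging Face transformers; the checkpoint path is a placeholder, assuming the weights have been exported to a local directory in a compatible format:)

    # Minimal CPU-inference sketch; "path/to/checkpoint" is a hypothetical local export of the ~560M model.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("path/to/checkpoint")
    model = AutoModelForCausalLM.from_pretrained("path/to/checkpoint")  # loads on CPU by default

    inputs = tokenizer("Hello, world", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))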