Comment by a96
15 hours ago
I guess that technically depends on the software used to run the model, but in general it has always been possible to run on a CPU (and it may even be possible to run on a TPU or something else). It has just been slower. Likewise, GPU RAM vs. system RAM and the bandwidths involved can create hard bottlenecks.
GPU plus VRAM (or fast unified RAM) is generally the option that is both available and performant, but really small models in particular also run quite well on CPU and system RAM.
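To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch in Python. It assumes token generation is memory-bound, so every weight is read once per generated token. The function name and all figures (model sizes, bytes per weight, bandwidths) are illustrative assumptions, not measurements of any real hardware.

```python
def decode_tokens_per_second(params_billions, bytes_per_param, bandwidth_gb_s):
    """Rough upper bound on tokens/s when every weight is read once per token.

    Ignores compute, caches, and KV-cache traffic, so real numbers will be lower.
    """
    model_gb = params_billions * bytes_per_param  # billions of params * bytes each = GB
    return bandwidth_gb_s / model_gb


# Hypothetical round-number bandwidths in GB/s, chosen only for illustration.
memory_options = {"system RAM": 50.0, "GPU VRAM": 1000.0}

# (label, parameters in billions, bytes per weight); all values are assumptions.
models = [("small model, 4-bit", 1.0, 0.5), ("larger model, 16-bit", 7.0, 2.0)]

for label, params_b, bytes_pp in models:
    for mem_name, bw in memory_options.items():
        rate = decode_tokens_per_second(params_b, bytes_pp, bw)
        print(f"{label} on {mem_name}: up to ~{rate:.0f} tokens/s")
```

Under these assumptions the small model stays comfortably usable on system RAM, while the larger one is where the bandwidth gap starts to hurt.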