Comment by zozbot234
4 hours ago
> hitting the SSD constantly to swap layers
Thing is, people in the local LLM community are already doing that to run the largest MoE models, using mmap so that spare RAM is managed automatically by the OS as a cache for the weights. It's a drag on performance, to be sure, but still somewhat usable if you're willing to wait for results. And it unlocks these larger models on what's effectively semi-pro, if not true consumer, hardware. On the enterprise side, high-bandwidth NAND flash is just around the corner and perfectly suited to storing these large read-only model parameters (no wear-and-tear issues with the NAND storage, since the weights are only read) while preserving RAM-like throughput.
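For anyone who hasn't seen the mmap trick, here's a rough sketch of the idea in Python, not what any particular inference engine actually does. The file layout, `LAYER_FLOATS`, `NUM_LAYERS`, and the helper names are all made up for illustration. The point is only that the file is mapped read-only and the OS page cache decides which pages stay in RAM; touching a layer that isn't resident triggers a read from disk.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: NUM_LAYERS layers, each LAYER_FLOATS little-endian float32s.
LAYER_FLOATS = 1024
NUM_LAYERS = 8
LAYER_BYTES = LAYER_FLOATS * 4


def write_fake_weights(path):
    """Write a stand-in weights file; layer i is filled with the value float(i)."""
    with open(path, "wb") as f:
        for layer in range(NUM_LAYERS):
            values = [float(layer)] * LAYER_FLOATS
            f.write(struct.pack(f"<{LAYER_FLOATS}f", *values))


def read_layer(mm, layer):
    """Decode one layer from the mapping. Only the pages touched here need
    to be resident; the OS may evict them again under memory pressure."""
    return struct.unpack_from(f"<{LAYER_FLOATS}f", mm, layer * LAYER_BYTES)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        write_fake_weights(path)
        with open(path, "rb") as f:
            # Read-only mapping of the whole file (length 0 means "entire file").
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            try:
                layer = read_layer(mm, 5)
                return len(mm), len(layer), layer[0]
            finally:
                mm.close()


if __name__ == "__main__":
    print(demo())
```

Because the mapping is read-only, evicted pages never have to be written back; they're just dropped and re-read from the SSD on the next access, which is why this hammers reads but not writes.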