Using mmap for model parameters lets you run much larger models for a given amount of system RAM, because the weights become read-only, file-backed pages that the kernel loads on demand and can drop again under memory pressure without writing anything to swap. It's especially worthwhile for MoE models, where the parameters of unused "experts" can simply be evicted from RAM, leaving room for more relevant data. The same applies more generally, e.g. to individual model layers that aren't currently in use.
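
To make that concrete, here is a minimal sketch in Python. It assumes a made-up file layout (fixed-size float32 blocks, one per expert), and the names `write_dummy_weights`, `load_expert`, `NUM_EXPERTS` and `FLOATS_PER_EXPERT` are hypothetical, not taken from any real inference engine. The point is only that mapping the whole file is cheap and that reading one expert touches just that expert's pages.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical toy layout: each expert is a flat block of float32 values.
NUM_EXPERTS = 4
FLOATS_PER_EXPERT = 1024
EXPERT_BYTES = FLOATS_PER_EXPERT * 4


def write_dummy_weights(path):
    """Write a toy weights file in which expert i is filled with float(i)."""
    with open(path, "wb") as f:
        for i in range(NUM_EXPERTS):
            values = [float(i)] * FLOATS_PER_EXPERT
            f.write(struct.pack(f"{FLOATS_PER_EXPERT}f", *values))


def load_expert(mm, index):
    """Read one expert's parameters from the mapping.

    Only the pages covering this expert's byte range get faulted in;
    the other experts stay on disk until something reads them.
    """
    offset = index * EXPERT_BYTES
    return struct.unpack_from(f"{FLOATS_PER_EXPERT}f", mm, offset)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "weights.bin")
    write_dummy_weights(path)

    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the pages clean,
        # so the kernel can discard them at any time and re-read on demand.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            expert = load_expert(mm, 3)  # pretend the router picked expert 3
            print(len(expert), expert[0])
        finally:
            mm.close()
```

A real engine would avoid the copy that `struct.unpack_from` makes and hand the mapped memory straight to its tensor library, but the paging behaviour is the same.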