Comment by saagarjha

6 days ago

mmap is not free. It just moves bandwidth around.

Using mmap for model parameters lets you run vastly larger models for any given amount of system RAM. It's especially worthwhile when you're running MoE models, where parameters for unused "experts" can just be evicted from RAM, leaving room for more relevant data. But of course this applies more generally, e.g. to individual model layers.
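
To make that concrete, here's a rough Python sketch of the idea. It isn't taken from any real inference engine: the flat file layout, the sizes, and the names `write_fake_weights` and `read_expert` are all made up for illustration. It maps a whole "weights" file but only reads the slice belonging to one expert, so only those pages need to be resident.

```python
import array
import mmap
import os
import struct
import tempfile

# Hypothetical layout: experts stored back to back as native float32 values.
NUM_EXPERTS = 4
FLOATS_PER_EXPERT = 1024
EXPERT_BYTES = FLOATS_PER_EXPERT * 4  # 4 bytes per float32


def write_fake_weights(path):
    """Write a stand-in weights file where expert i is filled with the value i."""
    with open(path, "wb") as f:
        for expert in range(NUM_EXPERTS):
            array.array("f", [float(expert)] * FLOATS_PER_EXPERT).tofile(f)


def read_expert(mapped, expert):
    """Read one expert's parameters out of the mapping.

    Only the pages covering this byte range have to be faulted in;
    the other experts stay on disk until something touches them.
    """
    offset = expert * EXPERT_BYTES
    return struct.unpack_from(f"{FLOATS_PER_EXPERT}f", mapped, offset)


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_fake_weights(path)
        with open(path, "rb") as f, \
                mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            weights = read_expert(mapped, 2)
            # Mapped size, first value of expert 2, number of values read.
            return len(mapped), weights[0], len(weights)
    finally:
        os.remove(path)
```

Because the mapping is read-only and file-backed, the kernel is free to drop those pages under memory pressure and fault them back in from the file later. That re-read is where the "moves bandwidth around" cost shows up.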