Comment by charcircuit
3 hours ago
You don't have to keep only the actively used experts in VRAM. You can load as many expert weights as will fit. On a "cache miss" you pay the cost of swapping the missing weights in, but on a hit you don't.
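A minimal sketch of the idea, assuming an LRU eviction policy and experts of uniform size. The comment doesn't name a policy, and `ExpertCache` and `load_fn` are hypothetical names, not any real inference library's API. Strings stand in for the actual weight tensors. With room for only one expert, every switch between experts is a miss. With room for three, repeated experts hit.

```python
from collections import OrderedDict


class ExpertCache:
    """Hypothetical LRU cache of expert weights resident in VRAM.

    capacity: number of experts that fit in VRAM at once (assumes equal sizes).
    load_fn:  hypothetical callable standing in for the expensive host-to-GPU copy.
    """

    def __init__(self, capacity, load_fn):
        self.capacity = capacity
        self.load_fn = load_fn
        self.resident = OrderedDict()  # expert_id -> weights, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, expert_id):
        if expert_id in self.resident:
            # Hit: weights are already in VRAM, so there is no transfer cost.
            self.hits += 1
            self.resident.move_to_end(expert_id)
            return self.resident[expert_id]
        # Miss: evict the least recently used expert if full, then pay to load.
        self.misses += 1
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)
        weights = self.load_fn(expert_id)
        self.resident[expert_id] = weights
        return weights


if __name__ == "__main__":
    routing = [0, 1, 0, 2, 0, 3, 1]  # made-up sequence of experts selected by the router
    for cap in (1, 3):
        cache = ExpertCache(capacity=cap, load_fn=lambda eid: f"weights[{eid}]")
        for eid in routing:
            cache.get(eid)
        print(cap, cache.hits, cache.misses)
    # prints "1 0 7" then "3 2 5"
```

LRU is just one reasonable choice here. A real system might instead pin the most frequently routed experts, or prefetch based on router statistics.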