Comment by zozbot234
8 hours ago
Your 120B model likely has way more active parameters, so you can probably only fit a few shared layers in your dGPU's VRAM. You might be better off running that model on a unified-memory platform: the memory is slower than dedicated VRAM, but there's a lot more of it.
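For a rough sense of the arithmetic, here's a back-of-the-envelope sketch. Everything in it is an assumption for illustration, not a measurement: the 4-bit quantization, the 24 GiB card, the 2 GiB reserved for KV cache and runtime overhead, and the helper names `weights_gib` and `fraction_resident` are all hypothetical.

```python
def weights_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a parameter count and quantization width."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fraction_resident(params_billions: float, bits_per_weight: float,
                      vram_gib: float, reserved_gib: float = 2.0) -> float:
    """Fraction of the weights that fits in VRAM after reserving some headroom.

    reserved_gib is an assumed allowance for KV cache, activations and the
    runtime itself; real overhead depends on context length and backend.
    """
    usable = max(vram_gib - reserved_gib, 0.0)
    return min(usable / weights_gib(params_billions, bits_per_weight), 1.0)


if __name__ == "__main__":
    # Hypothetical setup: 120B parameters at 4 bits per weight.
    print(f"weights: {weights_gib(120, 4):.1f} GiB")
    print(f"24 GiB dGPU holds: {fraction_resident(120, 4, 24):.0%} of the weights")
    print(f"128 GiB unified memory holds: {fraction_resident(120, 4, 128):.0%}")
```

Under those assumptions most of the weights spill out of a 24 GiB card, while a large unified-memory pool holds all of them, just at lower bandwidth.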