LM Studio defaults to 12/36 layers on the GPU for that model on my machine, but you can crank it up to all 36. That does slow it down, but I'm not finding it unusable, and it seems to have some advantages. Still, I doubt I'm going to run it this way.
You don't. You run some of the layers on the CPU.
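Here is a rough sketch of why a runner ends up with a partial split. It assumes every layer is about the same size; real models also have embedding and output layers, so this is only an approximation. The function name, the VRAM figure and the `reserve_gb` allowance are hypothetical. Only the 80GB / 36-layer figures come from this thread.

```python
def layers_that_fit(model_gb: float, n_layers: int, vram_gb: float, reserve_gb: float) -> int:
    """Rough count of layers that can be offloaded to the GPU.

    Assumes all layers are roughly equal in size and ignores embeddings/output head.
    reserve_gb is a hypothetical allowance for KV cache, activations and driver overhead.
    """
    per_layer_gb = model_gb / n_layers           # e.g. 80GB / 36 layers ~= 2.2GB per layer
    usable_gb = max(vram_gb - reserve_gb, 0.0)   # VRAM left over for weights
    return min(n_layers, int(usable_gb // per_layer_gb))
```

With a hypothetical 48GB card and 6GB held back, `layers_that_fit(80, 36, 48, 6)` gives 18, so the other 18 layers would run on the CPU.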
You're right that I was confused about that.
FWIW, that's an 80GB model, and you also need room for the KV cache. You'd need 96GB-ish to run it entirely on the GPU.
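For the KV cache part, the usual back-of-envelope formula is 2 (keys and values) × layers × KV heads × head dimension × context length × bytes per element. Only the 36 layers come from this thread. The head count, head dimension, context length and 2GiB overhead below are hypothetical placeholders, and the GB/GiB difference is glossed over.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Size of the key/value cache for a single sequence.

    Factor of 2 = one key tensor and one value tensor per layer.
    bytes_per_elem=2 assumes an fp16/bf16 cache (no cache quantization).
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def total_vram_gib(weights_gib: float, kv_bytes: int, overhead_gib: float) -> float:
    """Weights + KV cache + a hypothetical overhead allowance, in GiB."""
    return weights_gib + kv_bytes / 2**30 + overhead_gib


# 36 layers is from the thread; 8 KV heads, head_dim 128 and 32k context are made up.
kv = kv_cache_bytes(n_layers=36, n_kv_heads=8, head_dim=128, context_len=32768)
print(f"KV cache: {kv / 2**30:.1f} GiB")                          # 4.5 GiB
print(f"Total:    {total_vram_gib(80, kv, overhead_gib=2):.1f} GiB")  # 86.5 GiB
```

The cache grows linearly with context length. With these made-up shapes, quadrupling the context to 131072 tokens would take it from 4.5GiB to 18GiB, which is how the total creeps toward the mid-90s.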