
Comment by skolos

5 hours ago

I'd say adding another 16 GB GPU would be worth it: you'd be able to run larger models or longer contexts entirely on the GPUs, which gives you more options for what you can run fast. Your current model probably doesn't fit completely on the GPU (depending on the quant, I don't think you can squeeze Gemma4:26b into 16 GB of VRAM), so some of its layers are already running on the GPU and the rest on the CPU.

If you add a second GPU, you might be able to move all the layers into VRAM, which should speed things up. Each layer is computed on whichever device holds it, so the layers already on your RTX 5080 would run the same as now, but the layers your CPU currently handles would instead run on the RTX 5060, with its faster VRAM and compute.
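A rough back-of-envelope sketch of the fit/split reasoning, in Python. Everything here is an assumption for illustration: weight size is estimated as params × bits-per-weight / 8, all layers are treated as equal-sized, the 48-layer count and 4.5 bits/weight are hypothetical, and 2 GB per card is a made-up reserve for KV cache and overhead. Real runtimes split layers their own way, so treat this as a sanity check, not how any particular tool does it.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def split_layers(n_layers: int, layer_gb: float, gpu_free_gb: list[float]):
    """Greedily fill each GPU in order with equal-sized layers.

    Returns (layers placed on each GPU, layers left over for the CPU).
    """
    placement = []
    remaining = n_layers
    for free in gpu_free_gb:
        fit = min(remaining, int(free // layer_gb))  # whole layers only
        placement.append(fit)
        remaining -= fit
    return placement, remaining


if __name__ == "__main__":
    total = weights_gb(26, 4.5)          # hypothetical ~4.5-bit quant of a 26B model
    per_layer = total / 48               # hypothetical 48 equal layers
    usable = 16 - 2                      # 16 GB card minus hypothetical 2 GB reserve
    print(f"weights ~ {total:.2f} GB")
    print("one GPU: ", split_layers(48, per_layer, [usable]))
    print("two GPUs:", split_layers(48, per_layer, [usable, usable]))
```

Under these assumptions the weights come to about 14.6 GB, so one card holds 45 layers and leaves 3 on the CPU, while a second card picks up those 3 and nothing is left on the CPU. The first GPU's share doesn't change, which matches the point above that only the CPU-side layers get moved.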