DrBenCarson 1 month ago
How are you using that RAM with the GPU?

canpan 1 month ago
llama.cpp with automatic offload to main memory. You can also use Ollama; it is easier, but slower.

reverius42 1 month ago
For those who want a GUI, LM Studio does this too (with llama.cpp as the backend, I think). I'm getting great (albeit slow) results with Qwen3.6-35B MoE on 8GB GPU RAM, 40GB system RAM.
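To make the GPU/system-RAM split discussed above a bit more concrete, here is a rough back-of-the-envelope sketch of how many layers might fit on the card, with the rest staying in main memory. Everything in it is an assumption for illustration: the helper `layers_that_fit` is hypothetical, the 20 GiB file size and 40-layer count are made-up numbers (not measurements of any model named in this thread), layers are treated as equally sized, and the 1.5 GiB reserve for KV cache and buffers is a guess. In llama.cpp the number of offloaded layers is commonly set with `--n-gpu-layers` (`-ngl`); this is not a description of how its automatic offload decides.

```python
def layers_that_fit(model_gib: float, n_layers: int, vram_gib: float,
                    reserve_gib: float = 1.5) -> int:
    """Estimate how many layers fit in VRAM, assuming equally sized layers.

    reserve_gib is headroom kept free for KV cache and scratch buffers
    (an assumed value, tune it for your setup).
    """
    if model_gib <= 0 or n_layers <= 0:
        raise ValueError("model_gib and n_layers must be positive")
    per_layer_gib = model_gib / n_layers
    usable_gib = max(vram_gib - reserve_gib, 0.0)
    # Never report more layers than the model actually has.
    return min(n_layers, int(usable_gib // per_layer_gib))


# Hypothetical numbers: 20 GiB quantized model file, 40 layers, 8 GiB card.
model_gib, n_layers, vram_gib = 20.0, 40, 8.0
on_gpu = layers_that_fit(model_gib, n_layers, vram_gib)
in_ram_gib = (n_layers - on_gpu) * (model_gib / n_layers)

print(f"layers on GPU: {on_gpu} (try --n-gpu-layers {on_gpu})")
print(f"weights left in system RAM: about {in_ram_gib:.1f} GiB")
```

Treat the result as a starting point only: real layers are not all the same size, context length changes the KV cache footprint, and MoE models behave differently from dense ones, so expect to adjust the number up or down after watching actual VRAM usage.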