dofm, 3 days ago:
Yes. -hfd for the draft model.

c-hendricks, 3 days ago:
Nice, was wondering if there was a flag for the draft as well.
Not knocking huggingface-cli, just find it's much easier for people to try out this stuff when they can just

    mise use --global github:ggml-org/llama.cpp
    LLAMA_CACHE="models" llama-server \
      -hf unsloth/gemma-4-26B-A4B-it-qat-GGUF:UD-Q4_K_XL \
      --host 0.0.0.0 \
      --port 11434 \
      ...

dofm, 2 days ago:
--no-mmproj is also pretty useful if you're doing this just to try agentic coding and you're not processing images/voice. Stops it downloading the multimodal projector.
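
For anyone skimming, here is a rough sketch of how the three flags mentioned in this thread (-hf, -hfd, --no-mmproj) might be combined in one llama-server call. The draft repository name is a made-up placeholder, not a real model; pick a draft model that is compatible with the main one. The script only prints the command, so nothing gets downloaded until you run the printed line yourself.

```shell
#!/bin/sh
# Sketch only: assemble a llama-server command from the flags in this thread.
MAIN_REPO="unsloth/gemma-4-26B-A4B-it-qat-GGUF:UD-Q4_K_XL"
# DRAFT_REPO is a hypothetical placeholder; override it with a real draft model.
DRAFT_REPO="${DRAFT_REPO:-your-org/your-draft-model-GGUF}"

set -- -hf "$MAIN_REPO"                    # main model, pulled from Hugging Face
set -- "$@" -hfd "$DRAFT_REPO"             # same idea, but for the draft model
set -- "$@" --no-mmproj                    # skip the multimodal projector download
set -- "$@" --host 0.0.0.0 --port 11434    # host and port from the example above

# Dry run: build and print the command instead of executing it.
cmd="LLAMA_CACHE=models llama-server $*"
echo "$cmd"
```

Run it as `DRAFT_REPO=some-org/some-draft-GGUF sh sketch.sh` (the file name is arbitrary), check the printed line, then paste it into your shell to actually start the server.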