Comment by c-hendricks
2 days ago
Not sure you really need huggingface-cli to download anything if you're just using llama.cpp. You can pass `-hf ...` and it will download the models for you. Set `LLAMA_CACHE` to change where the downloads go:
    LLAMA_CACHE="models" ./llama-server \
        -hf unsloth/gemma-4-31B-it-GGUF:UD-Q4_K_XL \
        ...
Yes.
-hfd for the draft model.
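For anyone who wants to see the two flags side by side, here's a rough sketch that only prints the command rather than running it. The draft repo name is a made-up placeholder, not a recommendation, so swap in whatever draft model actually pairs with your main model. The cache directory and main repo are copied from the comment above.

```shell
#!/bin/sh
# Dry-run sketch: assemble a llama-server command line and print it.
# Nothing is downloaded or started here.

CACHE_DIR="models"                                  # where LLAMA_CACHE points
MAIN_REPO="unsloth/gemma-4-31B-it-GGUF:UD-Q4_K_XL"  # from the comment above
DRAFT_REPO="your-org/your-draft-model-GGUF"         # HYPOTHETICAL placeholder

build_cmd() {
    # -hf selects the main model repo, -hfd the draft model repo.
    printf 'LLAMA_CACHE=%s ./llama-server -hf %s -hfd %s\n' \
        "$CACHE_DIR" "$MAIN_REPO" "$DRAFT_REPO"
}

build_cmd
```

Once the printed line looks right, paste it into a terminal (with a real draft repo) to actually run it.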
Nice, was wondering if there was a flag for the draft as well.
Not knocking huggingface-cli, just find it's much easier for people to try out this stuff when they can just
is also pretty useful if you're doing this just to try agentic coding and you're not processing images/voice. Stops it downloading the multimodal projector.