Comment by c-hendricks

2 days ago

Not sure you really need huggingface-cli to download anything if you're just using llama.cpp. You can pass `-hf ...` and it will download the models for you. Set `LLAMA_CACHE` to change where the downloads go:

  LLAMA_CACHE="models" ./llama-server \
    -hf unsloth/gemma-4-31B-it-GGUF:UD-Q4_K_XL \
    ...

3 comments

dofm 2 days ago

Yes.

`-hfd` for the draft model.
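
A rough sketch of how the two flags might sit together on one command line. The main repo is the one from the comment above. `DRAFT_REPO` is a made-up placeholder, not a real model recommendation, so substitute a draft model that actually matches your main model. The script only prints the command, so nothing is downloaded until you run the printed line yourself.

```shell
#!/bin/sh
# Sketch only: assemble a llama-server command that pulls both the main
# model (-hf) and a draft model (-hfd) from Hugging Face.
MAIN_REPO="unsloth/gemma-4-31B-it-GGUF:UD-Q4_K_XL"  # from the comment above
DRAFT_REPO="your-org/your-draft-model-GGUF"         # hypothetical placeholder

# Build the argument list in the positional parameters.
set -- llama-server -hf "$MAIN_REPO" -hfd "$DRAFT_REPO"

# Print instead of executing; LLAMA_CACHE picks the download directory.
CMD_LINE="LLAMA_CACHE=models $*"
echo "$CMD_LINE"
```

Copy the printed line into a shell once the placeholder has been replaced.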

c-hendricks 2 days ago
Nice, was wondering if there was a flag for the draft as well.
Not knocking huggingface-cli, just find it's much easier for people to try out this stuff when they can just

  mise use --global github:ggml-org/llama.cpp
  LLAMA_CACHE="models" llama-server \
    -hf unsloth/gemma-4-26B-A4B-it-qat-GGUF:UD-Q4_K_XL \
    --host 0.0.0.0 \
    --port 11434 \
    ...

dofm 2 days ago

`--no-mmproj` is also pretty useful if you're doing this just to try agentic coding and you're not processing images/voice. Stops it downloading the multimodal projector.
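
As a small illustration, here is one way to make the projector download optional. `TEXT_ONLY` is a hypothetical variable invented for this sketch, not something llama.cpp reads. The model, host, and port are the ones from the command earlier in the thread. The script prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: append --no-mmproj when the session is text-only.
MODEL="unsloth/gemma-4-26B-A4B-it-qat-GGUF:UD-Q4_K_XL"  # from the thread
TEXT_ONLY=1  # hypothetical toggle for this sketch; set to 0 to keep the projector

# Base command, using the host and port from the earlier comment.
set -- llama-server -hf "$MODEL" --host 0.0.0.0 --port 11434

# Skip the multimodal projector when no images/audio will be processed.
if [ "$TEXT_ONLY" = "1" ]; then
  set -- "$@" --no-mmproj
fi

# Print instead of executing; LLAMA_CACHE picks the download directory.
CMD_LINE="LLAMA_CACHE=models $*"
echo "$CMD_LINE"
```

With `TEXT_ONLY=0` the printed command is the same minus the last flag, so the projector would be fetched as usual.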