Comment by A4ET8a8uTh0_v2

8 days ago

> Interesting. Admittedly, I am slowly getting to the point where ollama's defaults get a little restrictive. If the setup is not too onerous, I would not mind trying it. Where did you start?

Download llama-server from the llama.cpp GitHub repository and install it in a directory on your PATH. AFAIK they don't have an automated installer, so that can be intimidating to some people.

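For what it's worth, here is a minimal sketch of one way to do that, assuming a Linux/macOS machine with git and cmake (the install directory is just an example, and prebuilt binaries from the releases page work too):

    # Build llama-server from source.
    git clone https://github.com/ggml-org/llama.cpp
    cd llama.cpp
    cmake -B build
    cmake --build build --config Release --target llama-server
    # Put the binary somewhere on your PATH, e.g. ~/.local/bin.
    cp build/bin/llama-server ~/.local/bin/
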
Assuming you have llama-server installed, you can download and run a Hugging Face model with something like:

    llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 0 -fa --jinja

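As I understand those flags: `-hf` downloads the GGUF from that Hugging Face repo (and caches it locally), `-c 0` sets the context size to the model's full trained context, `-fa` enables flash attention, and `--jinja` uses the chat template embedded in the model file.
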
Then open http://localhost:8080 in your browser; llama-server serves a built-in web UI there.
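
The same port also exposes an OpenAI-compatible API, so as a quick sanity check (assuming the default host and port) something like this should work:

    curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "Say hello"}]}'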