Comment by speedgoose
2 months ago
I prefer Ollama over the suggested alternatives.
I will switch once we have good user experience on simple features.
A new model is released on HF or the Ollama registry? One `ollama pull` and it's available. It's underwhelming? `ollama rm`.
> This creates a recurring pattern on r/LocalLLaMA: new model launches, people try it through Ollama, it’s broken or slow or has botched chat templates, and the model gets blamed instead of the runtime.
Seems like maybe, at least some of the time, you’re being underwhelmed my ollama not the model.
The better performance point alone seems worth switching away
I follow the llama.cpp runtime improvements and it’s also true for this project. They may rush a bit less but you also have to wait for a few days after a model release to get a working runtime with most features.
Model authors are welcome to add support to llama.cpp before release like IBM did for granite 4 https://github.com/ggml-org/llama.cpp/pull/13550
`wget https://huggingface.co/[USER]/[REPO]/resolve/main/[FILE_NAME...`
`rm [FILE_NAME]`
With Ollama, the initial one-time setup is a little easier, and the CLI is useful, but is it worth dysfunctional templates, worse performance, and the other issues? Not to me.
Jinja templates are very common, and Jinja is not always losslessly convertible to the Go template syntax expected by Ollama. This means that some models simply cannot work correctly with Ollama. Sometimes the effects of this incompatibility are subtle and unpredictable.
you can pull directly from huggingface with llama.cpp, and it also has a decent web chat included
Does it have a model registry with an API and hot swapping or you still have to use sometime like llama swap as suggested in the article ? Or is it CLI?
You can have multiple models served now with loading/unloading with just the server binary.
https://github.com/ggml-org/llama.cpp/blob/master/tools/serv...
1 reply →
You have no idea what you are downloading with such a pull. At least LMstudio gives you access to all the different versions of the same model.
https://ollama.com/library/gemma4/tags
I see quite a few versions, and I can also use hugging face models.