Comment by speedgoose

2 months ago

I prefer Ollama over the suggested alternatives.

I will switch once the alternatives offer a good user experience for simple features.

A new model is released on HF or the Ollama registry? One `ollama pull` and it's available. It's underwhelming? `ollama rm`.

10 comments

kennywinker 2 months ago

> This creates a recurring pattern on r/LocalLLaMA: new model launches, people try it through Ollama, it’s broken or slow or has botched chat templates, and the model gets blamed instead of the runtime.

Seems like maybe, at least some of the time, you’re being underwhelmed by Ollama, not the model.

The better-performance point alone seems like reason enough to switch away.

speedgoose 2 months ago
I follow the llama.cpp runtime improvements, and the same is true of that project. They may rush a bit less, but you still have to wait a few days after a model release to get a working runtime with most features.
- Maxious 2 months ago
  
  Model authors are welcome to add support to llama.cpp before release, like IBM did for Granite 4: https://github.com/ggml-org/llama.cpp/pull/13550

derrikcurran 2 months ago

`wget https://huggingface.co/[USER]/[REPO]/resolve/main/[FILE_NAME...`

`rm [FILE_NAME]`
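For anyone who wants the same download/delete pair scripted, here is a minimal Python sketch. It assumes the `resolve/main/<file>` URL pattern shown in the `wget` line above; the helper names (`hf_resolve_url`, `download`, `remove`) and the example user, repo, and file names are made up for illustration and are not part of any tool mentioned here.

```python
import os
import urllib.request
from urllib.parse import quote


def hf_resolve_url(user: str, repo: str, file_name: str, revision: str = "main") -> str:
    """Build a direct-download URL of the form
    https://huggingface.co/<user>/<repo>/resolve/<revision>/<file_name>.

    All arguments are placeholders supplied by the caller.
    """
    user_q = quote(user, safe="")
    repo_q = quote(repo, safe="")
    rev_q = quote(revision, safe="")
    # Keep "/" unescaped in file_name so files in subfolders still resolve.
    file_q = quote(file_name)
    return f"https://huggingface.co/{user_q}/{repo_q}/resolve/{rev_q}/{file_q}"


def download(url: str, dest: str) -> str:
    """Rough equivalent of `wget <url>`: fetch the file and save it to dest."""
    urllib.request.urlretrieve(url, dest)
    return dest


def remove(path: str) -> None:
    """Rough equivalent of `rm <file>`; does nothing if the file is already gone."""
    if os.path.exists(path):
        os.remove(path)


# Hypothetical usage (names are illustrative only):
#   url = hf_resolve_url("some-user", "some-repo", "model.gguf")
#   download(url, "model.gguf")
#   remove("model.gguf")
```

This only covers fetching and deleting a file. Picking a quantization and pointing a runtime at the file are still up to you.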

With Ollama, the initial one-time setup is a little easier, and the CLI is useful, but is it worth dysfunctional templates, worse performance, and the other issues? Not to me.

Jinja templates are very common, and Jinja is not always losslessly convertible to the Go template syntax expected by Ollama. This means that some models simply cannot work correctly with Ollama. Sometimes the effects of this incompatibility are subtle and unpredictable.

pheggs 2 months ago

You can pull directly from Hugging Face with llama.cpp, and it also includes a decent web chat.

speedgoose 2 months ago
Does it have a model registry with an API and hot swapping, or do you still have to use something like llama-swap, as suggested in the article? Or is it CLI only?
- dminik 2 months ago
  
  You can now serve multiple models, with loading and unloading, using just the server binary.
  https://github.com/ggml-org/llama.cpp/blob/master/tools/serv...
  

ekianjo 2 months ago

You have no idea what you are downloading with such a pull. At least LM Studio gives you access to all the different versions of the same model.

speedgoose 2 months ago

https://ollama.com/library/gemma4/tags
I see quite a few versions there, and I can also use Hugging Face models.