Comment by kennywinker

2 months ago

> This creates a recurring pattern on r/LocalLLaMA: new model launches, people try it through Ollama, it’s broken or slow or has botched chat templates, and the model gets blamed instead of the runtime.

Seems like maybe, at least some of the time, you’re being underwhelmed my ollama not the model.

The better performance point alone seems worth switching away

2 comments

kennywinker

speedgoose 2 months ago

I follow the llama.cpp runtime improvements and it’s also true for this project. They may rush a bit less but you also have to wait for a few days after a model release to get a working runtime with most features.

Maxious 2 months ago

Model authors are welcome to add support to llama.cpp before release like IBM did for granite 4 https://github.com/ggml-org/llama.cpp/pull/13550