
Comment by segmondy

1 day ago

As someone who has participated in llama.cpp development, it's simple: Ollama doesn't want to give credit to llama.cpp. If llama.cpp went closed, Ollama would fall behind; they blatantly rip off llama.cpp. Who cares, though? All they have to say is "powered by llama.cpp." It wouldn't drive most users away from Ollama: most folks will prefer Ollama and power users will prefer llama.cpp. But their ego won't let them.

On llama.cpp breaking things: that's the pace of innovation. It feels like a new model with a new architecture is released every week. Guess what? It's the same thing we saw with drivers for Unix systems back in the day: no documentation. So implementation is based on whatever can be figured out from the arXiv paper and from other implementations like transformers/vLLM (Python -> C). Quite often the models released from labs are themselves "broken", and Jinja templates ain't easy! A bad template will break model generation, tool calling, agentic flows, etc.

Folks will sometimes blame llama.cpp for this. Other times the implementation is correct, but since llama.cpp's main format is GGUF and anyone can generate a GGUF, experimental GGUFs are often generated and released by folks excited to be the first to try a new model. Then llama.cpp gets the blame.
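To make the template point concrete, here's a minimal sketch of how a chat template gets rendered and how a subtle bug corrupts the prompt. This uses Jinja2 with an invented ChatML-style template for illustration; it is not any specific model's real template, just the kind of string that ships in GGUF metadata.

```python
# Illustrative sketch: rendering a ChatML-style chat template with Jinja2.
# The template strings below are made-up examples, not taken from any model.
from jinja2 import Template

messages = [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4"},
]

# A well-formed template: every turn is closed with an end-of-turn token,
# so the model learns where to stop generating.
good = Template(
    "{% for m in messages %}"
    "<|im_start|>{{ m.role }}\n{{ m.content }}<|im_end|>\n"
    "{% endfor %}"
)

# A subtly broken template: the <|im_end|> token is missing. The prompt
# still renders without error, but the model never sees its stop marker,
# so generation runs on past the turn boundary.
bad = Template(
    "{% for m in messages %}"
    "<|im_start|>{{ m.role }}\n{{ m.content }}\n"
    "{% endfor %}"
)

print(good.render(messages=messages))
print(bad.render(messages=messages))
```

Both templates render cleanly, which is exactly the trap: nothing crashes, the model just behaves strangely, and users file the bug against llama.cpp rather than the template.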