Comment by mchiang

7 days ago

totally respect your choice, and it's a great project too. Of course as a maintainer of Ollama, my preference is to win you over with Ollama. If it doesn't meet your needs, it's okay. We are more energized than ever to keep improving Ollama. Hopefully one day we will win you back.

Ollama does not use llama.cpp anymore; we do still keep it and occasionally update it to remain compatible with older models from when we used it. The team is great, we just have features we want to build, and want to implement the models directly in Ollama. (We do use GGML and ask partners to help with it. This is a project that also powers llama.cpp and is maintained by that same team.)

13 comments

am17an 7 days ago

I’ve never seen a PR on ggml from Ollama folks though. Could you mention one contribution you did?

kristjansson 7 days ago

> Ollama does not use llama.cpp anymore;

> We do use GGML

Sorry, but this is kind of hiding the ball. You don't use llama.cpp, you just ... use their core library that implements all the difficult bits, and carry a patchset on top of it?

Why do you have to start with the first statement at all? "we use the core library from llama.cpp/ggml and implement what we think is a better interface and UX. we hope you like it and find it useful."

mchiang 7 days ago
thanks, I'll take that feedback, but I do want to clarify that it's not from llama.cpp/ggml. It's from ggml-org/ggml. I suppose it's all interchangeable though, so thank you for it.
- kristjansson 6 days ago
  
  % diff -ru ggml/src llama.cpp/ggml/src | grep -E '^(\+|\-) .*' | wc -l
  1445
  i.e. as of time of writing +/- 1445 lines between the two, on about 175k total lines. a lot of which is the recent MXFP4 stuff.
  Ollama is great software. It's integral to the broader diffusion of LLMs. You guys should be incredibly proud of it and the impact it's had. I understand the current environment rewards bold claims, but the sense I get from some of your communications is "what's the boldest, strongest claim we can make that's still mostly technically true". As a potential user, taking those claims as true until closer evaluation reveals the discrepancy feels pretty bad, and keeps me firmly in the 'potential' camp.
  Have the confidence in your software and the respect for your users to advertise your system as it is.
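
For anyone who wants to repeat the line count above without a shell, here is a rough Python sketch of the same idea. It is an approximation, not a reimplementation of the command: the function name `count_changed_lines` is made up for illustration, it only compares files present in both trees (as `diff -r` without `-N` does), and it counts every added or removed line, whereas the `grep` pattern above only matches changed lines whose content begins with a space. Python's `difflib` can also pair lines up differently from `diff`, so expect a number in the same ballpark rather than an identical one.

```python
import difflib
from pathlib import Path


def _read_lines(path: Path) -> list[str]:
    # Decode leniently so a stray binary or oddly encoded file does not abort the scan.
    return path.read_text(errors="replace").splitlines()


def count_changed_lines(dir_a: str, dir_b: str) -> int:
    """Count removed plus added lines over files that exist in both trees."""
    root_a, root_b = Path(dir_a), Path(dir_b)
    files_a = {p.relative_to(root_a) for p in root_a.rglob("*") if p.is_file()}
    files_b = {p.relative_to(root_b) for p in root_b.rglob("*") if p.is_file()}

    total = 0
    # Files that exist on only one side are skipped (assumption: mirror `diff -r` without -N).
    for rel in sorted(files_a & files_b):
        lines_a = _read_lines(root_a / rel)
        lines_b = _read_lines(root_b / rel)
        matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":
                # (i2 - i1) lines removed from side A, (j2 - j1) lines added on side B.
                total += (i2 - i1) + (j2 - j1)
    return total
```

Called as `count_changed_lines("ggml/src", "llama.cpp/ggml/src")`, with both checkouts sitting in the current directory, it returns a single integer comparable in spirit to the `wc -l` figure.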
  
cortesoft 7 days ago

Why are you being so accusatory about a choice about which details are important?

tarruda 7 days ago

> Ollama does not use llama.cpp anymore

That is interesting, did Ollama develop its own proprietary inference engine or did you move to something else?

Any specific reason why you moved away from llama.cpp?

mchiang 7 days ago

it's all open, and specifically, the new models are implemented here: https://github.com/ollama/ollama/tree/main/model/models

daft_pink 7 days ago

So I’m using turbo and just want to provide some feedback. I can’t figure out how to connect raycast and project goose to ollama turbo. The software that calls it essentially looks for the models via ollama but cannot find the turbo ones and the documentation is not clear yet. Just my two cents, the inference is very quick and I’m happy with the speed but not quite usable yet.

mchiang 7 days ago
so sorry about this. We are learning. Would it be possible to email us? We will first make it right while we improve Ollama's turbo mode. hello@ollama.com
- daft_pink 6 days ago
  
  no worries. i totally understand that the first day something is released it doesn’t work perfectly with third party/community software.
  thanks for the feedback address :)