Comment by tarruda

8 days ago

Llama.cpp (the library which ollama uses under the hood) has its own server, and it is fully compatible with open-webui.

I moved away from ollama in favor of llama-server a couple of months ago and never missed anything, since I'm still using the same UI.
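
Roughly, the wiring looks like this (a sketch of my setup: it assumes llama-server is already running on its default port 8080 and uses Open WebUI's OPENAI_API_BASE_URL setting, so adjust for your environment):

    # Point Open WebUI at llama-server's OpenAI-compatible API instead of at Ollama.
    # host.docker.internal lets the container reach llama-server on the host;
    # the API key is a placeholder, since llama-server doesn't require one by default.
    docker run -d -p 3000:8080 \
      --add-host=host.docker.internal:host-gateway \
      -e OPENAI_API_BASE_URL=http://host.docker.internal:8080/v1 \
      -e OPENAI_API_KEY=none \
      ghcr.io/open-webui/open-webui:main

The models llama-server has loaded then show up in Open WebUI's model picker, same as they did with ollama.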

Totally respect your choice, and it's a great project too. Of course, as a maintainer of Ollama, my preference is to win you over with Ollama. If it doesn't meet your needs, that's okay. We are more energized than ever to keep improving Ollama. Hopefully one day we will win you back.

Ollama does not use llama.cpp anymore; we do still keep it and occasionally update it to remain compatible with older models from when we did use it. The team is great; we just have features we want to build and want to implement the models directly in Ollama. (We do use GGML and ask partners to help improve it. This is the project that also powers llama.cpp and is maintained by that same team.)

  • I’ve never seen a PR on ggml from Ollama folks, though. Could you point to one contribution you’ve made?

  • > Ollama does not use llama.cpp anymore;

    > We do use GGML

    Sorry, but this is kind of hiding the ball. You don't use llama.cpp, you just ... use their core library that implements all the difficult bits, and carry a patchset on top of it?

    Why do you have to start with the first statement at all? "we use the core library from llama.cpp/ggml and implement what we think is a better interface and UX. we hope you like it and find it useful."

    • Thanks, I'll take that feedback, but I do want to clarify that it's not from llama.cpp/ggml; it's from ggml-org/ggml. I suppose it's all interchangeable though, so thank you for it.

  • So I’m using Turbo and just want to provide some feedback. I can’t figure out how to connect Raycast and project goose to Ollama Turbo. The software that calls it looks for the models via Ollama but cannot find the Turbo ones, and the documentation is not clear yet. Just my two cents: the inference is very quick and I’m happy with the speed, but it’s not quite usable yet.

    • So sorry about this. We are learning. Could you email us at hello@ollama.com? We will make it right first while we improve Ollama's turbo mode.

Fully compatible is a stretch; it's important we don't fall into a celebrity "my guy is perfect" trap. They implement a few endpoints.
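
To make that concrete, here's a rough sketch (default ports; the model names in the requests are just placeholders): llama-server speaks the OpenAI-style API, while clients written against Ollama's native API look for different endpoints entirely.

    # llama-server (default port 8080): OpenAI-compatible endpoints
    # (the model name is not checked; the server answers with whatever it has loaded)
    curl http://localhost:8080/v1/models
    curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model":"gpt-oss-20b","messages":[{"role":"user","content":"hi"}]}'

    # Ollama (default port 11434): its own native API, which llama-server does not serve
    curl http://localhost:11434/api/tags
    curl http://localhost:11434/api/chat \
      -d '{"model":"llama3","messages":[{"role":"user","content":"hi"}]}'

Whether a given frontend "just works" depends on which of these surfaces it talks to.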

I won't use `ollama` on principle. I use `llama-cli` and `llama-server` if I'm not linking `ggml`/`gguf` directly. It's, like, two extra commands to use the one by the genius who wrote it rather than the one from the guys who just jacked it.

The models are on Hugging Face, and downloading them is one `uvx huggingface-cli` away; the `GGUF` quants were `TheBloke`'s thing for ages (with a grant from pmarca IIRC), and now everyone does them (`unsloth` does a bunch of them).
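
For example, grabbing a quant manually looks something like this (a sketch; the repo name is only an example, and I use the `--from` form because the CLI ships in the `huggingface_hub` package):

    # download a GGUF quant straight from Hugging Face into ./models
    uvx --from huggingface_hub huggingface-cli download \
      ggml-org/gpt-oss-20b-GGUF --local-dir ./models

(`llama-server -hf <repo>` will also pull a model for you, so the manual download is optional.)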

Maybe I've got it twisted, but it seems to me that the people who actually do `ggml` aren't happy about it, and I've got their back on this.

It’s unfortunate that llama.cpp’s code is a mess. It’s impossible to make any meaningful contributions to it.

  • I'm the first to admit I'm not a heavy C++ user, so I'm not a great judge of the quality from looking at the code itself ... but ggml-org has 400 contributors on ggml and 1200 on llama.cpp, and it has kept pace with ~all major innovations in transformers over the last year and change. Clearly some people can and do make meaningful contributions.

Interesting. Admittedly, I am slowly getting to the point where ollama's defaults feel a little restrictive. If the setup is not too onerous, I would not mind trying. Where did you start?

  • Download llama-server from the llama.cpp GitHub and install it in a directory on your PATH. AFAIK they don't have an automated installer, so that can be intimidating to some people.

    Assuming you have llama-server installed, you can download and run a Hugging Face model with something like this (`-hf` pulls the GGUF from Hugging Face, `-c 0` uses the model's full context size, `-fa` enables flash attention, and `--jinja` uses the model's built-in chat template):

        llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 0 -fa --jinja

    Then open http://localhost:8080 in your browser to use the built-in web UI.
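
    If you want to sanity-check it from a terminal first, the server also answers plain HTTP (a quick sketch, assuming the default port):

        # readiness check
        curl http://localhost:8080/health

        # list what the server has loaded, via its OpenAI-compatible API
        curl http://localhost:8080/v1/models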