I suspect Ollama is at least partly moving away from open source as they look to raise capital; when they released their replacement desktop app, they did so as closed source. You're absolutely right that people should be using llama.cpp - not only is it truly open source, it's significantly faster, has better model support and many more features, is better maintained, and its development community is far more active.
The only issue I have found with llama.cpp is getting it working with my AMD GPU. Ollama works almost out of the box, both in Docker and directly on my Linux box.
ik_llama is almost always faster when tuned. Untuned, though, I've found the two to be very similar in performance, with mixed results as to which comes out ahead.
But vLLM and SGLang tend to be faster than both of those.
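For context, here's a minimal sketch of vLLM's offline inference API - the model id is just a placeholder for whatever fits on your hardware:

    from vllm import LLM, SamplingParams

    # Placeholder model id; swap in whatever you actually run.
    llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
    params = SamplingParams(temperature=0.7, max_tokens=128)

    outputs = llm.generate(["Explain what a KV cache is."], params)
    print(outputs[0].outputs[0].text)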
> The only issue I have found with llama.cpp is getting it working with my AMD GPU.
I had no problems with ROCm 6.x but couldn't get it to run with ROCm 7.x. I switched to Vulkan and the performance seems OK for my use cases.
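If it helps, a quick way to sanity-check GPU offload is via the llama-cpp-python bindings. This is just a sketch: the model path is a placeholder, and the wheel has to be built against the backend you want (e.g. the GGML_VULKAN CMake option for Vulkan - exact flag names vary between llama.cpp versions):

    from llama_cpp import Llama

    # Placeholder GGUF path; the startup log (verbose=True) shows which
    # backend and device the build actually picked up.
    llm = Llama(
        model_path="./models/model.gguf",
        n_gpu_layers=-1,  # offload all layers to the GPU
        verbose=True,
    )

    out = llm("Q: Name one GPU compute API.\nA:", max_tokens=32)
    print(out["choices"][0]["text"])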
Desktop app is open-source now.
> but people should use llama.cpp instead
MLX is a lot more performant than Ollama and llama.cpp on Apple Silicon, comparing both peak memory usage and tok/s output.
edit: LM Studio benefits from MLX optimizations when running MLX-compatible models.
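For anyone curious, the mlx-lm package makes this easy to try; a minimal sketch, assuming an MLX-converted model from the mlx-community hub (the model id is just an example):

    from mlx_lm import load, generate

    # Placeholder model id; any MLX-converted checkpoint works.
    model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

    text = generate(model, tokenizer, prompt="Explain MLX in one sentence.", verbose=True)
    print(text)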
> LMStudio is not open source though, ollama is
And why should that affect usage? It's not like Ollama users fork the repo before installing it.
It was worth mentioning.
Note that there's also "LlamaBarn" (macOS app): https://github.com/ggml-org/LlamaBarn
Ollama did not open source their GUI.
The source is available here: https://github.com/ollama/ollama/tree/main/app
Thanks, I stand corrected.
> But vLLM and SGLang tend to be faster than both of those.
Besides, optimizations specific to running locally land in llama.cpp first.