Comment by turnsout

6 months ago

This is going to sound like a troll, but it's an honest question: Why do people use Ollama over llama.cpp? llama.cpp has added a ton of features, is about as user-friendly as Ollama, and is higher-performance. Is there some key differentiator for Ollama that I'm missing?

Ollama - `brew install ollama`

llama.cpp - Read the docs, with loads of information and unclear use cases. Wonder whether it has the API compatibility and secondary features that a bunch of tools expect. Decide it's not worth your effort when `ollama` is already running by the time you've read the docs.
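
To make that concrete, here's roughly what the two paths look like today (a sketch, not gospel: exact commands depend on platform and version, and llama.cpp's binaries were renamed to `llama-cli`/`llama-server` at some point):

```
# Ollama: install, make sure the server is running, and one command pulls + runs a model
brew install ollama
brew services start ollama      # or run `ollama serve` in another terminal
ollama run llama3.2

# llama.cpp: install (or build), hunt down a GGUF model yourself, then point the CLI at it
brew install llama.cpp
llama-cli -m ~/models/some-model.Q4_K_M.gguf -p "Hello"   # model path is illustrative
```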

I can only speak for myself, but to me llama.cpp looks kind of hard to use (tbh I never tried it), whereas ollama was just one CLI command away. Also, I had no idea they were equivalent; I thought llama.cpp was some experimental tool for hardcore LLM experts, not something I could teach, say, my non-technical mom to use.

Looking at the llama.cpp repo, it's still not obvious to me how to use it without digging in. It seems I need to download models from Hugging Face, configure stuff, etc. With ollama I type `ollama run` or something and it works.

Tbh I don't use that stuff a lot, or even seriously; maybe once a month to try out new local models.

I think having an easy-to-use quickstart would go a long way for llama.cpp, but maybe it's not intended for casual (stupid?) users like me…

  • In my mind, it doesn't help that llama.cpp's name is that of a source file. Intuitively, that name screams "library for further integration," not "tool for end-user use."

For starters:

- It doesn't have a website

- It doesn't have a download page, you have to build it yourself

  • > - It doesn't have a download page, you have to build it yourself

    I'd wager that anyone capable enough to run a command line tool like Ollama should also be able to download prebuilt binaries from the llama.cpp releases page[1]. Also, prebuilt binaries are available on things like homebrew[2].

    [1]: https://github.com/ggerganov/llama.cpp/releases

    [2]: https://formulae.brew.sh/formula/llama.cpp
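
    For what it's worth, the homebrew route looks roughly like this (assuming the formula still installs the `llama-cli`/`llama-server` binaries; the model path is a placeholder you'd have to fill in yourself):

    ```
    brew install llama.cpp
    # you still need to bring your own GGUF file, e.g. one downloaded from Hugging Face
    llama-server -m ~/models/some-model.Q4_K_M.gguf --port 8080
    ```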

    • I am very technically inclined and use Ollama (in a VM, but still) because of all the steps and the non-obviousness of how to run llama.cpp. This framing feels a bit like the “Dropbox won’t succeed because rsync is easy” thinking.

    • And you still need to find and download the model files yourself, among other steps, which is intimidating enough to drive away most users, including skilled software engineers. Most people just want it to work and start using it for something else as soon as possible.

      It's the same reason I use `apt install` instead of compiling from source. I can definitely do that, but I don't, because apt is just an easier way to get things installed.

    • Ok, I was looking at the repo on mobile and missed the releases.

      Still, it's not immediately obvious from the README that there is an option to download it. There are instructions on how to build it, but not on how to download it. Or maybe I'm blind; please correct me.

    • I'm perfectly capable of compiling my own software, but why bother if I can `curl | sh` into ollama?

I used both. I had a terrible time with llama.cpp, and did not realise it until I used ollama.

I owned an RTX 2070 and followed the llama.cpp instructions to make sure it was compiling with GPU support enabled. I then hand-tweaked settings (`n_gpu_layers`) to try to make it offload as much as possible to the GPU. I verified that it was using a good chunk of my GPU RAM (via nvidia-smi) and confirmed that with-GPU was faster than CPU-only. It was still pretty slow, and that influenced my decision to upgrade to an RTX 3070. That was faster, but still pretty meh...
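
Roughly the workflow I mean, from memory (treat the exact flags as approximate: newer llama.cpp builds use cmake with a GGML_CUDA option where older versions used `make LLAMA_CUBLAS=1`, and the model file name is just an example):

```
# build with CUDA enabled
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release

# offload as many layers as VRAM allows, then watch the memory usage in nvidia-smi
./build/bin/llama-cli -m ./llama-2-7b.Q4_K_M.gguf --n-gpu-layers 32 -p "Hello"
nvidia-smi
```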

The first time I used ollama, everything just worked straight out of the box, with one command and zero configuration. It was lightning fast. Honestly, if I'd had ollama earlier, I probably wouldn't have felt the need to upgrade my GPU.

  • Maybe it was lightning fast because the model names are misleading? I installed it to try out DeepSeek, and I was surprised how small the download artifact was and how easily it ran on my simple three-year-old Mac. I was a bit disappointed that DeepSeek gave bad responses, since I'd heard it should be better than what I was using on OpenAI… only to realize later, after reading about it on Twitter, that I had gotten a very small version of DeepSeek R1.

    Maybe you were running a different model?

  • If it was faster with ollama, then you most probably just downloaded a different model (which is hard to recognize in ollama). Ollama only adds UX on top of llama.cpp, nothing compute-wise.
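
    If you want to check what you actually pulled, ollama itself can tell you (a sketch; the exact output varies by version):

    ```
    ollama list              # tags and approximate sizes of the models you've downloaded
    ollama show deepseek-r1  # prints architecture, parameter count and quantization
    ```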

The server in llama.cpp is documented as being only for demonstration purposes, whereas in ollama running as a server is a supported, first-class mode.

For work we are given Macs, so the GPU can't be passed through to Docker.

I wanted a client/server where the server has the LLM and runs outside of Docker, but without me having to write the client/server part.

I run my model in ollama, then inside the code use litellm to speak to it during local development.
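
The moving parts are pretty small. Roughly (a sketch, assuming Ollama's default port 11434 and Docker Desktop's `host.docker.internal` alias; litellm just gets pointed at the same base URL, e.g. with model `ollama/llama3`):

```
# on the Mac host: Ollama exposes an HTTP API on localhost:11434
ollama run llama3

# from inside the Docker container: reach the host's Ollama through host.docker.internal
curl http://host.docker.internal:11434/api/generate \
  -d '{"model": "llama3", "prompt": "Why is the sky blue?"}'
```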

While it's not rocket science, a lot of llama.cpp's features require knowing how to recompile the project with certain variables set. You also need to properly format prompts for each instruct model.
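
To give a flavour of the prompt-formatting part (an illustrative sketch: the model file name is made up, the exact template depends on the model, and newer builds can apply the right template for you via a `--chat-template` option):

```
# Llama-2-chat-style models expect their own wrapper tokens around the prompt
./llama-cli -m ./llama-2-7b-chat.Q4_K_M.gguf -ngl 32 \
  -p "[INST] <<SYS>>You are a helpful assistant.<</SYS>> Why is the sky blue? [/INST]"

# ChatML-style models expect a completely different layout, roughly:
#   <|im_start|>user ... <|im_end|> <|im_start|>assistant
```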

I use ollama because llama.cpp dropped support for VLMs (vision-language models). I would happily switch back if llama.cpp starts supporting them again.

Honestly, I just didn't know it was this easy to use, maybe because of the name... But ramalama seems to be a full replacement for ollama.

  • ramalama still needs users to be able to install Docker first, no? That's a barrier to entry for many users, especially on Windows, where I have had my struggles running Docker, not to mention that it's a massive resource hog.