Comment by turnsout

6 months ago

This is going to sound like a troll, but it's an honest question: Why do people use Ollama over llama.cpp? llama.cpp has added a ton of features, is about as user-friendly as Ollama, and is higher-performance. Is there some key differentiator for Ollama that I'm missing?

Ollama - `brew install ollama`

llama.cpp - Read the docs, with loads of information and unclear use cases. Wonder whether it has the API compatibility and secondary features that a bunch of tools expect. Decide it's not worth your effort when `ollama` is already running by the time you've read the docs.
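
To make that concrete, here's roughly what the two paths look like today (a sketch, not gospel: exact commands depend on platform and version, and llama.cpp's binaries were renamed to `llama-cli`/`llama-server` at some point):

```
# Ollama: install, make sure the server is running, and one command pulls + runs a model
brew install ollama
brew services start ollama      # or run `ollama serve` in another terminal
ollama run llama3.2

# llama.cpp: install (or build), hunt down a GGUF model yourself, then point the CLI at it
brew install llama.cpp
llama-cli -m ~/models/some-model.Q4_K_M.gguf -p "Hello"   # model path is illustrative
```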

I can only speak for myself, but to me llama.cpp looks kind of hard to use (tbh I never tried it), whereas ollama was just one CLI command away. Also, I had no idea they were equivalent; I thought llama.cpp was some experimental tool for hardcore LLM experts, not something I could teach, say, my non-technical mom to use.

Looking at the llama.cpp repo, it's still not obvious to me how to use it without digging in. It seems I need to download models from Hugging Face, configure stuff, etc. With ollama I type `ollama run` or something and it works.

Tbh I don't use that stuff a lot, or even seriously; maybe once a month to try out new local models.

I think having an easy-to-use quickstart would go a long way for llama.cpp, but maybe it's not intended for casual (stupid?) users like me…

  • In my mind, it doesn't help that llama.cpp's name is that of a source file. Intuitively, that name screams "library for further integration," not "tool for end-user use."

For starters:

- It doesn't have a website

- It doesn't have a download page, you have to build it yourself

  • > - It doesn't have a download page, you have to build it yourself

    I'd wager that anyone capable enough to run a command line tool like Ollama should also be able to download prebuilt binaries from the llama.cpp releases page[1]. Also, prebuilt binaries are available on things like homebrew[2].

    [1]: https://github.com/ggerganov/llama.cpp/releases

    [2]: https://formulae.brew.sh/formula/llama.cpp
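
    For what it's worth, the homebrew route looks roughly like this (assuming the formula still installs the `llama-cli`/`llama-server` binaries; the model path is a placeholder you'd have to fill in yourself):

    ```
    brew install llama.cpp
    # you still need to bring your own GGUF file, e.g. one downloaded from Hugging Face
    llama-server -m ~/models/some-model.Q4_K_M.gguf --port 8080
    ```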

    • I am very technically inclined and use Ollama (in a VM, but still) because of all the steps and the non-obviousness of how to run llama.cpp. This framing feels a bit like the “Dropbox won’t succeed because rsync is easy” thinking.

    • And you still need to find and download the model files yourself, among other steps, which is intimidating enough to drive away most users, including skilled software engineers. Most people just want it to work and start using it for something else as soon as possible.

      It's the same reason I use `apt install` instead of compiling from source. I can definitely do that, but I don't, because apt is just an easier way to get things installed.

    • Ok, I was looking at the repo on mobile and missed the releases.

      Still, it's not immediately obvious from the README that there is an option to download it. There are instructions on how to build it, but not on how to download it. Or maybe I'm blind; please correct me.

    • I'm perfectly capable of compiling my own software, but why bother if I can `curl | sh` into ollama?

I used both. I had a terrible time with llama.cpp, and did not realise it until I used ollama.

I owned an RTX 2070 and followed the llama.cpp instructions to make sure it was compiling with GPU support enabled. I then hand-tweaked settings (`n_gpu_layers`) to try to make it offload as much as possible to the GPU. I verified that it was using a good chunk of my GPU RAM (via nvidia-smi) and confirmed that with-GPU was faster than CPU-only. It was still pretty slow, and that influenced my decision to upgrade to an RTX 3070. That was faster, but still pretty meh...
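
Roughly the workflow I mean, from memory (treat the exact flags as approximate: newer llama.cpp builds use cmake with a GGML_CUDA option where older versions used `make LLAMA_CUBLAS=1`, and the model file name is just an example):

```
# build with CUDA enabled
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release

# offload as many layers as VRAM allows, then watch the memory usage in nvidia-smi
./build/bin/llama-cli -m ./llama-2-7b.Q4_K_M.gguf --n-gpu-layers 32 -p "Hello"
nvidia-smi
```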

The first time I used ollama, everything just worked straight out of the box, with one command and zero configuration. It was lightning fast. Honestly, if I'd had ollama earlier, I probably wouldn't have felt the need to upgrade my GPU.

  • Maybe it was lightning fast because the model names are misleading? I installed it to try out DeepSeek, and I was surprised how small the download artifact was and how easily it ran on my simple three-year-old Mac. I was a bit disappointed that DeepSeek gave bad responses, since I'd heard it should be better than what I was using on OpenAI… only to realize later, after reading about it on Twitter, that I had gotten a very small version of DeepSeek R1.

    Maybe you were running a different model?

  • If it was faster with ollama, then you most probably just downloaded a different model (which is hard to recognize in ollama). Ollama only adds UX on top of llama.cpp, nothing compute-wise.
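
    If you want to check what you actually pulled, ollama itself can tell you (a sketch; the exact output varies by version):

    ```
    ollama list              # tags and approximate sizes of the models you've downloaded
    ollama show deepseek-r1  # prints architecture, parameter count and quantization
    ```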

The server in llama.cpp is documented as being only for demonstration purposes, whereas in ollama running as a server is a supported, first-class mode.

For work we are given Macs, so the GPU can't be passed through to Docker.

I wanted a client/server where the server has the LLM and runs outside of Docker, but without me having to write the client/server part.

I run my model in ollama, then inside the code use litellm to speak to it during local development.
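
The moving parts are pretty small. Roughly (a sketch, assuming Ollama's default port 11434 and Docker Desktop's `host.docker.internal` alias; litellm just gets pointed at the same base URL, e.g. with model `ollama/llama3`):

```
# on the Mac host: Ollama exposes an HTTP API on localhost:11434
ollama run llama3

# from inside the Docker container: reach the host's Ollama through host.docker.internal
curl http://host.docker.internal:11434/api/generate \
  -d '{"model": "llama3", "prompt": "Why is the sky blue?"}'
```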

While it's not rocket science, a lot of llama.cpp's features require knowing how to recompile the project with certain variables set. You also need to properly format prompts for each instruct model.
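
To give a flavour of the prompt-formatting part (an illustrative sketch: the model file name is made up, the exact template depends on the model, and newer builds can apply the right template for you via a `--chat-template` option):

```
# Llama-2-chat-style models expect their own wrapper tokens around the prompt
./llama-cli -m ./llama-2-7b-chat.Q4_K_M.gguf -ngl 32 \
  -p "[INST] <<SYS>>You are a helpful assistant.<</SYS>> Why is the sky blue? [/INST]"

# ChatML-style models expect a completely different layout, roughly:
#   <|im_start|>user ... <|im_end|> <|im_start|>assistant
```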

I use ollama because llama.cpp dropped support for VLMs (vision-language models). I would happily switch back if llama.cpp starts supporting them again.

Honestly, I just didn't know it was this easy to use, maybe because of the name... But ramalama seems to be a full replacement for ollama.

  • ramalama still needs users to be able to install Docker first, no? That's a barrier to entry for many users, especially on Windows, where I have had my struggles running Docker, not to mention that it's a massive resource hog.