
Comment by clarionbell

6 hours ago

Why is anyone still using this? You can spin up a llama.cpp server and have a more optimized runtime. And if you insist on containers, you can go for RamaLama: https://ramalama.ai/
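(For context on the "llama.cpp server" route: llama-server exposes an OpenAI-compatible HTTP API, so any OpenAI-style client can talk to it. A minimal sketch in Python, assuming the server is already running on its default port 8080 with a model loaded; the model name and prompt are placeholders.)

```python
# Minimal sketch: query a locally running llama-server via its OpenAI-compatible API.
# Assumes the server was started separately with a GGUF model loaded and is
# listening on the default port 8080; the model name and prompt are placeholders.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        # Placeholder; llama-server typically serves whatever single model it loaded.
        "model": "local-model",
        "messages": [{"role": "user", "content": "Give me one reason to use llama.cpp directly."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```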

Ollama is more of an out-of-the-box solution. I also prefer llama.cpp for the FOSS aspects, but Ollama offers a simpler install, model download (this is the biggest convenience IMO), and execution. For those reasons, I believe it's still fairly popular as a solution.

  • By the way, you can download models straight from Hugging Face with llama.cpp. It might be a few characters longer than the command you would run with Ollama, but still (see the sketch after this list).

    • Then you also need to provide the appropriate metadata and format messages according to the model's prompt format, which I believe llama.cpp doesn't do by default, or maybe it can? I had trouble formatting messages correctly with llama.cpp, possibly due to a metadata mismatch, which Ollama seems to handle for me, but I'd love to know if I'm wrong about this (the sketch after this list touches on it too).


  • I disagree that Ollama is easier to install. I tried to enable Vulkan in Ollama and it was nightmarish, even though the underlying llama.cpp code supports it with a simple env var. Ollama was easy two years ago, but it has been getting progressively worse over time.
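(To illustrate the two points above: recent llama.cpp builds can pull a GGUF straight from Hugging Face on the command line, via a --hf-repo / -hf flag on llama-cli and llama-server if I remember the option names right, and the chat template stored in the GGUF metadata can be applied for you instead of hand-formatting messages. Below is a minimal sketch of both using the llama-cpp-python bindings; the repo ID and filename pattern are placeholders, and it assumes huggingface-hub is installed so the download works.)

```python
# Minimal sketch using the llama-cpp-python bindings
# (pip install llama-cpp-python huggingface-hub).
from llama_cpp import Llama

# Downloads the matching GGUF from Hugging Face and loads it;
# the repo ID and filename pattern are placeholders.
llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen2.5-0.5B-Instruct-GGUF",
    filename="*q4_k_m.gguf",
    verbose=False,
)

# As far as I can tell, create_chat_completion picks up the chat template
# shipped in the GGUF metadata, so messages don't need to be hand-formatted
# for the model's prompt format.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why is the sky blue?"}]
)
print(out["choices"][0]["message"]["content"])
```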

I think people just don't know any better. I also used Ollama way longer than I should have. I didn't know that Ollama was just llama.cpp with a thin wrapper. My quality of life improved a lot after I discovered llama.cpp.