
Comment by jjice

1 day ago

Ollama is a more out-of-the-box solution. I also prefer llama.cpp for the more FOSS aspects, but Ollama is a simpler install, model download (this is the biggest convenience IMO), and execution. That's why I believe it's still fairly popular as a solution.

By the way, you can download models straight from Hugging Face with llama.cpp. The command might be a few characters longer than the one you'd run with Ollama, but still.
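For example, a rough sketch (the repo and model names here are just placeholders, and flag spellings can differ between llama.cpp releases):

    # llama.cpp: pull a GGUF straight from Hugging Face and start chatting
    llama-cli -hf bartowski/Llama-3.2-1B-Instruct-GGUF

    # roughly the Ollama equivalent
    ollama run llama3.2:1b

Both tools cache the download, so subsequent runs reuse the local copy.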

  • Then you also need to provide appropriate metadata and format messages correctly according to the model's chat template, which I believe llama.cpp doesn't do by default, or can it? I had trouble formatting messages correctly with llama.cpp, possibly due to a mismatch in metadata, which Ollama seems to handle, but I'd love to know if this is wrong (there's a sketch of llama-server's behavior below, after these replies).

    • Plus a Hugging Face token to access models that require you to beg for approval. Ollama-hosted models don't require that (which may not be legit, but most users don't care).

  • You can, but you have to know where to look, and you have to have some idea of what you're doing. The benefit of Ollama is that the barrier to entry is really low, as long as you have the right hardware.

    To me, one of the benefits of running a model locally is learning how all this stuff works, so Ollama never had any appeal. But most people just want stuff to work without putting in the effort to understand how it all fits together. Ollama meets that demand.
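On the chat-template question above: llama-server exposes an OpenAI-style chat endpoint that formats messages with the model's chat template server-side, reading it from the GGUF metadata (the --jinja flag switches to the full embedded Jinja template). A minimal sketch, assuming a local server on the default port, with the model path and prompt as placeholders:

    # start the server; --jinja applies the chat template embedded in the GGUF
    llama-server -m ./model.gguf --jinja

    # send plain role/content messages; the server handles the prompt formatting
    curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "Hello!"}]}'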

I disagree that Ollama is easier to install. I tried to enable Vulkan on Ollama and it was nightmarish, even though the underlying llama.cpp code supports it with a simple env var. Ollama was easy 2 years ago, but it has been getting progressively worse.
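For reference, a sketch of enabling the Vulkan backend in upstream llama.cpp (assuming a recent checkout with CMake and the Vulkan SDK installed; the option name has changed across versions):

    cmake -B build -DGGML_VULKAN=ON
    cmake --build build --config Release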