Comment by LorenDB
6 months ago
Additionally, Ollama makes model installation a single command. With llama.cpp, you have to download the raw models from Hugging Face and handle storage for them yourself.
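For reference, a minimal sketch of that single command with the standard Ollama CLI (the model tag is just an illustrative example):

    # pulls the model on first use, then drops into an interactive prompt
    ollama run llama3.2

`ollama pull` does the download step on its own, and the same local model store is reused by every later run.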
A reply, 6 months ago, quoting the above:

> Additionally, Ollama makes model installation a single command. With llama.cpp, you have to download the raw models from Hugging Face and handle storage for them yourself.
Not really; llama.cpp has been able to download models itself for quite some time now. It's not as elegant as Ollama, but:
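For example, using llama.cpp's `-hf` flag to fetch a GGUF straight from Hugging Face (the repo name here is just an example):

    # downloads the GGUF from Hugging Face on first run, caches it locally, and serves it
    llama-server -hf ggml-org/gemma-3-1b-it-GGUF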
That will get you up and running with a single command.
And now you need a server per model? Ollama loads models on demand, unloads them after an idle period, and exposes them all over the same HTTP API.
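As a concrete sketch of that single API, assuming Ollama's default endpoint on localhost:11434 (the model names are just examples):

    # both requests go to the same server; each model is loaded on demand
    curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "prompt": "Hi", "stream": false}'
    curl http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "Hi", "stream": false}'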