Comment by tyfon
2 months ago
I think the biggest advantage for me with ollama is the ability to "hotswap" models with different utility instead of restarting the server with different models combined with the simple "ollama pull model". In other words, it has been quite convenient.
Due to this post I had to search a bit and it seems that llama.cpp recently got router support[1], so I need to have a look at this.
My main use for this is a discord bot where I have different models for different features like replying to messages with images/video or pure text, and non reply generation of sentiment and image descriptions. These all perform best with different models and it has been very convenient for the server to just swap in and out models on request.
[1] https://huggingface.co/blog/ggml-org/model-management-in-lla...
> the ability to "hotswap" models with different utility instead of restarting the server
The article mentions llama-swap does this
Llama.cpp added the ability load/switch models on demand with the max-models and models preset flags.
You can do that with llama-server
Llama-server which is part of llamacpp does this for a few months now