Comment by tyfon

2 months ago

I think the biggest advantage for me with ollama is the ability to "hotswap" models with different utility instead of restarting the server with different models combined with the simple "ollama pull model". In other words, it has been quite convenient.

Due to this post I had to search a bit and it seems that llama.cpp recently got router support[1], so I need to have a look at this.

My main use for this is a discord bot where I have different models for different features like replying to messages with images/video or pure text, and non reply generation of sentiment and image descriptions. These all perform best with different models and it has been very convenient for the server to just swap in and out models on request.

[1] https://huggingface.co/blog/ggml-org/model-management-in-lla...

6 comments

tyfon

majorchord 2 months ago

> the ability to "hotswap" models with different utility instead of restarting the server

The article mentions llama-swap does this

hacker_homie 2 months ago

Llama.cpp added the ability load/switch models on demand with the max-models and models preset flags.

segmondy 2 months ago

You can do that with llama-server

ekianjo 2 months ago

Llama-server which is part of llamacpp does this for a few months now