Comment by llmtosser

7 days ago

Interesting - it does indeed seem like llama-server has the endpoints needed for model swapping, and llama.cpp recently also gained a new flag for dynamic CPU offload.

However, the approach to model swapping is not 'ollama-compatible', which means the OSS tools that speak the 'ollama' API (e.g. Open WebUI, OpenHands, Bolt.diy, n8n, Flowise, browser-use, etc.) aren't able to take advantage of this particularly useful capability, as best I can tell.
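
To make the gap concrete: ollama-aware tools talk to endpoints like `GET /api/tags` (model listing) and `POST /api/chat`, and rely on the server swapping models implicitly based on the `model` field of each request, whereas llama-server exposes OpenAI-style `GET /v1/models` and `POST /v1/chat/completions`. Below is a minimal sketch of what a translation shim could look like - hypothetical, stdlib-only Python, assuming llama-server is running on localhost:8080 and the shim listens on ollama's default port 11434; streaming and most ollama response fields are omitted.

```python
# Hypothetical sketch of an ollama -> llama-server translation shim.
# Assumptions (not from the original comment): llama-server is running on
# localhost:8080 with its OpenAI-compatible API; only the two endpoints most
# tools probe are translated, and responses are non-streaming.
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

LLAMA_SERVER = "http://localhost:8080"  # assumed llama-server address

def fetch(path, payload=None):
    """GET (no payload) or POST (JSON payload) against llama-server."""
    req = urllib.request.Request(
        LLAMA_SERVER + path,
        data=json.dumps(payload).encode() if payload else None,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

class Shim(BaseHTTPRequestHandler):
    def _send(self, body):
        data = json.dumps(body).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        if self.path == "/api/tags":
            # ollama-style model listing, built from OpenAI-style /v1/models.
            models = fetch("/v1/models")
            self._send({"models": [{"name": m["id"], "model": m["id"]}
                                   for m in models.get("data", [])]})
        else:
            self.send_error(404)

    def do_POST(self):
        if self.path == "/api/chat":
            length = int(self.headers.get("Content-Length", 0))
            body = json.loads(self.rfile.read(length))
            # Forward the model name as-is and let the server decide what
            # to load/swap; non-streaming only in this sketch.
            result = fetch("/v1/chat/completions", {
                "model": body.get("model", ""),
                "messages": body.get("messages", []),
                "stream": False,
            })
            self._send({"model": body.get("model", ""),
                        "message": result["choices"][0]["message"],
                        "done": True})
        else:
            self.send_error(404)

if __name__ == "__main__":
    # Listen on ollama's default port so existing tools need no changes.
    HTTPServer(("127.0.0.1", 11434), Shim).serve_forever()
```

Pointing a tool like Open WebUI at http://localhost:11434 would then exercise llama-server's swapping through ollama-shaped endpoints, though a real bridge would also need streaming (newline-delimited JSON) responses and the other endpoints some tools probe, such as /api/version and /api/show.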