Comment by llmtosser
7 days ago
This is not true.
No inference engine does all of:
- Model switching
- Unload after idle
- Dynamic layer offload to CPU to avoid OOM
This can be added to llama.cpp with llama-swap today, so even without Ollama you are not far off.
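For illustration, a minimal sketch of what a llama-swap setup covering the first two points might look like. This assumes llama-swap's YAML config with per-model `cmd` and an idle `ttl`; the model name, paths, and layer counts are placeholders, and `--n-gpu-layers` here is static offload at load time, not dynamic offload on OOM:

```yaml
# Hypothetical llama-swap config sketch (verify keys against the llama-swap docs).
models:
  "example-model":
    # llama-swap starts this llama.cpp server on demand (model switching).
    cmd: llama-server --port ${PORT} -m /models/example-model.gguf --n-gpu-layers 32
    # Unload the model after 300 seconds of no requests (unload after idle).
    ttl: 300
```

The third point (moving layers to CPU dynamically to avoid OOM) is the one this setup does not give you; you would still pick `--n-gpu-layers` up front per model.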