Comment by cdoern
17 hours ago
> Ollama could make its life much easier by spawning llama-server as a subprocess listening on a unix socket, and forwarding requests to it
I'd recommend taking a look at https://github.com/containers/ramalama. It's closer to what you're describing in the way it uses llama-server, and it's also container-native by default, which is nice for portability.
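For what it's worth, here is a minimal sketch of the idea quoted above: spawn llama-server as a child process, have it listen on a unix socket, and reverse-proxy incoming HTTP requests to it. The socket path, the model file name, and the use of `--host` to point llama-server at a unix socket are all assumptions for illustration; check your build's `--help` for the actual flags.

```go
// Sketch only: spawn llama-server on a unix socket and proxy requests to it.
package main

import (
	"context"
	"log"
	"net"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
	"os/exec"
)

func main() {
	sock := "/tmp/llama-server.sock" // hypothetical socket path

	// Spawn llama-server as a subprocess. The "--host <socket>" usage is an
	// assumption; your build may expose unix sockets differently (or not at all).
	cmd := exec.Command("llama-server", "--model", "model.gguf", "--host", sock)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Start(); err != nil {
		log.Fatalf("starting llama-server: %v", err)
	}
	defer cmd.Process.Kill()

	// Reverse proxy: accept TCP on :8080, forward each request over the unix socket.
	target, _ := url.Parse("http://llama-server") // host is ignored by the unix dialer below
	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.Transport = &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", sock)
		},
	}

	log.Fatal(http.ListenAndServe(":8080", proxy))
}
```

A real supervisor would also wait for the socket to appear before serving, restart the child if it exits, and clean up the socket file on shutdown.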