Comment by cdoern
17 hours ago
> Ollama could make its life much easier by spawning llama-server as a subprocess listening on a unix socket, and forwarding requests to it
I'd recommend taking a look at https://github.com/containers/ramalama. It's closer to what you're describing in the way it uses llama-server, and it's also container-native by default, which is nice for portability.
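For what it's worth, here is a minimal sketch of the idea quoted above: spawn llama-server as a child process, have it listen on a unix socket, and reverse-proxy incoming HTTP requests to it. The socket path, the model file name, and the use of `--host` to point llama-server at a unix socket are all assumptions for illustration; check your build's `--help` for the actual flags.

```go
// Sketch only: spawn llama-server on a unix socket and proxy requests to it.
package main

import (
	"context"
	"log"
	"net"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
	"os/exec"
)

func main() {
	sock := "/tmp/llama-server.sock" // hypothetical socket path

	// Spawn llama-server as a subprocess. The "--host <socket>" usage is an
	// assumption; your build may expose unix sockets differently (or not at all).
	cmd := exec.Command("llama-server", "--model", "model.gguf", "--host", sock)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Start(); err != nil {
		log.Fatalf("starting llama-server: %v", err)
	}
	defer cmd.Process.Kill()

	// Reverse proxy: accept TCP on :8080, forward each request over the unix socket.
	target, _ := url.Parse("http://llama-server") // host is ignored by the unix dialer below
	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.Transport = &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", sock)
		},
	}

	log.Fatal(http.ListenAndServe(":8080", proxy))
}
```

A real supervisor would also wait for the socket to appear before serving, restart the child if it exits, and clean up the socket file on shutdown.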