Comment by nyrikki
2 hours ago
The tooling for that exists today in Linux, and it is fairly easy to use with podman, etc.
K8s's design choices cloud that a little, but take VS Code completions as an example: I have a pod that systemd starts on demand when a request comes in.
nginx receives the listening socket from systemd and communicates with llama.cpp through a socket on a shared volume. Because nginx inherits its socket from systemd, it doesn't have internet access either.
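For anyone unfamiliar with the socket handoff: under systemd socket activation, systemd opens the listening socket itself and passes it to the started process as file descriptors beginning at fd 3, announced through the LISTEN_PID and LISTEN_FDS environment variables (the protocol behind libsystemd's sd_listen_fds()). nginx has its own way of picking such sockets up, so this is only a minimal Python sketch of the generic protocol, not nginx's code; the function names are my own.

```python
import os
import socket

# First descriptor passed by systemd socket activation (SD_LISTEN_FDS_START in libsystemd).
SD_LISTEN_FDS_START = 3


def listen_fds(environ=None, pid=None):
    """Return the fds handed over by socket activation, or [] if none are ours.

    The fds are only meant for this process if LISTEN_PID equals our PID;
    LISTEN_FDS says how many consecutive descriptors start at fd 3.
    """
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # variables missing or malformed: not socket-activated
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def inherited_sockets():
    """Wrap each inherited fd in a socket object (family and type are auto-detected)."""
    return [socket.socket(fileno=fd) for fd in listen_fds()]


def serve_once():
    """Accept one connection on the first inherited socket (assumes a stream socket)."""
    socks = inherited_sockets()
    if not socks:
        raise RuntimeError("not socket-activated: no LISTEN_FDS for this PID")
    conn, _ = socks[0].accept()
    with conn:
        conn.sendall(b"hello from a socket-activated service\n")
```

The service never calls bind() itself, which is why it needs no network access of its own. The real sd_listen_fds() also sets close-on-exec on the fds and can unset the variables; this sketch skips that.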
If I need a new model, I just download it to a shared volume.
llama.cpp has no internet access at all, and it is usable on an old 7700K + 1080 Ti.
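The reason llama.cpp can run without any network is that a Unix socket is addressed by a filesystem path, not an IP address: any container that mounts the shared volume can connect to it, even with no network interfaces at all. nginx can proxy to such a socket (its docs show the form `proxy_pass http://unix:/tmp/backend.socket:/uri/;`). Below is a minimal Python sketch of that path-based handoff; the socket name llama.sock and the toy request/reply standing in for llama.cpp's HTTP API are hypothetical.

```python
import os
import socket
import tempfile
import threading


def read_until_eof(conn):
    """Read from a stream socket until the peer shuts down its write side."""
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:
            return b"".join(chunks)
        chunks.append(data)


def serve_one(sock_path, ready):
    """Toy backend (llama.cpp's role): answer one request on a Unix socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(sock_path)  # creates the socket file on the shared volume
        srv.listen(1)
        ready.set()
        conn, _ = srv.accept()
        with conn:
            request = read_until_eof(conn)
            conn.sendall(b"completion for: " + request)


def ask(sock_path, prompt):
    """Frontend (nginx's role): connect by path; no IP address is involved."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as c:
        c.connect(sock_path)
        c.sendall(prompt)
        c.shutdown(socket.SHUT_WR)  # signal end of request
        return read_until_eof(c)


def demo(prompt=b"def add(a, b):"):
    # A temporary directory stands in for the volume both containers mount.
    with tempfile.TemporaryDirectory() as shared_volume:
        path = os.path.join(shared_volume, "llama.sock")  # hypothetical name
        ready = threading.Event()
        backend = threading.Thread(target=serve_one, args=(path, ready))
        backend.start()
        ready.wait(timeout=5)
        reply = ask(path, prompt)
        backend.join()
        return reply
```

In the real setup the two ends live in different containers, and only the directory holding the socket file needs to be shared.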
The assumption that the K8s concept of a pod, with shared UTS, net, and IPC namespaces, is all a pod can be confuses the issue.
The unshare call that runc uses is very similar to how clone() can drop the parent's IPC namespace, and so on.
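A quick way to see that a "pod" is just a choice of which namespaces to share: each process exposes its namespaces as links under /proc/<pid>/ns/, and two processes are in the same namespace exactly when the links are equal. A minimal, Linux-only Python sketch (the helper names are mine):

```python
import os

# Namespace types compared here (newer kernels also have "time").
NS_TYPES = ("cgroup", "ipc", "mnt", "net", "pid", "user", "uts")


def namespaces(pid="self"):
    """Map namespace type -> identifier such as 'net:[4026531840]' for a process."""
    result = {}
    for ns in NS_TYPES:
        try:
            result[ns] = os.readlink(f"/proc/{pid}/ns/{ns}")
        except (FileNotFoundError, PermissionError):
            pass  # unsupported by this kernel, no such process, or not readable
    return result


def shared_namespaces(pid_a, pid_b="self"):
    """Return the namespace types two processes share (equal links = same namespace)."""
    a, b = namespaces(pid_a), namespaces(pid_b)
    return sorted(ns for ns in a.keys() & b.keys() if a[ns] == b[ns])
```

Run on the host with the host-side PID of a container process, shared_namespaces() shows which namespaces that container shares with your shell; by the description above, containers in a K8s pod would share at least ipc, net, and uts with each other. Reading another user's processes may require root.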
I should probably start a blog on how to do this, as I think it is the way forward even for long-lived services.
The information is out there but scattered.
If it is something people would find useful, please leave a comment.
This sounds very interesting to me. I'd read that blog post, as I'm working on expanding my K8s skills. As you say, the knowledge is very scattered!