
Comment by plagiarist

7 hours ago

I was looking into which local inference software to use and also found this behavior around model storage to be onerous.

What I want is to have a directory with models and bind mount that readonly into inference containers. But Ollama would force me either to prime the pump by importing models with Modelfiles (where do I even get these?) every time I start the container, or to store its own specific version of the files?
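
Concretely, the "prime the pump" step I want to avoid on every container start looks roughly like this (model name and paths are made up for illustration):

```
# Minimal Modelfile pointing at a GGUF on the shared, read-only mount
# (path and model name are placeholders).
cat > /tmp/Modelfile <<'EOF'
FROM /models/llama-3.1-8b-instruct-q4_k_m.gguf
EOF

# Register it with Ollama, which copies the weights into its own store
# under ~/.ollama/models instead of serving them straight off the mount.
ollama create llama3.1-8b -f /tmp/Modelfile
ollama run llama3.1-8b
```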

Trying out vLLM and llama.cpp was my next step here, so I'm glad to hear you're able to share a directory between them.

> What I want is to have a directory with models and bind mount that readonly into inference containers.

Yeah, that's basically what I'm doing, plus over the network (via Samba). My weights all live on a separate host, which has two Samba shares: one with write access and one read-only. The write one is mounted on my host, and the container where I run the agent mounts the read-only one (and has the source code it works on copied over to the container on boot).
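
One way to wire that up on the container host, with placeholder host/share names and the llama.cpp server image as the example container (not my exact setup):

```
# Mount the read-only Samba share from the weights host
# (host, share, and credentials are placeholders).
sudo mount -t cifs //weights-host/weights-ro /mnt/weights \
    -o ro,username=agent,uid=$(id -u)

# Bind-mount it read-only into the inference container.
docker run --rm -p 8080:8080 \
    -v /mnt/weights:/models:ro \
    ghcr.io/ggml-org/llama.cpp:server \
    -m /models/llama-3.1-8b-instruct-q4_k_m.gguf --host 0.0.0.0 --port 8080
```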

The directory that LM Studio ends up creating and maintaining for the weights works with most of the tooling I come across, except of course Ollama.
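
For what it's worth, both servers can be pointed at the same mounted tree; roughly like this, where the directory layout and model names are just examples (LM Studio keeps its GGUFs in a per-publisher folder structure):

```
# llama.cpp reading a GGUF straight from the shared mount.
llama-server \
    -m /mnt/weights/lmstudio/publisher/model-repo/model-q4_k_m.gguf \
    --host 0.0.0.0 --port 8080

# vLLM pointed at a HuggingFace-format (safetensors) checkout in the same share.
vllm serve /mnt/weights/hf/Qwen2.5-7B-Instruct --port 8000
```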