Comment by PlatoIsADisease
1 day ago
What was the original core principle of ollama?
I had used oobabooga back in the day and found ollama unnecessary.
> What was the original core principle of ollama?
One decision that was, and still is, integral to their architecture is copying how Docker handles registries and blob storage. Docker images are made of layers, so the registry can store a single layer once and reuse it across multiple images, as one example.
Ollama did this too, but I'm not sure why. I know the author used to work at Docker, but almost no weight data can actually be shared that way, so instead of just storing "$model-name.safetensors/.gguf" on disk, Ollama splits it into blobs, keeps its own index, and so on, for seemingly no gain except making it impossible to share weights between multiple applications.
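To illustrate the scheme (a toy sketch of content-addressed storage in the Docker/OCI style, not Ollama's actual code; the paths and filenames are just examples): blobs are named by their digest, which is why identical Docker layers are stored once, but a multi-gigabyte GGUF is a single blob that no other model will ever share.

    import hashlib
    import shutil
    from pathlib import Path

    def store_blob(src: Path, store: Path) -> Path:
        # Content-addressed storage: the filename is the sha256 digest,
        # so two identical blobs are stored exactly once.
        h = hashlib.sha256()
        with src.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        dest = store / f"sha256-{h.hexdigest()}"
        if not dest.exists():
            shutil.copy2(src, dest)
        return dest

    # e.g. store_blob(Path("some-model.gguf"), Path.home() / ".ollama/models/blobs")
    # leaves you with "blobs/sha256-<digest>" instead of a recognizable "some-model.gguf".

Great for deduplicating shared OS layers; pointless when the whole model is one giant file.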
I guess business-wise, it made it easier for them to now push people toward their "cloud models" so they earn money, since that's just another registry the local client connects to. But it also means Ollama isn't just about running local models anymore, because that doesn't make them money, so all their focus now is on their cloud instead.
At least as an LM Studio, llama.cpp and vLLM user, I can have one directory with weights shared between all of them (provided the weight format works in all of them), and if I want to use Ollama, it of course can't use that same directory and will by default store things its own way.
I was looking into which local inference software to use and also found this model-storage behavior onerous.
What I want is to have a directory with models and bind mount that read-only into inference containers. But Ollama would force me to either prime the pump by importing with Modelfiles (where do I even get those?) every time I start the container, or store the weights in its own blob format.
I had been planning to try out vLLM and llama.cpp as my next step, so I'm glad to hear you're able to share a directory between them.
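For reference, what I'm after looks roughly like this with the Docker Python SDK (the image tag, paths and model filename below are placeholders, not something I've verified):

    import docker

    client = docker.from_env()

    # Read-only bind mount of one shared models directory into a llama.cpp
    # server container; no import step, no duplicated weights.
    client.containers.run(
        "ghcr.io/ggml-org/llama.cpp:server",          # placeholder image tag
        command=["-m", "/models/some-model.gguf",     # placeholder model file
                 "--host", "0.0.0.0", "--port", "8080"],
        volumes={"/srv/models": {"bind": "/models", "mode": "ro"}},
        ports={"8080/tcp": 8080},
        detach=True,
    )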
> What I want is to have a directory with models and bind mount that readonly into inference containers.
Yeah, that's basically what I'm doing, plus over the network (via Samba). My weights all live on a separate host, which exposes two Samba shares: one with write access and one read-only. The write one is mounted on my host, and the container where I run the agent mounts the read-only one (and has the source code it works on copied over to the container on boot).
The directory that LM Studio ends up creating and maintaining for the weights works with most of the tooling I come across, except of course Ollama.
Ollama vs. llama.cpp is like Docker vs. FreeBSD jails, Dropbox vs. rsync, jujutsu vs. git, etc.
> What was the original core principle of ollama?
Nothing, it was always going to be a rug pull. They leeched off llama.cpp.
Everyone seems to be missing an important piece here. Ollama is/was a one-click solution for a non-technical person to launch a local model. It doesn't need a lot of configuration, detects an Nvidia GPU, and starts model inference with a single command. The core principle is that your grandmother should be able to launch a local AI model without needing to install 100 dependencies.
Exactly.
I can be on a non-technical team and put the LLM code inside Docker.
The local dev instructions are to install Ollama, use it to pull the models, and set some env vars.
The same code can point at Bedrock when deployed there.
Using straight llama.cpp at the time I wrote that, it wasn't as straightforward.
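Concretely, the pattern is something like this (a sketch assuming Ollama's OpenAI-compatible endpoint on its default port 11434; the env var names and model tag are just examples, and when deployed you'd point the base URL at whatever OpenAI-compatible gateway fronts Bedrock, or swap in boto3):

    import os
    from openai import OpenAI

    # Locally this hits Ollama's OpenAI-compatible API; deployed, the base URL
    # comes from the environment instead (e.g. a gateway in front of Bedrock).
    client = OpenAI(
        base_url=os.getenv("LLM_BASE_URL", "http://localhost:11434/v1"),
        api_key=os.getenv("LLM_API_KEY", "ollama"),   # Ollama ignores the key
    )

    resp = client.chat.completions.create(
        model=os.getenv("LLM_MODEL", "llama3.1"),     # pulled via `ollama pull llama3.1`
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp.choices[0].message.content)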
> Ollama is/was a one click solution for non technical person to launch a local model
Maybe it is today, but initially Ollama was only a CLI, so obviously not for "non-technical people" who would have no idea how to even use a terminal. If you hang out in the Ollama Discord (unlikely, as the mods are very ban-happy), you'd constantly see people asking for very trivial help, like how to enter commands in the terminal, and the community stringing them along instead of just directing them to LM Desktop or something that would be a much better fit for that type of user.