Comment by tarruda

6 months ago

> every time they needed support for a new model and had to update llama.cpp, an old model would break and one of their partners would go ape on them. They said it happened more than once, but one particular case (wish I could remember what it was) was so bad they felt they had no choice but to reimplement. It's the lowest risk strategy.

A much lower risk strategy would be using multiple versions of llama-server to keep supporting old models that would break on newer llama.cpp versions.
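A rough sketch of what that could look like, assuming one pinned llama-server build per directory. The build tags, install paths, and model names here are hypothetical; only the `-m` and `--port` flags are real llama-server options.

```python
from pathlib import Path

# Hypothetical layout: one pinned llama-server build per directory.
# The build tags and paths are made up for illustration.
SERVER_BUILDS = {
    "old-build": Path("/opt/llama/old-build/llama-server"),
    "new-build": Path("/opt/llama/new-build/llama-server"),
}
DEFAULT_BUILD = "new-build"

# Models known to break on the newest build stay pinned to an older one.
# "legacy-model" is a placeholder name, not a real model.
PINNED_MODELS = {"legacy-model": "old-build"}


def server_command(model_name: str, model_path: str, port: int) -> list[str]:
    """Build the command line for the llama-server build that serves this model."""
    build = PINNED_MODELS.get(model_name, DEFAULT_BUILD)
    binary = SERVER_BUILDS[build]
    return [str(binary), "-m", model_path, "--port", str(port)]
```

The returned list can be handed to `subprocess.Popen` as-is. Adding support for a new model then means adding a build, not replacing the one that older models rely on.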

5 comments

MarkSweep 6 months ago

The Ollama distribution size is already pretty big (at least on Windows) due to all the GPU support libraries and whatnot. Having to multiply that by the number of llama.cpp versions supported would not be great.

jychang 6 months ago
    llamacpp> ls -l *llama*
    -rwxr-xr-x 1 root root 2505480 Aug 7 05:06 libllama.so
    -rwxr-xr-x 1 root root 5092024 Aug 7 05:23 llama-server
That's a terrible excuse. Llama.cpp is just 7.5 megabytes. You can easily ship a couple of copies of that. The current Ollama for Windows download is 700MB.
I don't buy it. They're not willing to make a 700MB download a few megabytes bigger, to ~730MB, but they are willing to support a fork/rewrite indefinitely (and the fork is outside of their core competency, as seen by the current issues)? What kind of decision-making is that?
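The arithmetic can be checked with a quick sketch. It uses the two file sizes from the listing and the 700MB figure quoted above, and it assumes every extra copy reuses the same GPU libraries, which is not a given.

```python
# File sizes in bytes, taken from the ls output above.
LIBLLAMA_BYTES = 2_505_480
LLAMA_SERVER_BYTES = 5_092_024

# Current download size in MB, as quoted in the comment.
BASE_DOWNLOAD_MB = 700


def bundle_mb(extra_copies: int) -> float:
    """Download size in MB after adding extra llama.cpp copies.

    Assumes the GPU support libraries are shared between copies.
    """
    per_copy_mb = (LIBLLAMA_BYTES + LLAMA_SERVER_BYTES) / 1_000_000
    return BASE_DOWNLOAD_MB + extra_copies * per_copy_mb
```

One copy comes to about 7.6 MB, so `bundle_mb(4)` lands at roughly 730 MB: the ~730MB figure corresponds to four extra copies.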
- MarkSweep 6 months ago
  
  Sorry, I forgot to include in my comment this part:
  If you include multiple versions of llama, and each of those llama versions depends on different GPU libraries, that could balloon the download size.
  If these GPU libraries change rarely, then yes, you are correct, it might not be a problem.
  
- vlovich123 6 months ago
  
  It’s 700 MiB because they’re likely redistributing the CUDA libraries so that users don’t need to separately run that installer. Llama.cpp is a bit more “you are expected to know what you’re doing” on that front. But yeah, you could plausibly ship multiple versions of the inference engine, although from a maintenance perspective that sounds like hell for any number of reasons.