Comment by hodgehog11
2 days ago
The Linux kernel does not break userspace.
> What's wrong with using an older well-tested build of llama.cpp, instead of reinventing the wheel?
Yeah, they tried this, this was the old setup as I understand it. But every time they needed support for a new model and had to update llama.cpp, an old model would break and one of their partners would go ape on them. They said it happened more than once, but one particular case (wish I could remember what it was) was so bad they felt they had no choice but to reimplement. It's the lowest risk strategy.
> Yeah, they tried this, this was the old setup as I understand it. But every time they needed support for a new model and had to update llama.cpp, an old model would break and one of their partners would go ape on them.
Shouldn't any such regressions be regarded as bugs in llama.cpp and fixed there? Surely the Ollama folks can test and benchmark the main models that people care about before shipping the update in a stable release. That would be a lot easier than trying to reimplement major parts of llama.cpp from scratch.
> every time they needed support for a new model and had to update llama.cpp, an old model would break and one of their partners would go ape on them. They said it happened more than once, but one particular case (wish I could remember what it was) was so bad they felt they had no choice but to reimplement. It's the lowest risk strategy.
A much lower risk strategy would be using multiple versions of llama-server to keep supporting old models that would break on newer llama.cpp versions.
The Ollama distribution size is already pretty big (at least on Windows) due to all the GPU support libraries and whatnot. Having to multiple that by the number of llama.cpp versions supported would not be great.
?
That's a terrible excuse, Llama.cpp is just 7.5 megabytes. You can easily ship a couple copies of that. The current ollama for windows download is 700MB.
I don't buy it. They're not willing to make an 700MB download a few megabytes bigger to ~730MB, but they are willing to support a fork/rewrite indefinitely (and the fork is outside of their core competency, as seen by the current issues)? What kind of decisionmaking is that?
1 reply →