Comment by mschwaig
6 months ago
Ollama tries to appeal to a lowest-common-denominator user base that does not want to worry about stuff like configuration, quants, or which binary to download.
I think they want their project to be smart enough to just 'figure out what to do' on behalf of the user.
That appeals to a lot of people, but I think stuffing all backends into one binary and auto-detecting at runtime which one to use is actually a step too far towards simplicity.
What they did to support both CUDA and ROCm from the same binary looked quite cursed the last time I checked (because, of course, they have to link or invoke two different builds of llama.cpp).
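To make that concrete, here is a minimal Go sketch of what runtime backend selection across separately built llama.cpp runners could look like. All names, paths, and detection probes here are my own illustration, not Ollama's actual code.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// backend pairs a name with a bundled runner binary and a probe that
// guesses whether the host can use it. Everything here is hypothetical.
type backend struct {
	name   string
	runner string
	detect func() bool
}

func hasCUDA() bool {
	// Crude heuristic: a working NVIDIA driver usually ships nvidia-smi.
	_, err := exec.LookPath("nvidia-smi")
	return err == nil
}

func hasROCm() bool {
	// Crude heuristic: the AMD kernel driver exposes /dev/kfd.
	_, err := os.Stat("/dev/kfd")
	return err == nil
}

func pickBackend() backend {
	candidates := []backend{
		{"cuda", "runners/cuda/llama-server", hasCUDA},
		{"rocm", "runners/rocm/llama-server", hasROCm},
		{"cpu", "runners/cpu/llama-server", func() bool { return true }},
	}
	for _, b := range candidates {
		if b.detect() {
			return b
		}
	}
	return candidates[len(candidates)-1] // CPU fallback
}

func main() {
	b := pickBackend()
	// The chosen runner is a separate subprocess/build, which is how two
	// incompatible llama.cpp builds can coexist in one distribution.
	fmt.Printf("selected backend %q -> %s\n", b.name, b.runner)
}
```

The point of the sketch is the shape of the problem: each probe is a heuristic, they are evaluated in a fixed order at runtime, and every backend you add multiplies the bundled builds and the ways detection can go wrong.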
I have only glanced at that PR, but I'm guessing that this plays a role in how many backends they can reasonably try to support.
In nixpkgs it's a huge pain: we quite deliberately configure at build time what we want Ollama to do, and then Ollama runs off and does whatever it wants anyway. Every time they update their heuristics for detecting ROCm, users have to look at log output and performance regressions to figure out what it's actually doing. It's brittle as hell.
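As a hypothetical illustration of that brittleness (again, not Ollama's real code): imagine the runtime keeps an allow-list of supported gfx architectures. One changed entry in an update and a GPU a packager deliberately built support for silently falls back to CPU, with a log line as the only evidence.

```go
package main

import "fmt"

// Hypothetical runtime allow-list of ROCm gfx architectures. If an
// update drops an entry, behavior changes regardless of how the
// package was configured at build time.
var supportedGfx = map[string]bool{
	"gfx1030": true,
	"gfx1100": true,
	// "gfx906": dropped in a hypothetical update; formerly supported.
}

func selectROCm(detectedGfx string) bool {
	if supportedGfx[detectedGfx] {
		fmt.Printf("msg=\"using ROCm\" gfx=%s\n", detectedGfx)
		return true
	}
	// Build-time intent is overridden at runtime; the log is the only signal.
	fmt.Printf("msg=\"gpu not supported, falling back to CPU\" gfx=%s\n", detectedGfx)
	return false
}

func main() {
	selectROCm("gfx906") // worked before the update, now silently falls back
}
```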
I disagree with this, but it's a reasonable argument. The problem is that the Ollama team has basically ignored the PR instead of engaging with the community. The least they could do is explain their reasoning.
This PR is #1 on their repo by multiple metrics (comments, iterations, what have you).