Comment by lolinder

6 months ago

As has been pointed out in this thread in a comment that you replied to (so I know you saw it) [0], Ollama goes through a lot of contortions to support multiple llama.cpp backends. Yes, their solution is a bit of a hack, but it means that the effort of adding a new backend is substantial.

And again, they're doing those contortions to make it easy for people. Making it easy involves trade-offs.

Yes, Ollama has flaws. They could communicate better about why they're ignoring PRs. All I'm saying is let's not pretend they're not doing anything complicated or difficult when no one has been able to recreate what they're doing.

[0] https://news.ycombinator.com/item?id=42886933

This is incorrect. The effort it took to enable Vulkan was relatively minor. The PR is short and, to be honest, it doesn't do much, because it doesn't need to.

  • that PR doesn't actually work though -- it finds the Vulkan libraries and has some memory accounting logic, but the bits to actually build a Vulkan llama.cpp runner are not there. I'm not sure why its author deems it ready for inclusion.

    (I mean, the missing work should not be much, but it still has to be done)

    • The PR was working 6 months ago, and it has been rebased multiple times as the Ollama team kept ignoring it and mainline moved on. I'm using it right now.

  • This is a change from your response to the comment that I linked to, where you said it was a good point. Why the difference?

    Maybe I should clarify: I'm not saying that the effort to enable a new backend is substantial. My understanding of that comment (the one you acknowledged made a good argument) is that the maintenance burden of having a new backend is substantial.