Comment by buyucu

6 months ago

llama.cpp has supported Vulkan for more than a year now, and for more than 6 months there has been an open PR to add Vulkan backend support to Ollama. However, the Ollama team has not even looked at it or commented on it.

A Vulkan backend is essential for running LLMs on consumer hardware (especially iGPUs). It's sad to see Ollama miss this opportunity.

Don't be sad for a commercial entity that is not a good player: https://github.com/ggerganov/llama.cpp/pull/11016#issuecomme...

  • This is great. I did not know about RamaLama; I'll be using and recommending it from now on, and if I see people using Ollama in instructions I'll recommend they move to RamaLama. Cheers.

    • This is fascinating. I've been using Ollama with no knowledge of this, because it just works without a ton of knobs that I don't feel like spending time messing with.

      As usual, the real work seems to be appropriated by the people who do the last little bit (putting an acceptable user experience and some polish on it), and they take all the money and credit.

      It’s shitty but it also happens because the vast majority of devs, especially in the FOSS world, do not understand or appreciate user experience. It is bar none the most important thing in the success of most things in computing.

      My rule is: every step a user has to do to install or set up something halves adoption. So if 100 people enter and there are two steps, 25 complete the process.

      For a long time Apple was the most valuable corporation on Earth on the basis of user experience alone. Apple doesn’t invent much. They polish it, and that’s where like 99% of the value is as far as the market is concerned.

      The reason is that computers are very confusing and hard to use. Computer people, which most of us are, don’t see that because it’s second nature to us. But even for computer people you get to the point where you’re busy and don’t have time to nerd out on every single thing you use, so it even matters to computer people in the end.

    • Yeah, I would love an actual alternative to Ollama, but unfortunately RamaLama is not it. As the other commenter said, onboarding is important. I just want a one-step install that works, and the simple fact that RamaLama is written in Python means it will never be that easy; this is even more true with LLM stuff on an AMD GPU.

      I know there will be people who disagree with this, and that's OK. This is my personal experience with Python in general, and it's 10x worse when I need to figure out which packages are compatible with the specific ROCm support for my GPU. It's madness; even a C or C++ setup and build is easier than this Python hell.

  • It's hilarious that the Docker folks are trying to take yet another OSS project and monetize it. Hey, if it worked once...

Thanks. Just yesterday I discovered that Ollama could not use the iGPU on my AMD machine, and I was going through a long issue looking for solutions and workarounds (https://github.com/ollama/ollama/issues/2637). The existing instructions are Linux-based, and some people found it utterly surprising that anyone would want to run LLMs on Windows (really?). While I would have no trouble installing Linux and compiling from source, I wasn't ready to do that to my main, daily-use computer.

Great to see this.

PS. Have you got feedback on whether this works on Windows? If not, I can try to create a build today.

The PR has been legitimately out-of-date and unmergeable for many months. It was forward-ported a few weeks ago and is still awaiting formal review and merging. (To be sure, Vulkan support in Ollama will likely stay experimental for some time even if the existing PR is merged, and many setups will need manual adjustment of the number of GPU layers and such. It's far from 100% foolproof even in the best-case scenario!)
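
(For anyone who ends up needing that manual tuning: below is a minimal sketch of capping the GPU layer count through Ollama's HTTP API, assuming a local server on the default port 11434 and the documented num_gpu option; the model tag is just a placeholder.)

    # Minimal sketch: ask Ollama to offload only a limited number of layers
    # to the GPU via the "num_gpu" option. Assumes a local Ollama server and
    # a model that has already been pulled; "llama3" is a placeholder tag.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",
            "prompt": "Hello!",
            "stream": False,
            "options": {"num_gpu": 20},  # layers to offload; lower it if VRAM runs out
        },
        timeout=300,
    )
    print(resp.json()["response"])

If the GPU backend misbehaves, setting num_gpu to 0 should fall back to pure CPU inference, which is a handy sanity check.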

For that matter, some people are still having issues building and running it, as seen from the latest comments on the linked GitHub page. It's not clear that it's even in a fully reviewable state just yet.

  • This PR was reviewable multiple times and was rebased multiple times; it only fell out of date because the Ollama team kept ignoring it. It has been open for almost 7 months now without a single comment from the Ollama folks.

  • It gets out of date with conflicts, etc., because it's ignored. If this had been a PR against Ollama's upstream project, llama.cpp, the maintainers would have gotten it merged months ago.