
Comment by paradite

6 months ago

OK, assuming what you said is correct, why wouldn't Ollama then be able to support Vulkan by default, out of the box?

Sorry, I'm not sure exactly what the relationship is between the two projects. This is a genuine question, not a troll question.

Check the PR; it's a very short one. It's not more complicated than setting a compile-time flag.

I have no idea why they have been ignoring it.
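
For what it's worth, here is roughly what "setting a compile-time flag" looks like on the llama.cpp side. This is just an illustrative sketch: the exact option name has changed across versions (older trees used -DLLAMA_VULKAN=ON, newer ones -DGGML_VULKAN=ON), so check the current build docs.

    # build llama.cpp with the Vulkan backend enabled
    cmake -B build -DGGML_VULKAN=ON
    cmake --build build --config Release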

Ollama is just a friendly front end for llama.cpp. It doesn't have to do any of those things you mentioned. Llama.cpp does all that.

  • If it's "just" a friendly front end, why doesn't llama.cpp just drop one themselves? Do they actually care about the situation, or are random people just mad on their behalf?

  • At the risk of being pedantic (I don't know much about C++ and I'm genuinely curious), if Ollama is really just a wrapper around Llama.cpp, why would it need the Vulkan-specific flags?

    Shouldn't it just call Llama.cpp and let Llama.cpp handle the flags internally? I'm thinking from an abstraction-layer perspective.

    • The Vulkan-specific flags are needed (1) to set up the llama.cpp build options when building Ollama with Vulkan support, which apparently is still a challenge with the current PR, if the latest comments on the GitHub page are accurate; and (2) to pick how many model layers should run on the GPU, depending on available GPU memory. Llama.cpp doesn't do that for you: you either set that option yourself or tell it to move "everything", which often fails with an error. (Finding the right number is a trial-and-error process that depends on the model and quantization, and also varies with how much context you have in the current conversation. If you have too many layers loaded and too little GPU memory, a large context can cause unpredictable breakage. See the sketch below.)
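
      As a rough illustration of point (2): llama.cpp exposes the layer split as a command-line option. The model path and the layer count below are made-up values you would tune for your own model, quantization and available VRAM, and the binary and flag names may differ slightly by version.

        # offload 28 layers to the GPU, keep the rest on the CPU;
        # too many layers plus a large context (-c) can exhaust VRAM and fail
        ./llama-cli -m models/some-model-Q4_K_M.gguf -ngl 28 -c 4096 -p "Hello"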
