This is not quite correct. Ollama must assess the state of Vulkan support and the amount of available memory, then pick the fraction of the model to be hosted on the GPU. This is not totally foolproof and will likely always need manual adjustment in some cases.
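For a sense of what that decision involves, here is a minimal Go sketch: probe free VRAM, estimate the per-layer size, and pick how many layers fit. The function name and the sizing heuristic are purely illustrative assumptions, not Ollama's actual code.

```go
package main

import "fmt"

// estimateGPULayers returns how many of totalLayers fit into freeVRAM,
// leaving some headroom for the KV cache and compute buffers.
// Hypothetical helper; the 10% headroom figure is an arbitrary example.
func estimateGPULayers(freeVRAM, modelSize uint64, totalLayers int) int {
	if totalLayers == 0 || modelSize == 0 {
		return 0
	}
	perLayer := modelSize / uint64(totalLayers)
	headroom := uint64(float64(freeVRAM) * 0.9) // keep ~10% of VRAM free
	layers := int(headroom / perLayer)
	if layers > totalLayers {
		layers = totalLayers
	}
	return layers
}

func main() {
	// e.g. a ~4 GiB quantized model with 32 layers on a GPU with 3 GiB free
	n := estimateGPULayers(3<<30, 4<<30, 32)
	fmt.Printf("offload %d of 32 layers to the GPU\n", n)
}
```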
The work involved is tiny compared to the work llama.cpp did to get Vulkan up and running.
This is not rocket science.
This sounds like it should be trivial to reproduce and extend - I look forward to trying out your repo!
OK, assuming what you said is correct, why wouldn't Ollama then be able to support Vulkan by default, out of the box?
Sorry, I'm not sure what exactly the relationship is between the two projects. This is a genuine question, not a troll question.
Check the PR, it's a very short one. It's not more complicated than setting a compile-time flag.
I have no idea why they have been ignoring it.
Ollama is just a friendly front end for llama.cpp. It doesn't have to do any of those things you mentioned. Llama.cpp does all that.
If it's "just" a friendly front and, why doesn't llama.cpp just drop one themselves? Do they actually care about the situation, or are random people just mad on their behalf?
At the risk of being pedantic (I don't know much about C++ and I'm genuinely curious), if Ollama is really just a wrapper around Llama.cpp, why would it need the Vulkan-specific flags?
Shouldn't it just call Llama.cpp and let Llama.cpp handle the flags internally? I'm thinking from an abstraction-layer perspective.
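Roughly, the split is: runtime knobs (model path, context size, number of offloaded layers) can be forwarded straight to llama.cpp, but which GPU backends exist at all is fixed when the llama.cpp binary is compiled, which is why the wrapper has to care about the Vulkan build flag. Here's a hedged Go sketch of that forwarding; the wrapper types are hypothetical, though -m, -ngl, and -c are real llama.cpp server options.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// RunOptions holds the runtime knobs a wrapper can forward to llama.cpp.
// Hypothetical struct, not Ollama's API.
type RunOptions struct {
	ModelPath   string
	GPULayers   int
	ContextSize int
}

// launchServer forwards runtime options as llama.cpp server flags.
// Whether -ngl can actually use Vulkan depends on how the server
// binary was built, which no runtime flag can change.
func launchServer(bin string, opts RunOptions) *exec.Cmd {
	args := []string{
		"-m", opts.ModelPath,
		"-ngl", strconv.Itoa(opts.GPULayers),
		"-c", strconv.Itoa(opts.ContextSize),
	}
	return exec.Command(bin, args...)
}

func main() {
	cmd := launchServer("./llama-server", RunOptions{
		ModelPath:   "model.gguf",
		GPULayers:   20,
		ContextSize: 4096,
	})
	fmt.Println(cmd.String())
}
```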