Comment by zozbot234
6 months ago
The Vulkan-specific flags are needed (1) to set up the llama.cpp build options when building Ollama with Vulkan support - which apparently is still a challenge with the current PR, if the latest comments on the GitHub page are accurate - and (2) to pick how many model layers should be run on the GPU, depending on available GPU memory. llama.cpp doesn't do that for you: you have to set that option yourself, or tell it to offload everything, which often fails with an error. (Finding the right amount is actually a trial-and-error process that depends on the model, the quantization, and how much context you have in the current conversation. If you have too many layers loaded and too little GPU memory, a large context can result in unpredictable breakage.)
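For reference, a minimal sketch of what that looks like when driving llama.cpp directly (flag names are from recent llama.cpp versions and may differ in older builds; the model path and the layer count of 20 are placeholders you'd tune for your own model and GPU):

```sh
# Build llama.cpp with the Vulkan backend (requires the Vulkan SDK/headers).
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Run with 20 of the model's layers offloaded to the GPU. Raise -ngl until
# the model no longer fits in VRAM (it will error out), then back off -
# leaving headroom for the KV cache, which grows with context length.
./build/bin/llama-cli -m ./model.gguf -ngl 20 -p "Hello"
```

The trial-and-error described above is essentially iterating on that `-ngl` value per model and quantization.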
Thanks a lot for the explanation.
If I can ask one more question: why doesn't Ollama use pre-built llama.cpp binaries with Vulkan support directly?