Comment by zozbot234
6 months ago
This is not quite correct. Ollama must assess the state of Vulkan support and the amount of available memory, then pick what fraction of the model to host on the GPU. That process is not totally foolproof and will likely always need manual adjustment in some cases.
> The work involved is tiny compared to the work llama.cpp did to get Vulkan up and running. This is not rocket science.
This sounds like it should be trivial to reproduce and extend - I look forward to trying out your repo!
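To make the point concrete, here is a minimal sketch of the kind of sizing heuristic involved, in Go since that is what Ollama is written in. Every name here is hypothetical; this is not Ollama's actual code, just the shape of the decision: ask the backend how much VRAM it thinks is free, subtract fixed overhead, and see how many layers fit.

```go
// Hypothetical sketch of the sizing decision described above: given the
// VRAM a backend (Vulkan or otherwise) reports as free, estimate how many
// of the model's layers fit on the GPU; the rest stay on the CPU.
// None of these names are Ollama's real API.
package main

import "fmt"

type ModelInfo struct {
	LayerCount    int    // total transformer layers in the model
	BytesPerLayer uint64 // rough weight size of one layer at the chosen quantization
	KVCacheBytes  uint64 // KV cache for the requested context length
	GraphBytes    uint64 // scratch/compute buffers the backend needs
}

// layersToOffload picks how many layers to place on the GPU, keeping a
// safety margin because the "free VRAM" a driver reports is never exact.
// Returns 0 if even the fixed overhead doesn't fit (pure CPU run).
func layersToOffload(m ModelInfo, freeVRAM uint64) int {
	const headroom = 512 << 20 // leave ~512 MiB unused as a buffer

	overhead := m.KVCacheBytes + m.GraphBytes + headroom
	if freeVRAM <= overhead {
		return 0
	}
	n := int((freeVRAM - overhead) / m.BytesPerLayer)
	if n > m.LayerCount {
		n = m.LayerCount // the whole model fits
	}
	return n
}

func main() {
	m := ModelInfo{
		LayerCount:    32,
		BytesPerLayer: 220 << 20, // ~220 MiB per layer, e.g. a 7B model at 4-bit
		KVCacheBytes:  1 << 30,   // 1 GiB KV cache
		GraphBytes:    400 << 20,
	}
	fmt.Println("offload layers:", layersToOffload(m, 8<<30)) // 8 GiB card
}
```

Every number in that estimate is a guess (drivers under-report free memory, the KV cache scales with the requested context), which is exactly why some setups will always need a manual override.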
The owner of that PR has already forked Ollama. Try it out; I did, and it works great.