Comment by trash_cat
6 months ago
I use Ollama because I am a casual user and can't be bothered to read the docs on how to set up llama.cpp. I just want to run a simple LLM locally.
Why would I care about Vulkan?
With Vulkan it runs much, much faster on consumer hardware, especially on iGPUs like Intel's or AMD's.
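For what it's worth, a minimal sketch of GPU offload from Python via the llama-cpp-python bindings (the model path is a placeholder; whether the Vulkan backend is actually used depends on how the wheel was built, e.g. compiling with CMAKE_ARGS="-DGGML_VULKAN=on"):

    from llama_cpp import Llama  # pip install llama-cpp-python

    # n_gpu_layers=-1 asks the backend (Vulkan, if built with it)
    # to offload every layer it can to the GPU.
    llm = Llama(model_path="model.gguf", n_gpu_layers=-1)
    out = llm("Q: Why care about Vulkan? A:", max_tokens=64)
    print(out["choices"][0]["text"])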
Well, it definitely runs faster on external dGPUs. With iGPUs and possibly future NPUs, the pre-processing/"thinking" phase is much faster (because that phase is compute-bound), but text generation tends to be faster on the CPU because it makes better use of the available memory bandwidth (which is the relevant constraint there). iGPUs and NPUs will still be a win with respect to energy use, however.
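To make the bandwidth point concrete, here is a back-of-envelope sketch; the model size and bandwidth numbers are illustrative assumptions, not measurements:

    # Decoding reads (roughly) every weight once per generated token,
    # so memory bandwidth caps tokens/s regardless of compute.
    model_bytes = 4e9   # assume a ~7B model at ~4-bit quantization
    bandwidth   = 50e9  # assume ~50 GB/s dual-channel DDR5
    print(f"upper bound: {bandwidth / model_bytes:.1f} tokens/s")  # ~12.5

By the same arithmetic, a dGPU with, say, 500 GB/s of VRAM bandwidth lifts that ceiling to ~125 tokens/s, which is why the dGPU case is an easy win.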
For Intel, OpenVINO should be the preferred route. I don't follow AMD, but Vulkan is just the common denominator here.
If you support Vulkan, you support almost every GPU out there in the consumer market across all hardware vendors. It's an amazing fallback option.
I agree they should also support OpenVINO, but compared to Vulkan, OpenVINO is a tiny market.
How is the performance of Vulkan vs ROCm on AMD iGPUs? Ollama can be persuaded to run on iGPUs with ROCm.