Comment by buyucu

6 months ago

With Vulkan it runs much, much faster on consumer hardware, especially on iGPUs like Intel or AMD.

Well, it definitely runs faster on external dGPUs. With iGPUs and possibly future NPUs, the pre-processing/"thinking" phase is much faster (because that phase is compute-bound), but text generation tends to be faster on the CPU, because the CPU makes better use of the available memory bandwidth (which is the relevant constraint there). iGPUs and NPUs will still be a win with respect to energy use, however.
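
To make the bandwidth argument concrete, here is a back-of-the-envelope sketch (Python); the model size, quantization, and bandwidth figures are illustrative assumptions, not measurements:

    # Decode (token generation) is memory-bandwidth-bound: every new token
    # requires streaming roughly all of the model weights once, so
    # tokens/s ceiling ~= memory bandwidth / model size in bytes.
    model_bytes = 7e9 * 0.55  # hypothetical ~7B model at ~4-bit quantization
    configs = {
        "CPU or iGPU on dual-channel DDR5 (~60 GB/s, assumed)": 60e9,
        "dGPU on GDDR6 (~500 GB/s, assumed)": 500e9,
    }
    for name, bandwidth in configs.items():
        print(f"{name}: ~{bandwidth / model_bytes:.0f} tokens/s ceiling")

The point: an iGPU sits on the same DDR bus as the CPU, so its decode ceiling is the same; only the compute-bound prefill phase gets meaningfully faster.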

For Intel, OpenVINO should be the preferred route. I don't follow AMD closely, but Vulkan is just the common denominator here.
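
(For reference, llama.cpp exposes Vulkan as a build-time backend; a sketch of enabling it, assuming a recent tree where the CMake option is named GGML_VULKAN:)

    cmake -B build -DGGML_VULKAN=ON   # option name assumed; older trees used LLAMA_VULKAN
    cmake --build build --config Release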

  • If you support Vulkan, you support almost every consumer GPU on the market, across all hardware vendors. It's an amazing fallback option.

    I agree they should also support OpenVINO, but compared to Vulkan, OpenVINO is a tiny market.

    • I made an argument for performance, not for compatibility.

      If you run your local LLM in the least performant way possible on your overly expensive GPU, then you are not getting value out of your purchase.

      Vulkan is a fallback option, is all.

      I even see people running on their CPU because some apps don't support their hardware, and llama.cpp even made that possible. It is still a really bad idea.

      It just goes to show there's still much to do.


How is the performance of Vulkan vs ROCm on AMD iGPUs? Ollama can be persuaded to run on iGPUs with ROCm.
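
The usual way to do that persuading is to override the GFX version that ROCm reports, so the runtime accepts an officially unsupported iGPU. A sketch, where 11.0.0 is an assumed value; the right one depends on the specific iGPU generation:

    # HSA_OVERRIDE_GFX_VERSION makes ROCm treat the iGPU as a supported
    # target; pick the version matching a supported dGPU of the same family.
    HSA_OVERRIDE_GFX_VERSION=11.0.0 ollama serve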