Comment by trash_cat

6 months ago

I use Ollama because I am a casual user and can't be bothered to read the docs on how to set up llama.cpp. I just want to run a simple LLM locally.

Why would I care about Vulkan?

With Vulkan it runs much, much faster on consumer hardware, especially on iGPUs like Intel's or AMD's.
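
To give a rough sense of the difference, here's a small timing harness (a sketch of mine, not anything shipped by Ollama or llama.cpp; the binary paths and model file are placeholders, and it assumes two local llama.cpp builds, one compiled with -DGGML_VULKAN=ON and one CPU-only):

    # Time an end-to-end 128-token run (note: includes model load time) on a
    # CPU-only and a Vulkan build of llama-cli. Binary paths and the GGUF
    # model file below are placeholder assumptions.
    import subprocess, time

    BUILDS = {
        "cpu":    "./build-cpu/bin/llama-cli",
        "vulkan": "./build-vulkan/bin/llama-cli",  # built with -DGGML_VULKAN=ON
    }
    MODEL = "model-q4_k_m.gguf"

    for name, binary in BUILDS.items():
        start = time.perf_counter()
        # -ngl 99 offloads all layers to the GPU; the CPU build just ignores it.
        subprocess.run(
            [binary, "-m", MODEL, "-p", "Hello", "-n", "128", "-ngl", "99"],
            check=True, capture_output=True,
        )
        print(f"{name}: {time.perf_counter() - start:.1f}s for 128 tokens")

For cleaner numbers that exclude load time, llama.cpp's own llama-bench tool is the better instrument.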

  • Well, it definitely runs faster on external dGPUs. With iGPUs, and possibly future NPUs, the pre-processing/"thinking" phase is much faster, because that phase is compute-bound. Text generation, however, tends to be faster on the CPU, which makes better use of the available memory bandwidth, and that is the relevant constraint there (see the back-of-the-envelope sketch at the end of this thread). iGPUs and NPUs will still be a win for energy use, however.

  • For Intel, OpenVINO should be the preferred route. I don't follow AMD, but Vulkan is just the common denominator here.

    • If you support Vulkan, you support almost every GPU out there in the consumer market across all hardware vendors. It's an amazing fallback option.

      I agree they should also support OpenVINO, but compared to Vulkan OpenVINO is a tiny market.
  • How is the performance of Vulkan vs ROCm on AMD iGPUs? Ollama can be persuaded to run on iGPUs with ROCm.
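
On the Ollama-on-iGPU point: the usual community workaround is to override the gfx version that ROCm reports before starting the server. This is unsupported, and the version string depends on the APU generation (the Vega-class value shown here is an assumption), so treat it as a sketch:

    # Start the Ollama server with ROCm's gfx-version override in place.
    # HSA_OVERRIDE_GFX_VERSION is a standard ROCm environment variable;
    # "9.0.0" is an example value for Vega-class APUs. Pick the value
    # matching your own iGPU generation.
    import os, subprocess

    env = dict(os.environ, HSA_OVERRIDE_GFX_VERSION="9.0.0")
    subprocess.run(["ollama", "serve"], env=env, check=True)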
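
And a back-of-the-envelope sketch of the bandwidth argument made earlier in the thread, using illustrative numbers of my own rather than measurements: every generated token has to stream roughly the full set of weights from DRAM, so decode speed is capped at bandwidth divided by model size, no matter how much compute the iGPU adds.

    # Decode-speed ceiling: each generated token reads ~all weights from
    # memory, so tokens/s <= memory bandwidth / model size.
    weights_gb = 4.0   # assumption: ~7B parameters at 4-bit quantization
    dram_gbps = 50.0   # assumption: dual-channel DDR5, shared by CPU and iGPU
    print(f"decode ceiling: {dram_gbps / weights_gb:.1f} tokens/s")
    # ~12.5 tokens/s regardless of compute, which is why decode on an iGPU
    # (same DRAM as the CPU) rarely wins, while compute-bound prefill does.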