Comment by av_conk
6 months ago
I tried Ollama because I couldn't get ROCm working with llama-cpp on my system; Ollama bundles the ROCm libraries for you. With that setup I got around 50 tokens per second.
Then I tried llama-cpp with the Vulkan backend and roughly doubled the tokens per second. I was under the impression that ROCm was superior to Vulkan, so the result confused me.
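If anyone wants to reproduce the comparison, here's a minimal sketch using the llama-cpp-python bindings to get a rough tokens-per-second number. The model path and prompt are placeholders, and the CMake flag shown is an assumption based on recent llama.cpp versions (older releases used a different name):

```python
# Minimal sketch: rough tokens-per-second measurement via llama-cpp-python.
# Assumes the bindings were compiled against the backend under test, e.g.:
#   CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python
# (flag name may differ on older versions). Model path is a placeholder.
import time

from llama_cpp import Llama

# n_gpu_layers=-1 offloads all layers to the GPU.
llm = Llama(model_path="model.gguf", n_gpu_layers=-1, verbose=False)

start = time.perf_counter()
out = llm("Write a short paragraph about GPUs.", max_tokens=256)
elapsed = time.perf_counter() - start

# The completion dict reports how many tokens were actually generated.
n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.2f}s -> {n_tokens / elapsed:.1f} tok/s")
```

Running the same script against a ROCm build and a Vulkan build gives a rough apples-to-apples comparison, though generation speed can also vary with model size and quantization.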
In any case, I've stuck with llama-cpp.
It depends on your GPU. Vulkan is well-supported by essentially all GPUs. AMD supports ROCm well on its datacenter GPUs, but support for consumer hardware has not been as good.