Comment by av_conk

6 months ago

I tried using Ollama because I couldn't get ROCm working on my system with llama-cpp; Ollama bundles the ROCm libraries for you. I got around 50 tokens per second with that setup.

I tried llama-cpp with the Vulkan backend and roughly doubled the tokens per second. I was under the impression ROCm was superior to Vulkan, so I was confused by the result.
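For anyone who wants to reproduce the comparison, here's a minimal sketch using llama-bench, assuming a recent llama.cpp checkout (the GGML_VULKAN / GGML_HIP CMake flags are the current names; older trees used LLAMA_*-prefixed equivalents, and the HIP build may additionally need the ROCm toolchain configured; model.gguf is a placeholder path):

```
# Build llama.cpp with the Vulkan backend
cmake -B build-vulkan -DGGML_VULKAN=ON
cmake --build build-vulkan --config Release

# Build a second tree with the ROCm/HIP backend for comparison
cmake -B build-rocm -DGGML_HIP=ON
cmake --build build-rocm --config Release

# Run the same model through both builds; llama-bench reports tokens/s
./build-vulkan/bin/llama-bench -m model.gguf
./build-rocm/bin/llama-bench -m model.gguf
```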

In any case, I've stuck with llama-cpp.

It depends on your GPU. Vulkan is well supported by essentially all GPUs. AMD supports ROCm well on their datacenter GPUs, but support for consumer hardware has not been as good.