Comment by androiddrew

1 day ago

Now all we need is better support for AMD GPUs, both CDNA and RDNA types.

ZLUDA implements CUDA on top of AMD ROCm - they are explicitly targeting vLLM as their PyTorch compatibility test: https://vosen.github.io/ZLUDA/blog/zluda-update-q4-2025/#pyt...

(PyTorch also supports ROCm generally; the GPU shows up as a CUDA device.)
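As a quick illustration (assuming a ROCm build of PyTorch is installed), the AMD GPU really does appear through the regular CUDA device API:

    import torch

    # On a ROCm build of PyTorch, AMD GPUs are exposed via the usual CUDA API,
    # so standard device checks work unchanged.
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))        # prints the AMD GPU name
        print(getattr(torch.version, "hip", None))  # set on ROCm builds, None on CUDA builds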

You can run vLLM with AMD GPUs supported by ROCm: https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/infer...
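For a quick sanity check, vLLM's offline Python API works the same way on a ROCm install as on a CUDA one; this is only a sketch, and the model name below is an example placeholder rather than something from the linked docs:

    from vllm import LLM, SamplingParams

    # Minimal offline-inference sketch; on a supported AMD GPU the ROCm build
    # of vLLM is used automatically, no AMD-specific flags are needed here.
    # The model name is only an example placeholder.
    llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
    params = SamplingParams(temperature=0.8, max_tokens=64)
    outputs = llm.generate(["Why serve models with vLLM on ROCm?"], params)
    print(outputs[0].outputs[0].text)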

However, from experience with an AMD Strix Halo, a couple of caveats: it's drastically slower than Ollama (tested over a few weeks, always using the official AMD vLLM nightly releases), and not all GPUs were supported for all models (though that has since been fixed).

  • vLLM usually only shows its strength when serving multiple users in parallel, in contrast to llama.cpp (Ollama is a wrapper around llama.cpp).

    If you want more performance, you could try running llama.cpp directly (see the sketch below) or using the prebuilt Lemonade nightlies.
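For the single-user case, a rough sketch of driving llama.cpp directly via the llama-cpp-python bindings; the model path is a placeholder, and the package has to be built with the ROCm/HIP backend for GPU offload to work:

    from llama_cpp import Llama

    # Load a local GGUF model and offload all layers to the GPU (-1 = all).
    llm = Llama(model_path="./model.gguf", n_gpu_layers=-1)

    out = llm("Explain vLLM vs. llama.cpp in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])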