Comment by bildung
18 hours ago
vLLM usually only plays to its strengths when serving multiple users in parallel, in contrast to llama.cpp (Ollama is a wrapper around llama.cpp).
If you want more performance, you could try running llama.cpp directly or using the prebuilt lemonade nightlies.
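To make the "parallel" point concrete, here is a rough sketch (not from the thread) that compares aggregate tokens/s at 1 vs. 8 concurrent clients against a local OpenAI-compatible endpoint, which both vLLM and llama.cpp's llama-server expose; the URL, port, and model name are placeholders you'd swap for your own setup.

```python
# Hypothetical throughput check against a local OpenAI-compatible endpoint.
# The URL, port, and model name are assumptions -- adjust to your setup.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/v1/chat/completions"  # assumed port
MODEL = "your-model-name"                          # placeholder
PROMPT = "Write a short paragraph about llamas."

def one_request(_):
    r = requests.post(URL, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": PROMPT}],
        "max_tokens": 256,
    }, timeout=300)
    r.raise_for_status()
    # completion_tokens lets us compute aggregate tokens/s
    return r.json()["usage"]["completion_tokens"]

def measure(concurrency):
    start = time.time()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        tokens = sum(pool.map(one_request, range(concurrency)))
    return tokens / (time.time() - start)

if __name__ == "__main__":
    print(f"1 client : {measure(1):.1f} tok/s")
    print(f"8 clients: {measure(8):.1f} tok/s")  # vLLM's batching should scale much better here
```

With a single client the two servers often look similar (or llama.cpp wins); the gap in vLLM's favour tends to show up only in the 8-client number, thanks to continuous batching.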
But vLLM was half the t/s of Ollama, so something was obviously not ok.