Comment by steren
7 days ago
> I would never want to use something like ollama in a production setting.
We benchmarked vLLM and Ollama on both startup time and tokens per second. Ollama comes out on top. We hope to be able to publish these results soon.
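For context, here is a rough sketch of how a tokens-per-second measurement like that can be taken, assuming both servers expose an OpenAI-compatible endpoint (vLLM does by default; Ollama offers one under `/v1`). The base URL, model name, and prompt are placeholders, not the actual benchmark setup described above:

```python
# Minimal sketch: time one streamed completion against an OpenAI-compatible
# endpoint and report an approximate tokens/second figure.
import time
from openai import OpenAI

# Placeholder endpoint: vLLM typically listens on :8000, Ollama on :11434/v1.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

start = time.perf_counter()
chunks = 0
stream = client.chat.completions.create(
    model="llama3",  # placeholder model name
    messages=[{"role": "user", "content": "Explain paged attention briefly."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1  # roughly one token per streamed chunk
elapsed = time.perf_counter() - start
print(f"{chunks} chunks in {elapsed:.2f}s -> {chunks / elapsed:.1f} tok/s (approx)")
```

Startup time would be measured separately (e.g. wall clock from process launch until the endpoint answers its first request).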
You need to benchmark against llama.cpp as well.
Did you test multi-user cases?
Assuming this is equivalent to parallel sessions, I would hope so; that is basically the entire point of vLLM.
vLLM and Ollama assume different settings and hardware. vLLM, backed by PagedAttention, expects a lot of requests from multiple users, whereas Ollama is usually for a single user on a local machine.
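A multi-user test of the kind asked about above could be approximated by firing concurrent requests at the same OpenAI-compatible endpoint and looking at aggregate throughput. This is only an illustrative sketch; the endpoint, model name, prompt, and request count are assumptions, not anyone's published methodology:

```python
# Minimal sketch: issue N requests in parallel and report aggregate tokens/second.
import asyncio
import time
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="unused")

async def one_request(i: int) -> int:
    # Each call stands in for one concurrent "user"; model name is a placeholder.
    resp = await client.chat.completions.create(
        model="llama3",
        messages=[{"role": "user", "content": f"User {i}: summarize paged attention."}],
        max_tokens=128,
    )
    return resp.usage.completion_tokens

async def main(n: int = 16) -> None:
    start = time.perf_counter()
    counts = await asyncio.gather(*(one_request(i) for i in range(n)))
    elapsed = time.perf_counter() - start
    print(f"{n} parallel requests, {sum(counts)} completion tokens in {elapsed:.1f}s "
          f"-> {sum(counts) / elapsed:.1f} tok/s aggregate")

asyncio.run(main())
```

A server built around continuous batching and PagedAttention (vLLM) should scale much better as N grows than a single-user-oriented local runner.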