Comment by mehdibl
13 hours ago
Ollama is quite a bad example here. Despite its popularity, it's a simple wrapper, and it is increasingly being pushed aside by llama.cpp, the very app it wraps.
I don't understand the parallel here.
TBVH I didn't think too much about the naming. I defaulted to Ollama because of its perceived simplicity, and I wanted that same perceived simplicity to help adoption.
This is the vLLM of classic ML, not Ollama.
I guess the parallel is "ollama serve", which provides you with a direct REST API for interacting with an LLM.
llama.cpp provides an API server as well via llama-server (and a competent web GUI too).
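For concreteness, a minimal sketch of that REST parallel, assuming a local "ollama serve" instance on its default port (11434) and its /api/generate endpoint; the model name here is just an illustrative placeholder:

```python
import json
import urllib.request

# Request body for Ollama's /api/generate endpoint; "stream": False asks
# for a single JSON response instead of a stream of chunks.
payload = {
    "model": "llama3",  # placeholder; assumes this model was pulled locally
    "prompt": "Why is the sky blue?",
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once an `ollama serve` instance is actually running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

llama-server exposes a comparable HTTP interface, so the same client-side pattern applies to either backend.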