Comment by mehdibl
13 hours ago
Ollama is quite a bad example here. Despite its popularity, it's a simple wrapper, and it is increasingly being pushed aside by llama.cpp, the very app it wraps.
I don't understand the parallel here.
TBVH I didn't think too much about the naming. I defaulted to Ollama because of its perceived simplicity, and I wanted that same perceived simplicity to help adoption.
This is the vLLM of classic ML, not Ollama.
I guess the parallel is "ollama serve", which provides you with a direct REST API for interacting with an LLM.
llama.cpp provides an API server as well via llama-server (and a competent web GUI too).
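For concreteness, a minimal sketch of that REST parallel, assuming a local "ollama serve" instance on its default port (11434) and its /api/generate endpoint; the model name here is just an illustrative placeholder:

```python
import json
import urllib.request

# Request body for Ollama's /api/generate endpoint; "stream": False asks
# for a single JSON response instead of a stream of chunks.
payload = {
    "model": "llama3",  # placeholder; assumes this model was pulled locally
    "prompt": "Why is the sky blue?",
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once an `ollama serve` instance is actually running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

llama-server exposes a comparable HTTP interface, so the same client-side pattern applies to either backend.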