> llama.cpp is the actual engine running the LLMs; Ollama is a wrapper around it.
How far did they get with their own inference engine? I seem to recall that for the launch of Gemma (or some other model) they also launched their own Go backend, but I never heard anything more about it. I'm guessing they'll always use llama.cpp for anything released before that, but did they continue iterating on their own backend, and how is it today?
.gguf is the native model format of llama.cpp and is widely used for quantized models (models whose weights are stored at reduced numerical precision to cut memory requirements).
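As a minimal sketch of what that looks like in practice, here is loading a quantized GGUF file through the llama-cpp-python bindings (the model filename is a placeholder; any .gguf file works):

    # a minimal sketch, assuming llama-cpp-python is installed
    # (pip install llama-cpp-python); the .gguf path is a placeholder
    from llama_cpp import Llama

    # load a 4-bit quantized model; far less RAM than the fp16 original
    llm = Llama(model_path="gemma-2b-q4_k_m.gguf")

    out = llm("Why is the sky blue?", max_tokens=64)
    print(out["choices"][0]["text"])

The same file runs unchanged on CPU or GPU builds of llama.cpp, which is a big part of why the format caught on.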
llama.cpp is the actual engine running the LLMs; Ollama is a wrapper around it.