> llama.cpp is the actual engine running the LLMs; Ollama is a wrapper around it.
How far did they get with their own inference engine? I seem to recall that for the launch of Gemma (or some other model) they also launched their own Go backend, but I never heard anything more about it. I'm guessing they'll always use llama.cpp for anything released before that, but did they continue iterating on their own backend, and how is it today?
.gguf is the native model format of llama.cpp and is widely used for quantized models (models whose weights are stored at reduced numerical precision to cut memory requirements).
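As a minimal sketch of what that looks like in practice, here is loading a quantized GGUF file through the llama-cpp-python bindings (the model filename is a placeholder; any .gguf file works):

    # a minimal sketch, assuming llama-cpp-python is installed
    # (pip install llama-cpp-python); the .gguf path is a placeholder
    from llama_cpp import Llama

    # load a 4-bit quantized model; far less RAM than the fp16 original
    llm = Llama(model_path="gemma-2b-q4_k_m.gguf")

    out = llm("Why is the sky blue?", max_tokens=64)
    print(out["choices"][0]["text"])

The same file runs unchanged on CPU or GPU builds of llama.cpp, which is a big part of why the format caught on.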
llama.cpp is the actual engine running the LLMs; Ollama is a wrapper around it.