
Comment by anhner

8 hours ago

It offers a GUI for easier configuration and management of models, and it lets you store/load models as .gguf, something ollama doesn't do (it stores models split across multiple files; and yes, I know you can load a .gguf in ollama, but it still makes a copy in its own internal format, so I either end up with a duplicate on my drive or have to delete my original .gguf).
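(For the curious: this "just point at the file" workflow is also what the llama.cpp Python bindings expose. A minimal sketch using the llama-cpp-python package; the model path here is a hypothetical example.)

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Load a quantized .gguf in place: no import step, no second copy of the weights.
llm = Llama(
    model_path="models/mistral-7b-instruct.Q4_K_M.gguf",  # hypothetical path
    n_ctx=2048,  # context window size
)

out = llm("Explain the GGUF format in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```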

Thanks for the insights. I'm not familiar with .gguf. What's the advantage of that format?

  • .gguf is the native format of llama.cpp and is widely used for quantized models (models whose weights are stored at reduced numerical precision to cut memory requirements).

    llama.cpp is the actual engine running the LLMs; ollama is a wrapper around it.
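    Since quantization came up: here's a toy numpy sketch of the basic idea (plain symmetric int8, not the block-wise Q4_K_M/Q8_0 schemes GGUF actually uses, just the precision-for-memory trade-off):

    ```python
    import numpy as np

    # fp32 weights: 4 bytes each -> a 4096x4096 layer is ~64 MiB
    w = np.random.randn(4096, 4096).astype(np.float32)

    # Symmetric int8 quantization: 1 byte each -> ~16 MiB (4x smaller)
    scale = np.abs(w).max() / 127.0
    w_q = np.round(w / scale).astype(np.int8)

    # Dequantize for compute; the per-weight error is bounded by scale / 2
    w_deq = w_q.astype(np.float32) * scale
    print("max abs error:", float(np.abs(w - w_deq).max()))
    ```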

    • > llama.cpp is the actual engine running the LLMs; ollama is a wrapper around it.

      How far did they get with their own inference engine? I seem to recall that for the launch of Gemma (or some other model) they also shipped their own Golang backend, but I never heard anything more about it. I'm guessing they'll always use llama.cpp for anything before that, but did they continue iterating on their own backend, and how is it doing today?