Comment by cientifico
2 months ago
For most users that wanted to run LLM locally, ollama solved the UX problem.
One command, and you are running the models even with the rocm drivers without knowing.
If llama provides such UX, they failed terrible at communicating that. Starting with the name. Llama.cpp: that's a cpp library! Ollama is the wrapper. That's the mental model. I don't want to build my own program! I just want to have fun :-P
Llama.cpp now has a gui installed by default. It previously lacked this. Times have changed.
Having read above article, I just gave llama.cpp a shot. It is as easy as the author says now, though definitely not documented quite as well. My quickstart:
brew install llama.cpp
llama-server -hf ggml-org/gemma-4-E4B-it-GGUF --port 8000
Go to localhost:8000 for the Web UI. On Linux it accelerates correctly on my AMD GPU, which Ollama failed to do, though of course everyone's mileage seems to vary on this.
Was hoping it was so easy :) But I probably need to look into it some more.
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4' llama_model_load_from_file_impl: failed to load model
Edit: @below, I used `nix-shell -p llama-cpp` so not brew related. Could indeed be an older version indeed! I'll check.
11 replies →
While that might be true, for as long as its name is “.cpp”, people are going to think it’s a C++ library and avoid it.
This is the first I'm learning that it isn't just a C++ library.
In fact the first line of the wikipedia article is:
> llama.cpp is an open source software library
1 reply →
It would make sense to just make the GUI a separate project, they could call it llama.gui.
4 replies →
This is correct, and I avoided it for this reason, did not have the bandwidth to get into any cpp rabbit hole so just used whatever seemed to abstract it away.
Wait, it isn't? The name very strongly suggests that it is a text file containing C++ source code; is that not the case?
Frankly I think the cli UX and documentation is still much better for ollama.
It makes a bunch of decisions for you so you don't have to think much to get a model up and running.
I don't care about the GUI so much. Ollama lets me download, adjust and run a whole bunch of models and they are reasonably fast. Last time I compared it with Llama.cpp, finding out how to download and install models was a pain in Llama.cpp and it was also _much_ slower than Ollama.
That is not true.
If you today visit a models page on huggingface, the site will show you the exact oneliner you need to run to it on llama.cpp.
I didn't measure it, but both download and inference felt faster than ollama. One thing that was definitely better was memory usage, which may be important if you want to run small models on SCB.
Having picked it up recently and compared to both llama and lm studio - the models I was using ran faster, used less memory, and had a few extra confif options available that the others hadn't implemented yet but were suggested by the model authors.
It was easy to install, run, and access the gui to get going.
"LM Studio… Jan… Msty… koboldcpp…"
Plenty of alternatives listed. Can anyone with experience suggest the likely successor to Ollama? I have a Mac Mini but don't mind a C/L tool.
I think, as was pointed out, Ollama won because of how easy it is to set up, pull down new models. I would expect similar for a replacement.
If you don't want to have to think about it, LM Studio is probably the best choice.
How about kobold.cpp then? Or LMStudio (I know it's not open source, but at least they give proper credit to llama.cpp)?
Re curation: they should strive to not integrate broken support for models and avoid uploading broken GGUFs.
> For most users that wanted to run LLM locally, ollama solved the UX problem
This does not absolve them from the license violation
agree. We can easily compare it with docker. Of course people can use runc directly, but most people select not to and use `docker run` instead.
And you can blame docker in a similar manner. LXC existed for at least 5 years before docker. But docker was just much more convenient to use for an average user.
UX is a huge factor for adoption of technology. If a project fails at creating the right interface, there is nothing wrong with creating a wrapper.
>solved the UX problem.
>One command
Notwithstanding the fact that there's about zero difference between `ollama run model-name` and `llama-cpp -hf model-name`, and that running things in the terminal is already a gigantic UX blocker (Ollama's popularity comes from the fact that it has a GUI), why are you putting the blame back on an open source project that owes you approximately zero communication ?
> Notwithstanding the fact that there's about zero difference between `ollama run model-name` and `llama-cpp -hf model-name`
There is a TON of difference. Ollama downloads the model from its own model library server, sticks it somewhere in your home folder with a hashed name and a proprietary configuration that doesn't use the in built metadata specified by the model creator. So you can't share it with any other tool, you can't change parameters like temp on the fly, and you are stuck with whatever quants they offer.
This was my issue with current client ecosystem. I get a .guff file. I should be able to open my AI Client of choice and File -> Open and select a .guff. Same as opening a .txt file. Alternatively, I have cloned a HF model, all AI Clients should automatically check for the HF cache folder.
The current offering have interfaces to HuggingFace or some model repo. They get you the model based on what they think your hardware can handle and save it to %user%/App Data/Local/%app name%/... (on windows). When I evaluated running locally I ended up with 3 different folders containing copies of the same model in different directory structures.
It seems like HuggingFace uses %user%/.cache/.. however, some of the apps still get the HF models and save them to their own directories.
Those features are 'fine' for a casual user who sticks with one program. It seems designed from the start to lock you into their wrapper. In the end they are all using llama cpp, comfy ui, openvino etc to abstract away the backed. Again this is fine but hiding the files from the user seems strange to me. If you're leaning on HF then why now use their own .cache?
In the end I get the latest llama.cpp releases for CUDA and SYCL and run llama-server. My best UX has been with LM Studio and AI Playground. I want to try Local AI and vLLM next. I just want control over the damn files.
2 replies →
> Ollama's popularity comes from the fact that it has a GUI
It's not the GUI, it's the curated model hosting platform. Way easier to use than HF for casual users.
It also made easy for casual users to think that they were running deepseek.
LM Studio also offers curation, while giving credit to llama.cpp and also easy search across all of Huggingface's GGUF's
But if you’re just a GUI wrapper then at least attribute the library you created the GUI for
but if ollama is much slower, that's cutting on your fun and you'll be having better fun with a faster GUI
You’ve completely missed the point.
Whip that llama! Oh wait, that's a different program.
LOL
https://www.youtube.com/watch?v=HaF-nRS_CWM