it took me several hours to get llama.cpp working as a server, it took me 2 minutes to get ollama working.
much like how i got into linux via linux-on-vfat-on-msdos and wouldn't have gotten into linux otherwise, ollama got me into llama.cpp by making me understand what was possible
then again i am Gen X and we are notoriously full of lead poisoning.
> it took me several hours to get llama.cpp working as a server
Mm... running a llama.cpp server is annoying: which model to use? Is it in the right format? What should I set `ngl` to? However, perhaps it would be fairer and more accurate to say that installing llama.cpp and installing ollama have slightly different effort levels (one taking about 3 minutes to clone and run `make`, the other taking about 20 seconds to download).
Once you have them installed, just typing `ollama run llama3` is quite convenient compared to finding the right arguments for the llama.cpp `server`.
Sensible defaults. Installs llama.cpp. Downloads the model for you. Runs the server for you. Nice.
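To make the comparison concrete, here's roughly what each path looks like in the shell (a sketch only; the GGUF filename, context size and `-ngl` value are placeholders you'd have to pick for your own model and hardware):

```sh
# llama.cpp: clone, build, find a GGUF yourself, then work out the flags.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# Model path and -ngl value are placeholders; they depend on what you
# downloaded and how many layers fit on your GPU.
./server -m ./models/llama-3-8b-instruct.Q4_K_M.gguf -c 4096 -ngl 33 \
    --host 0.0.0.0 --port 8080

# ollama: one command; it fetches the model and picks the defaults for you.
ollama run llama3
```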
> it took me 2 minutes to get ollama working
So, you know, I think it's, broadly speaking, a fair sentiment, even if it probably isn't quite true.
...
However, when you look at it from that perspective, some things stand out:
- ollama is basically just a wrapper around llama.cpp
- ollama doesn't let you do all the things llama.cpp does
- ollama offers absolutely zero way, not even the hint of a suggestion of one, to move from using ollama to using llama.cpp if you need anything more.
Here are some interesting questions:
- Why can't I just run llama.cpp's server with the defaults from ollama?
- Why can't I get a simple dump of the 'sensible' defaults ollama uses?
- Why can't I get a simple dump of the GGUF (or whatever) model file ollama uses? (You can dig this out yourself; see the sketch after this list.)
- Why isn't 'a list of sensible defaults' just a GitHub repository with a download link and a list of params to use?
- Who's paying for the enormous cost of hosting all those ollama model files and converting them into usable formats?
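To be fair, some of this you can dig out yourself; it's just not advertised. A sketch of what that looks like on a typical install (the subcommands are real, but the blob digest below is a placeholder):

```sh
# Dump the Modelfile ollama generated for a model, including the FROM line
# that points at the raw blob on disk, plus the default parameters it applies.
ollama show llama3 --modelfile
ollama show llama3 --parameters

# That blob is a plain GGUF file, so llama.cpp can load it directly.
./server -m ~/.ollama/models/blobs/sha256-<digest> -c 4096 -ngl 33
```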
The project is convenient, and if you need an easy way to get started, absolutely use it.
...but, I guess, I recommend you learn how to use llama.cpp itself at some point, because most free things are only free while someone else is paying for them.
Consider this:
If ollama's free hosted models were no longer free and you had to manually find and download your own model files, would you still use it? Could you still use it?
If not... maybe don't base your business, or anything else important, around it.
It's a SaaS with an open source client, and you're using the free plan.
It actually is true. Running an OpenAI-compatible server using llama.cpp is a one-liner.
Check out the Docker option if you don't want to build/install llama.cpp.
https://github.com/ggerganov/llama.cpp/tree/master/examples/...
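For reference, the one-liner and its Docker equivalent look roughly like this (image tag and model paths are illustrative; check the linked README for the current incantation):

```sh
# Built from source: serves an OpenAI-compatible API over HTTP.
./server -m ./models/model.Q4_K_M.gguf --host 0.0.0.0 --port 8080

# Or via Docker, mounting a local models directory into the container.
docker run -v /path/to/models:/models -p 8080:8080 \
    ghcr.io/ggerganov/llama.cpp:server \
    -m /models/model.Q4_K_M.gguf -c 4096 --host 0.0.0.0 --port 8080
```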