Comment by throwa356262

2 months ago

That is not true.

If you today visit a models page on huggingface, the site will show you the exact oneliner you need to run to it on llama.cpp.

I didn't measure it, but both download and inference felt faster than ollama. One thing that was definitely better was memory usage, which may be important if you want to run small models on SCB.