Comment by jychang

2 days ago

    llamacpp> ls -l *llama*
    -rwxr-xr-x 1 root root 2505480 Aug  7 05:06 libllama.so
    -rwxr-xr-x 1 root root 5092024 Aug  7 05:23 llama-server

That's a terrible excuse; llama.cpp is just ~7.6 megabytes. You can easily ship a couple of copies of that. The current Ollama for Windows download is 700 MB.

I don't buy it. They're not willing to make a 700 MB download a few megabytes bigger, to ~730 MB, but they are willing to support a fork/rewrite indefinitely (and the fork is outside their core competency, as the current issues show)? What kind of decision-making is that?
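To be fair to the arithmetic here, the numbers from the `ls -l` output above do check out. A quick sketch (the "four extra copies" count is just an illustrative assumption, not anything Ollama has proposed):

```python
# Sizes taken from the ls -l listing quoted above, in bytes.
libllama = 2_505_480   # libllama.so
server   = 5_092_024   # llama-server

total = libllama + server
print(f"one copy: {total / 1e6:.1f} MB")        # ~7.6 MB per copy

# Hypothetical: bundling four extra copies next to a 700 MB installer.
extra_copies = 4
print(f"download: {700 + extra_copies * total / 1e6:.0f} MB")  # ~730 MB
```

So even several build variants of the inference binaries would only add a few percent to the existing download size, which is the core of the objection.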

It's ~700 MiB because they're likely redistributing the CUDA libraries so that users don't need to run that installer separately. Llama.cpp is a bit more "you are expected to know what you're doing" on that front. But yeah, you could plausibly ship multiple versions of the inference engine, although from a maintenance perspective that sounds like hell for any number of reasons.