Comment by DiabloD3

2 days ago

I'm not going to open an issue on this, but you should consider expanding the self-hosting part of the handbook and explicitly recommending llama.cpp for local self-hosted inference.

The self-hosting section covers the corporate use case with vLLM and SGLang, as well as personal desktop use with Ollama, which is a wrapper over llama.cpp.

  • Recommending Ollama isn't useful for end users; it's just a trap in a nice-looking wrapper.

    • Strong disagree on this. Ollama is great for moderately technical users who aren't really programmers or proficient with the command line.