Comment by elaus

8 hours ago

What is the currently favored alternative for simply running 1-2 models locally, exposed via an API? One big advantage of Ollama seems to be that they provide fully configured models, so I don't have to fiddle with stop words, etc.

llama-swap if you need more than one model. It wraps llama.cpp and has a Docker container version that is pretty easy to work with.
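
For illustration, a minimal sketch of how that setup is typically driven from client code: llama-swap exposes a single OpenAI-compatible endpoint and spins up whichever llama.cpp backend matches the `model` field of the request. The port, endpoint path, and model names below are assumptions for the example, not anything from this thread; they would have to match your own llama-swap config.

```python
import json
import urllib.request

# One OpenAI-compatible endpoint in front of several models (port is an assumption).
BASE_URL = "http://localhost:8080/v1/chat/completions"

def chat(model: str, prompt: str) -> str:
    """Send a chat request; llama-swap loads the named model on demand."""
    payload = {
        "model": model,  # hypothetical name; must match an entry in llama-swap's config
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Two different models behind the same endpoint; llama-swap handles the swap.
print(chat("qwen2.5-7b", "Summarize what llama-swap does in one sentence."))
print(chat("llama-3.1-8b", "Same question, different model."))
```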

I just use llama.cpp server. It works really well. Some people recommend llama-swap or kobold, but I've never tried them.
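
If you go the plain llama.cpp server route, sampling settings and stop sequences can be passed per request rather than baked into a model file, which covers the "fiddling with stop words" concern from the question. A hedged sketch against the server's native /completion endpoint, assuming it is running locally; the port, prompt, and stop strings are made up for the example:

```python
import json
import urllib.request

# llama-server's native completion endpoint; port and payload values are assumptions.
URL = "http://localhost:8080/completion"

payload = {
    "prompt": "Q: What does llama-server do?\nA:",
    "n_predict": 128,        # cap on generated tokens
    "temperature": 0.7,
    "stop": ["\nQ:"],        # stop generating when the model starts a new question
}
req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["content"])
```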