
Comment by buyucu · 9 hours ago

Ollama is a lost cause. They are going through a very aggressive phase of enshittification right now.

I disagree. Ollama’s reason for being is to make things simple, not to always be on the cutting edge. I use Ollama whenever I can because of that simplicity. Since I bought a Mac with 32 GB of unified memory 18 months ago, I have run a great many models through Ollama with close to zero problems.

The simple thing to do is to just use the custom quantization that OpenAI used for gpt-oss (MXFP4) and GGUF for other models.

Using Hugging Face, LM Studio, etc. is the Linux approach: maximum flexibility. Using Ollama is more like using macOS.

What is the currently favored alternative for simply running one or two models locally, exposed via an API? One big advantage of Ollama seems to be that they provide fully configured models, so I don't have to fiddle with stop sequences, etc.
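
(For the API part: Ollama exposes an OpenAI-compatible endpoint on its default port 11434, so a client can be as small as the sketch below. This is a minimal illustration only; the model name "llama3.1" is a stand-in for whatever model you have pulled.)

    # Minimal sketch: calling a locally served model through Ollama's
    # OpenAI-compatible endpoint. "llama3.1" is an example name only.
    import requests

    resp = requests.post(
        "http://localhost:11434/v1/chat/completions",
        json={
            "model": "llama3.1",
            "messages": [{"role": "user", "content": "Say hello in one sentence."}],
            # No stop sequences supplied: the model's Modelfile bundles
            # them, which is the convenience described above.
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])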

  • llama-swap if you need more than one model. It wraps llama.cpp and has a Docker container version that is pretty easy to work with.

  • I just use llama.cpp's server. It works really well. Some people recommend llama-swap or KoboldCpp, but I've never tried them.
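
Both of these speak the same OpenAI-style API, so the client side barely changes; only the base URL and port differ. A minimal sketch, assuming llama-server was started locally with something like "llama-server -m model.gguf --port 8080" (llama-swap proxies the same API in front of one or more such servers):

    # Minimal sketch: the same request as above, pointed at llama.cpp's
    # llama-server instead of Ollama. Only the URL and port change.
    import requests

    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            # With a single loaded model, llama-server does not require
            # a specific model name; "local" is a placeholder.
            "model": "local",
            "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])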