
Comment by buyucu · 9 hours ago

Ollama is a lost cause. They are going through a very aggressive phase of enshittification right now.

I disagree. Ollama’s reason for being is to make things simple, not to always be on the cutting edge. I use Ollama whenever I can because of that simplicity. Since I bought a Mac with 32 GB of unified memory 18 months ago, I have run a great many models through Ollama with close to zero problems.

The simple thing to do is to just use the custom quantization that OpenAI used for gpt-oss (MXFP4) and GGUF for other models.

Using Hugging Face, LM Studio, etc. is the Linux approach: maximum flexibility. Using Ollama is more like using macOS.

What is the currently favored alternative for simply running one or two models locally, exposed via an API? One big advantage of Ollama seems to be that they provide fully configured models, so I don't have to fiddle with stop sequences, etc.
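
(For the API part: Ollama exposes an OpenAI-compatible endpoint on its default port 11434, so a client can be as small as the sketch below. This is a minimal illustration only; the model name "llama3.1" is a stand-in for whatever model you have pulled.)

    # Minimal sketch: calling a locally served model through Ollama's
    # OpenAI-compatible endpoint. "llama3.1" is an example name only.
    import requests

    resp = requests.post(
        "http://localhost:11434/v1/chat/completions",
        json={
            "model": "llama3.1",
            "messages": [{"role": "user", "content": "Say hello in one sentence."}],
            # No stop sequences supplied: the model's Modelfile bundles
            # them, which is the convenience described above.
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])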

  • llama-swap if you need more than one model. It wraps llama.cpp and has a Docker container version that is pretty easy to work with.

  • I just use llama.cpp's server. It works really well. Some people recommend llama-swap or KoboldCpp, but I've never tried them.
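
Both of these speak the same OpenAI-style API, so the client side barely changes; only the base URL and port differ. A minimal sketch, assuming llama-server was started locally with something like "llama-server -m model.gguf --port 8080" (llama-swap proxies the same API in front of one or more such servers):

    # Minimal sketch: the same request as above, pointed at llama.cpp's
    # llama-server instead of Ollama. Only the URL and port change.
    import requests

    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            # With a single loaded model, llama-server does not require
            # a specific model name; "local" is a placeholder.
            "model": "local",
            "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])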