Comment by Sanzig

10 hours ago

Take a look at Ollama Cloud: https://ollama.com/pricing

You get access to a whole bunch of bleeding-edge open models including GLM-5.2, Kimi K2.7, DeepSeek 4 Pro, etc. Inference is run on US/SG/EU cloud providers with zero-data-retention policies. The $20/mo tier is very generous, in my experience.

They don’t have a statement about where the GLM-5.2 model is run or what its data retention policy is. They do state that for other models, like MiniMax.

  • There's a blanket statement at the bottom of the pricing page, which I would hope also applies to GLM-5.2:

    > Where are models hosted?

    > Ollama hosts models and compute resources primarily in the United States. To serve global demand, we may route to Europe and Singapore for additional capacity.

    > Is my prompt or response data trained on?

    > Prompt or response data is never logged or trained on.

    > Who does Ollama partner with to host models?

    > Ollama collaborates with NVIDIA Cloud Providers (NCPs) to host open models.

    > When Ollama partners with providers, we require no logging, no training, and zero data retention policies in place.

Well, I tried the $20/mo tier, used GLM specifically, and did maybe 3-4 hours of work. I'm already through 50% of my monthly quota and blew through my time-limited quota twice. I won't renew for another month.

Which I think only underscores my point that the GLM models are actually not very cost-effective.

They essentially cost the same as the SOTA models from OpenAI and Anthropic, while not being quite as smart. I could have gotten about the same amount of work done on the $20 Codex plan. And I had to use my $100 Codex plan to finish the work GLM started before it ran out of quota, and also to fix it, since GLM left a bit of a mess.

I like that GLM exists. Other Chinese models are far more cost-effective. GLM is expensive, even on a fixed plan.
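
For a rough sense of the numbers, here is a back-of-the-envelope sketch using only the figures mentioned above ($20/mo, 3-4 hours of work, 50% of the monthly quota used). It assumes quota consumption scales linearly with hours worked, which may well not hold in practice, and the function names are made up for illustration.

```python
# Back-of-the-envelope estimate from the figures quoted in the comment.
# Assumption: quota consumption scales linearly with hours worked.

def hours_per_month(hours_used: float, fraction_used: float) -> float:
    """Extrapolate how many hours the full monthly quota would last."""
    return hours_used / fraction_used


def cost_per_hour(monthly_price: float, hours_used: float, fraction_used: float) -> float:
    """Effective price per hour of work if the whole quota is consumed."""
    return monthly_price / hours_per_month(hours_used, fraction_used)


PRICE = 20.0     # $/month, the tier mentioned above
FRACTION = 0.5   # share of the monthly quota already used

for hours in (3.0, 4.0):  # the "3-4 hours" range from the comment
    total = hours_per_month(hours, FRACTION)
    rate = cost_per_hour(PRICE, hours, FRACTION)
    print(f"{hours:.0f} h used -> ~{total:.0f} h/month, ~${rate:.2f}/h")
```

Under that assumption the plan would cover roughly 6-8 hours of this kind of work per month, or about $2.50 to $3.33 per hour, before counting the time lost to the time-limited quota.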