Comment by reddec

13 hours ago

My 50c: Ollama cloud at $20. GLM5 and Kimi are really competitive models, Ollama's usage limits are insanely high, there are no restrictions on where you use it (it has normal APIs), plus privacy and no logging.

Interesting. I've always been turned off by how vague the descriptions of Ollama's limits are for their paid tiers. What sort of work have you been doing with it?

  • Background agents (DIY, OpenClaw-like), coding, and an assistant (OpenWebUI).

    The worst case I saw: multiple parallel agents (opencode & pi-coding agents) with Kimi and GLM, running almost non-stop development through the work day, peaked at 15-20% session consumption (I think it's a 2h bucket). I never hit the limit.

    In contrast, I exhausted the $20 Claude plan in a similar mode after just a few hours of work.

Yeah? Why do you like that over using GLM5 on a VPS that charges by token use? Is $20 still cheaper and seamless to set up? How are the tokens per second?

  • I have roughly 20-40M tokens of usage per day for GLM alone (more if you count other models). At OpenRouter API pricing, that means Ollama pays for itself for me after a single day (a few days if you account for caching properly).

    For several models like Kimi and GLM they run B300s, and performance is really good. At launch I was getting closer to 90-100 tps. Nowadays it's a stable ~60 tps across most models I've used (utility models under 120B are almost instant).
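The break-even claim above is easy to sanity-check with back-of-envelope arithmetic. A minimal sketch, assuming a purely hypothetical blended per-token price (the real OpenRouter rate varies by model and input/output split, so plug in your own number):

```python
# Back-of-envelope: flat $20/month subscription vs. pay-per-token API.
# HYPOTHETICAL_USD_PER_M_TOKENS is a placeholder, NOT a quoted OpenRouter
# rate; substitute the actual price for the model you use.

SUBSCRIPTION_USD_PER_MONTH = 20.0
HYPOTHETICAL_USD_PER_M_TOKENS = 0.60  # assumed blended input/output price


def breakeven_tokens_m(sub_cost: float, price_per_m: float) -> float:
    """Millions of tokens per month at which API spend equals the subscription."""
    return sub_cost / price_per_m


def daily_api_cost(tokens_m_per_day: float, price_per_m: float) -> float:
    """API cost in USD for a given daily usage in millions of tokens."""
    return tokens_m_per_day * price_per_m


if __name__ == "__main__":
    be = breakeven_tokens_m(SUBSCRIPTION_USD_PER_MONTH, HYPOTHETICAL_USD_PER_M_TOKENS)
    print(f"break-even: {be:.1f}M tokens/month")
    # At 20-40M tokens/day, a single day of API usage can exceed the
    # monthly subscription price under this assumed rate.
    for daily in (20.0, 40.0):
        cost = daily_api_cost(daily, HYPOTHETICAL_USD_PER_M_TOKENS)
        print(f"{daily:.0f}M tokens/day -> ${cost:.2f}/day via API")
```

Under this assumed $0.60/M rate, 20-40M tokens/day works out to $12-24/day via the API, which is why the flat subscription breaks even within roughly a day of heavy use.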