
Comment by reddec

2 hours ago

I have roughly 20-40M tokens of usage per day for GLM alone (more if I count other models). Using API pricing from OpenRouter, that means Ollama is more cost-effective for me after about a day (a few days if I account for caching properly).
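The break-even math above can be sketched quickly. This is a rough illustration only: the per-token price and the flat subscription price below are placeholder numbers, not actual OpenRouter or Ollama rates.

```python
# Rough break-even sketch: daily pay-per-token spend vs. a flat subscription.
# All prices are PLACEHOLDERS, not real OpenRouter or Ollama pricing.

def daily_api_cost(tokens_per_day: float, price_per_m: float) -> float:
    """Cost of tokens_per_day tokens at price_per_m USD per 1M tokens."""
    return tokens_per_day / 1_000_000 * price_per_m

def breakeven_days(monthly_flat: float, daily_cost: float) -> float:
    """Days of usage after which the flat plan beats pay-per-token."""
    return monthly_flat / daily_cost

usage = 30_000_000   # ~30M tokens/day (middle of the 20-40M range)
api_price = 1.0      # hypothetical blended $/1M tokens (input + output)
flat = 20.0          # hypothetical flat $/month subscription

per_day = daily_api_cost(usage, api_price)
print(f"API cost/day: ${per_day:.2f}")
print(f"Break-even:   {breakeven_days(flat, per_day):.1f} days")
```

At anything like this volume the flat plan wins in under a day; proper prompt caching lowers the effective API price, which is what stretches the break-even out to a few days.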

For several models like Kimi and GLM they run B300s, and performance is really good. At launch I got closer to 90-100 tps. Nowadays it's around 60 tps, stable across most models I've used (utility models < 120B are almost instant).