Comment by yieldcrv
10 hours ago
yeah? why do you like that over using GLM5 on a VPS that charges by token use? $20 is still cheaper and seamless to set up, no? how are the tokens per second?
I have roughly 20-40M tokens of GLM usage per day (more if I count other models). At OpenRouter API pricing, that means ollama pays for itself after about a day (a few days if you account for caching properly).
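The break-even claim above can be sanity-checked with rough arithmetic. This is a minimal sketch: the per-million-token API price below is an assumption for illustration, not an actual OpenRouter rate, and the $20 figure is the flat plan mentioned earlier in the thread.

```python
# Break-even sketch: flat $20 plan vs. pay-per-token API.
# All prices are assumptions, not real OpenRouter rates.
DAILY_TOKENS_M = 30       # midpoint of the stated 20-40M tokens/day
API_PRICE_PER_M = 0.60    # assumed blended $/1M tokens (hypothetical)
FLAT_MONTHLY = 20.0       # the $20 flat plan mentioned above

api_cost_per_day = DAILY_TOKENS_M * API_PRICE_PER_M
break_even_days = FLAT_MONTHLY / api_cost_per_day
print(f"API cost/day: ${api_cost_per_day:.2f}, "
      f"flat plan breaks even in {break_even_days:.1f} days")
```

Under these assumed numbers the flat plan breaks even in roughly a day, consistent with the claim; the real break-even point shifts with the actual per-token price and how much of the traffic is cache hits.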
For several models like Kimi and GLM they run B300s, and performance is really good. At launch I got close to 90-100 tps. Nowadays it's a stable ~60 tps across most models I use (utility models under 120B are almost instant).