Comment by benjiro29
8 hours ago
> doesn't seem to take into account the cost savings from cache hits
Absolutely false information.
From my usage panel for this month:
* Total Tokens: 1.1B
* Cached Tokens: 1.0B (97% of prompt tokens)
* Cost (energy pricing): $26.58
The energy pricing is higher than what I actually pay because it's a mix of token billing and a partial subscription (60% extra "power").
From the $50 subscription, I have about 3/4 left (4.21 of 16.0 kWh used this billing cycle). I used $5.5 in token billing.
That was running 82.0% GLM 5.1 and 18% GLM 5.2. Yes, I have been busy ;)
My actual usage, in dollar value, was ~$18.
For your information, that is cheaper than MiMo v2.5 Pro from Xiaomi, where I was getting around 450,000 tokens per cent. And they have the same 75% cheaper prices as DeepSeek. MiMo has an issue with cache retention between session prompts, which hurts them vs DeepSeek. Yes, DeepSeek v4 Pro is 2.5x cheaper, but it is nowhere near GLM 5.1, and especially not GLM 5.2.
In case you're wondering, the zai light subscription has a limit of about 80M tokens per week. So on a tokens-per-cent basis, neutralwatt is about 3x cheaper (and has no 5-hour or weekly limits to maximize or get frustrated by).
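A rough sketch of the arithmetic behind the figures above, assuming "tokens per cent" simply means total tokens divided by cost in cents. The function name `tokens_per_cent` is made up for illustration, and the inputs are only the numbers quoted in this comment:

```python
def tokens_per_cent(total_tokens: float, cost_dollars: float) -> float:
    """Tokens bought per US cent: total tokens divided by cost in cents."""
    if cost_dollars <= 0:
        raise ValueError("cost_dollars must be positive")
    return total_tokens / (cost_dollars * 100)


# Figures quoted in the comment (all approximate).
TOTAL_TOKENS = 1.1e9            # "Total Tokens 1.1B"
ACTUAL_COST = 18.0              # actual usage, "~$18"
ENERGY_PRICED_COST = 26.58      # usage panel cost at energy pricing
MIMO_TOKENS_PER_CENT = 450_000  # stated MiMo v2.5 Pro rate

actual = tokens_per_cent(TOTAL_TOKENS, ACTUAL_COST)
listed = tokens_per_cent(TOTAL_TOKENS, ENERGY_PRICED_COST)

print(f"actual cost:    {actual:,.0f} tokens per cent")
print(f"energy pricing: {listed:,.0f} tokens per cent")
print(f"actual vs MiMo: {actual / MIMO_TOKENS_PER_CENT:.2f}x")
```

With these inputs the actual-cost rate comes out at roughly 611,000 tokens per cent, which is where the "cheaper than MiMo" comparison comes from. The zai comparison is left out because its subscription price is not stated here.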
> all while taking weeks to onboard new models.
It took them 1 day to include GLM 5.2 ... Yes, they remove old models fast because they do not have the server capacity to keep old models around.
> I assume some of these problems would be addressed if we had an SLA/enterprise contract.
It's a small team, not a big huge company. In my experience so far, I have seen 2 timeouts, and sometimes slow speeds as servers get overloaded. For what I am paying for GLM ~5.1~ 5.2 ...
Your reply doesn't seem to be in good faith. Please provide your formula for calculating effective per-token cost.
I am not sure why the small team argument is relevant. This is a crowded market; there are dozens if not hundreds of third-party inference providers in the world right now. I'm glad that's a good excuse that works on you, but I'm not sure why the average user should care.