Comment by spelk

11 hours ago

Please correct me if you have contradicting data but: Neuralwatt's price per token vs price for energy comparison doesn't seem to take into account the cost savings from cache hits that other providers offer on pure token rates. The comparison seems to assume every input token is a cache miss.
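To make the cache-hit point concrete, here is a minimal sketch of a blended input price. The function name and the example rates are hypothetical, not any provider's actual prices; the only assumption is that cached input tokens are billed at a lower rate than uncached ones.

```python
def blended_input_price(miss_price, hit_price, hit_rate):
    """Effective price per million input tokens for a given cache hit rate.

    miss_price and hit_price are dollars per million tokens (hypothetical values);
    hit_rate is the fraction of input tokens served from cache (0.0 to 1.0).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


# Hypothetical rates: $1.00 per million uncached, $0.25 per million cached.
all_misses = blended_input_price(1.00, 0.25, 0.0)
mostly_hits = blended_input_price(1.00, 0.25, 0.9)
print(f"all misses: ${all_misses:.3f}/M, 90% hits: ${mostly_hits:.3f}/M")
```

A comparison that uses only `miss_price` is the `hit_rate = 0.0` case, which overstates the cost of any workload with a meaningful cache hit rate.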

On top of that, the cloud offering doesn't seem that well run: they randomly blocked a colleague's API key for a couple of days without any heads-up, had a weird rate-limiting bug, and have been deprecating models without redirects on very short notice, all while taking weeks to onboard new models. I assume some of these problems would be addressed if we had an SLA/enterprise contract.

It's a promising idea though. They offer a $5 trial credit (with an aggressive rate limit), so there's no harm in trying it out.

> doesn't seem to take into account the cost savings from cache hits

Absolutely false information.

From my usage panel for this month:

* Total Tokens: 1.1B
* Cached Tokens: 1.0B (97% of prompt tokens)
* Cost (energy pricing): $26.58

The energy pricing is higher than what I actually pay because it's a mix of token billing and partial subscription (60% extra "power").

From the $50 subscription, I have about 3/4 left (4.21 of 16.0 kWh used this billing cycle). I used $5.50 in token billing.

That was running 82% GLM 5.1 and 18% GLM 5.2. Yes, I have been busy ;)

My actual usage, in dollar value, was ~$18.
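One way to arrive at roughly that figure from the numbers above is sketched here. It assumes the $50 subscription is prorated linearly by kWh used, which is an assumption for illustration; the actual billing rules may differ.

```python
# Figures quoted in the comment above.
subscription_price = 50.00  # dollars per billing cycle
kwh_included = 16.0         # kWh covered by the subscription
kwh_used = 4.21             # kWh used so far this cycle
token_billing = 5.50        # dollars billed per token on top

# Assumption: the subscription's value is consumed linearly with kWh.
subscription_share = subscription_price * kwh_used / kwh_included
actual_usage = subscription_share + token_billing

print(f"subscription share: ${subscription_share:.2f}")
print(f"actual usage:       ${actual_usage:.2f}")
```

Under that assumption the total lands a little under $19, in the same ballpark as the ~$18 quoted.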

For your information, that is cheaper than MiMo v2.5 Pro from Xiaomi, where I was getting around 450,000 tokens per cent. And they have the same 75% cheaper prices as DeepSeek. MiMo has an issue with cache retention between session prompts, which hurts them vs. DeepSeek. Yes, DeepSeek v4 Pro is 2.5x cheaper, but it is nowhere near GLM 5.1, and especially not GLM 5.2.
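As a rough sketch of the tokens-per-cent comparison, using only the figures quoted in this comment (1.1B total tokens, ~$18 actual cost, $26.58 at energy pricing, and ~450,000 tokens per cent for MiMo). The helper name is hypothetical, and the calculation lumps all token types together, which is a simplification.

```python
def tokens_per_cent(total_tokens, cost_dollars):
    """Tokens obtained per cent spent (all token types lumped together)."""
    return total_tokens / (cost_dollars * 100)


total_tokens = 1.1e9  # from the usage panel

at_actual_cost = tokens_per_cent(total_tokens, 18.00)     # ~$18 actually paid
at_energy_pricing = tokens_per_cent(total_tokens, 26.58)  # panel's energy pricing
mimo = 450_000                                            # figure quoted for MiMo

print(f"actual cost:    {at_actual_cost:,.0f} tokens/cent")
print(f"energy pricing: {at_energy_pricing:,.0f} tokens/cent")
print(f"ratio vs MiMo (actual cost): {at_actual_cost / mimo:.2f}x")
```

Note that the result depends heavily on which cost figure goes in the denominator, so the comparison only holds for the ~$18 effective cost, not the $26.58 energy-pricing figure.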

In case you're wondering, the zai light subscription has a limit of about 80M tokens per week. So on a tokens-per-cent basis, Neuralwatt is about 3x cheaper (and has no 5-hour or weekly limits to max out or get frustrated by).

> all while taking weeks to onboard new models.

It took them 1 day to include GLM 5.2 ... Yes, they remove old models fast because they do not have the server capacity to keep old models around.

> I assume some of these problems would be addressed if we had an SLA/enterprise contract.

It's a small team, not a big huge company. From my experience so far, I have seen 2 timeouts and sometimes slow speeds as servers get overloaded. For what I am paying for GLM ~5.1~ 5.2 ...

  • Your reply doesn't seem to be in good faith. Please provide your formula for calculating effective per token cost.

    I am not sure why the small-team argument is relevant. This is a crowded market; there are dozens if not hundreds of third-party inference providers in the world right now. I'm glad that excuse works for you, but I'm not sure why the average user should care.