Comment by unrvl22

13 hours ago

Why aren't more people talking about this? It's literally Opus 4.7 quality at stupid prices. I know providers who are offering this at unlimited tokens for $50 a month. Some are even offering API rates 3x lower than the official ZAI API rates, which are already like 10x cheaper than Opus. (Crof and Umans btw)

This is a huge blow to Anthropic/OpenAI/Google and a massive win for the rest of the world. The official API prices and speeds mean nothing for open source models.

69 comments

CuriouslyC 13 hours ago

Be careful about unofficial providers; a lot of them misconfigure models or stealth-quantize them. For a while the difference between Kimi on the official API and most third-party providers was 20-40%.

thehamkercat 11 hours ago

Kimi K2 had a vendor verifier: https://github.com/MoonshotAI/K2-Vendor-Verifier
(there's a table which shows comparison between vendors)
Also, it seems there's a general one as well (for all kimi models?): https://github.com/MoonshotAI/Kimi-Vendor-Verifier
cedws 13 hours ago
OpenRouter should be penalising or banning providers for this.
- kilroy123 12 hours ago
  
  This is my biggest complaint about OpenRouter and I'm a fan. Might be pretty tough at scale?
- orbital-decay 10 hours ago
  
  They have an "exacto" category with providers they supposedly verified
  
- alecco 11 hours ago
  
  Would that align with their VC-backed incentives?
  
unrvl22 13 hours ago

The 2 I mentioned both have fairly large followings who run benchmarks and absolutely will spot issues.

stanac 13 hours ago

> Some are even offering API rates at 3x lower than the official ZAI api rates

Looking at OpenRouter [1], some of the cheaper offerings are for quantized models. Not sure how much intelligence is lost in quantization. And they are not 3 times cheaper. Where did you find 3x lower prices for APIs? I am considering skipping OpenRouter and using them directly for that price.

edit:

I see, Crof [2]: 8-bit for $0.50/$0.08/$2.20

[1]: https://openrouter.ai/z-ai/glm-5.2

[2]: https://ai.nahcrof.com/pricing

scrlk 12 hours ago
IME, unquantised -> FP8 is pretty much lossless. What matters more is having an unquantized KV cache - using an FP8 KV cache can result in a significant drop in quality.
- osti 17 minutes ago
  
  The official API is FP8, which should imply that it's lossless.
- johnnyApplePRNG 8 hours ago
  
  >unquantised -> FP8 is pretty much lossless
  Claude Shannon is rolling in his grave.
  
- ComputerGuru 9 hours ago
  
  Do infra providers reveal that level of implementation detail?
  
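
A rough sketch of the point being argued above: rounding to FP8 cannot be strictly lossless, because each value keeps only a few mantissa bits. It assumes an E4M3-style format with 3 explicit mantissa bits and ignores the limited exponent range, subnormals, per-tensor scaling, and saturation that real FP8 kernels handle. The helper name `round_to_mantissa_bits` is hypothetical, not part of any library.

```python
import math

def round_to_mantissa_bits(x, bits=3):
    """Round x to a float with `bits` explicit mantissa bits (E4M3-like when bits=3).

    Simplified on purpose: ignores FP8's limited exponent range,
    subnormals, and saturation.
    """
    if x == 0.0:
        return 0.0
    # x == mantissa * 2**exponent, with 0.5 <= |mantissa| < 1
    mantissa, exponent = math.frexp(x)
    # One implicit leading bit plus `bits` explicit bits of precision.
    steps = 2 ** (bits + 1)
    return math.ldexp(round(mantissa * steps) / steps, exponent)

for value in (0.3, 1.0, 3.14159, -0.0421):
    rounded = round_to_mantissa_bits(value)
    rel_err = abs(rounded - value) / abs(value)
    print(f"{value:>9} -> {rounded:<10} relative error {rel_err:.2%}")
```

Under these assumptions the per-value relative error is at most 1/16. Whether that shows up as a measurable quality drop in a model is an empirical question that this sketch does not answer.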
benjiro29 12 hours ago
Neuralwatt ... When you reverse calculate the actual energy usage / price on a token basis, the gap is large.
I do not have GLM 5.2 numbers because the whole default max setting is overkill. But GLM 5.1 numbers had it at 12x cheaper than API rates, and about 2.5x more tokens than ZAI's own subscription service.
Yes, it's FP8, but let's be honest: do we know for sure that even ZAI runs at FP16? I learned a long time ago with Claude and Codex how much cheating happens at the model level, even from the big boys.
- spelk 8 hours ago
  
  Please correct me if you have contradicting data but: Neuralwatt's price per token vs price for energy comparison doesn't seem to take into account the cost savings from cache hits that other providers offer on pure token rates. The comparison seems to assume every input token is a cache miss.
  On top of that, the cloud offering doesn't seem that well-run, they randomly blocked a colleague's API key for a couple days without any heads up, had a weird rate limiting bug and they have been deprecating models without redirects with very short notice, all while taking weeks to onboard new models. I assume some of these problems would be addressed if we had an SLA/enterprise contract.
  It's a promising idea though. They offer a $5 trial credit (with an aggressive rate limit) though so no harm in trying it out.
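
The cache-hit objection above can be sketched as a blended input price. This is a minimal illustration, assuming a provider bills cached input tokens at a lower rate than uncached ones. The prices below are hypothetical placeholders, not quotes from any provider in this thread, and `blended_input_price` is a made-up helper name.

```python
def blended_input_price(miss_price, hit_price, hit_rate):
    """Average price per million input tokens for a cache hit rate in [0, 1]."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price

# Hypothetical prices in dollars per million input tokens (not real quotes).
MISS_PRICE = 1.00
HIT_PRICE = 0.10

for rate in (0.0, 0.5, 0.8):
    price = blended_input_price(MISS_PRICE, HIT_PRICE, rate)
    print(f"hit rate {rate:.0%}: ${price:.2f} per million input tokens")
```

A comparison that assumes every input token is a cache miss corresponds to `hit_rate = 0.0`, which is the most expensive case for the per-token provider.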

Schiendelman 13 hours ago

To answer the question in your first sentence - because it's VERY computationally (ha) expensive as a human being to keep up with all the options. It's also very hard to figure out how to run a model like this. There's no installer. If you really really care, which 99% of people do not, you have to google a guide, and then find out it's out of date...

I've tried a number of these, and the learning curve is very steep compared to "install Claude Code and pay $100/mo". There is no way saving me $50/month matters compared to figuring that out.

andai 13 hours ago
But it just works with Claude Code? They have a guide on their website.
https://docs.z.ai/devpack/tool/claude
Here's my setup. I add this to my .bashrc
export ZAI_API_KEY="your_key_here"
alias claudez='ANTHROPIC_AUTH_TOKEN="$ZAI_API_KEY" ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic" ANTHROPIC_DEFAULT_OPUS_MODEL="glm-5.2[1m]" ANTHROPIC_DEFAULT_SONNET_MODEL="glm-4.7" ANTHROPIC_DEFAULT_HAIKU_MODEL="glm-4.7" claude'
Then I just run claudez
Pro tip: the same thing works with DeepSeek: https://api-docs.deepseek.com/guides/anthropic_api
Even more pro tip: Claude Code can set this up for you haha
- Schiendelman 13 hours ago
  
  Sure, I'm not saying I, a software engineer, cannot do this. I'm saying it's significant onboarding friction.
  Unless this were a massive differentiator, people aren't going to be "talking about it" the way GP suggests!
  
re-thc 10 hours ago
> There's no installer.
There's ZCode (https://zcode.z.ai), which is like the Codex App.
That's about as "easy" as it gets for the non-devs you're complaining about.
- qingcharles 8 hours ago
  
  How does it compare to OpenCode? I already have too many LLM CLIs installed :(
- Schiendelman 9 hours ago
  
  I'm not complaining about anything. I'm answering a question.
CamperBob2 4 hours ago
> It's also very hard to figure out how to run a model like this. There's no installer.
Yes, there is. It's called Claude Code. Point it at the HuggingFace URL and say "Download these weights and build whatever is needed to run them, then test the model."
- PoignardAzur 4 hours ago
  
  I really miss the time when people thought that the idea of someone telling an un-sandboxed AI "do whatever is needed to X" was unrealistically stupid.
  
chillfox 10 hours ago

install opencode, then either pay $10 for their plan, or add an openrouter api key.
gerryf2 10 hours ago

I agree with this.
I'd pay for an out of the box solution. i.e. an Installer with updates

cedws 13 hours ago

In my org everyone is extremely Claude-pilled to the point you’d think it’s the only LLM that exists, purely because it caters to non-engineers within enterprises.

unrvl22 13 hours ago

I cancelled my claude sub after realizing I can burn 300m tokens a day of this quality, for $50 a month.

spelk 7 hours ago

Which coding plan are you using? How are you finding it?

embedding-shape 13 hours ago

> Why aren't more people talking about this?

Wasn't this released like 2 days ago? Everyone is still evaluating and playing around with it; things like this submission are just starting to come out. Give it some days at least before jumping to conclusions, ideally weeks.

knollimar 11 hours ago

Isn't it closer to sonnet?

RussianCow 3 hours ago
The Chinese open weight models have been ahead of Sonnet (at least for coding) for a couple months now. I tend to take benchmarks with a huge grain of salt, but in my own experience, the latest versions of Kimi, MiMo, and GLM (pre-5.2) had already surpassed Sonnet in terms of output quality for a fraction of the price.
With that said, I'm excited to try GLM 5.2 because I still end up reaching for Opus and GPT 5.5 for many tasks because the open models tend to get stuck more often on complex problems.
- knollimar 1 hour ago
  
  I found sonnet preferable to k2.6 but 2.7 code for kimi seems better anecdotally
redox99 11 hours ago
Definitely opus level for coding.
- smith7018 11 hours ago
  
  Do you have benchmarks or at least anecdotes to back that up? I'm not arguing with you; I would just love to see some proof that open models are getting as good as Anthropic's models.
  
- knollimar 7 hours ago
  
  Oic I misremembered OAI scores, I thought Sonnet had 51

Hamuko 13 hours ago

I’m not that interested in models that I can’t run on my desktop for ~0€, which is my AI budget.

andai 13 hours ago
Electricity cost seems to be about $30/month for a 32B model on a GPU. It's probably better on Apple hardware.
https://github.com/QuantiusBenignus/Zshelf/discussions/2
Not accounting for hardware, of course :)
- Hamuko 12 hours ago
  
  My Mac Studio uses about 60–80 watts whenever I’m running a model (as measured by the system metrics), so it’s less than 2 kWh/day at full blast. Electricity is like 0.125 €/kWh, so that 24-hour period would be <0.25 €.
  Not accounting for hardware in my costs, since I didn’t buy my hardware for running models. Running models is just something it can do in addition to what I got it for.
- NorwegianDude 11 hours ago
  
  The price, processed tokens, and output can be anything, it just depends on what GPU it is.
  Nvidia GPUs are much more efficient than Apple hardware for inference (and training).
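
The Mac Studio figures quoted above (60–80 W, 0.125 €/kWh, 24 hours at full load) can be checked with a few lines. This is only the arithmetic from that comment, assuming constant power draw for the whole day. The function name is hypothetical.

```python
def daily_energy_cost(watts, hours, price_per_kwh):
    """Return (kWh used, cost) for a constant power draw over `hours`."""
    kwh = watts * hours / 1000.0
    return kwh, kwh * price_per_kwh

# Figures taken from the comment above: 60-80 W, 0.125 EUR per kWh, 24 hours.
for watts in (60, 80):
    kwh, cost = daily_energy_cost(watts, 24, 0.125)
    print(f"{watts} W for 24 h: {kwh:.2f} kWh, {cost:.2f} EUR")
```

Both ends of the range stay under 2 kWh and under 0.25 € per day, which matches the comment. Hardware cost is left out, as in the original estimate.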
igravious 13 hours ago
Cool beans. You're not the target audience then.
- Hamuko 13 hours ago
  
  Did I claim I was? I just said why I and people like me are not talking about it.
  

anuramat 13 hours ago

> unlimited tokens for $50 a month

link?

> Why

imho everything but opus produces unusable code (fable was even better...), eg gpt5.5 seems to write the absolute worst code that still technically solves the problem; tbh I'd be totally willing to trade "raw intelligence" for "code taste"

more labs need to figure out whatever anthropic did to destroy everybody else on frontiercode bench

CuriouslyC 10 hours ago
Opus has the nickname "Slopus" in a lot of circles for a reason. It can write nice code in isolation, but the way it organizes that code and its rigor in addressing edge cases/making sure things are robust leave a lot to be desired. Opus is particularly famous for having a real problem reinventing stuff that already existed in the codebase because it wanted to get to work before exploring sufficiently.
- anuramat 7 hours ago
  
  what you're describing doesn't sound like such a big deal -- it's (A) obvious during review, (B) easy to fix in a single prompt, (C) simple enough to fix manually, (D) can be mitigated with tokenmaxxing (agent review passes, prompting, subagents, etc)
  regarding edge cases -- less is more in my experience, as removing is harder than adding