Comment by pimeys

1 day ago

They are, but from our evals, for example, GLM 5.2 (unquantized) performs as well as Opus but uses more tokens and takes more time.

I really wish this would change soon but they are not there yet.

Even at double the total tokens and roughly 2-3x the time, it still seems worth it if prices are 5x+ cheaper (which OpenRouter [1] claims is the case).
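To put rough numbers on that trade-off, here is a minimal sketch of the arithmetic. It assumes a single blended per-token price for each model and ignores the extra wall-clock time entirely; the function name and the 2x / 5x figures are just the illustrative values from this comment, not measured data.

```python
def relative_cost(token_multiplier: float, price_ratio: float) -> float:
    """Cost of the cheaper model relative to the baseline for the same task.

    token_multiplier: how many times more tokens the cheaper model uses.
    price_ratio: how many times cheaper its per-token price is.
    Both values are assumptions supplied by the caller.
    """
    if token_multiplier <= 0 or price_ratio <= 0:
        raise ValueError("both arguments must be positive")
    return token_multiplier / price_ratio


if __name__ == "__main__":
    # Double the tokens at one fifth of the per-token price.
    ratio = relative_cost(token_multiplier=2.0, price_ratio=5.0)
    print(f"Relative cost: {ratio:.0%} of baseline")
```

Under these assumptions the cheaper model still costs 40% of the baseline per task, and it only stops being cheaper once its token use grows by more than the price ratio (5x here).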

On NeuralWatt for my personal projects at home (not affiliated, just a happy customer), I get so much more mileage out of GLM than I get out of Claude at work, specifically because it's priced as a hammer I can pound any nail-shaped-object with, not a delicacy I need to carefully budget-analyze to try to figure out if it's worth burning my monthly spend limits on this task.

[1] https://openrouter.ai/compare/z-ai/glm-5.2/anthropic/claude-...

I thought true token use was being hidden by both Anthropic and OpenAI.

  • No, they do specify token counts, as they let you pay for them. They just don't tell you what these thinking tokens actually are.

    • Though because they don't show you, they could be lying about it. Very unlikely, I think; it would be too dangerous IMO. But technically possible.