If it's open then there will be multiple providers. I see it is on OpenRouter now.
I'm going to experiment with this, but unless it's insanely more efficient in token usage than anything else I've tried, the only way to keep costs more or less acceptable is through a subscription.
Why use "their API"? It is an open model; use any provider on OpenRouter.
Because third-party providers and inference engines sometimes (a lot of the time, in my experience) fail to implement the model correctly, in ways that can be very subtle and not obvious.
Deepinfra, for example, is not preserving thinking correctly for GLM5.1, even though they are for GLM5. This is one of the more obvious issues that crop up.