Comment by storystarling
9 hours ago
The closed nature is one thing, but the opaque billing on reasoning tokens is the real dealbreaker for integration. If you're bootstrapping a service, I don't see how you can model your margins when the API arbitrarily decides how long to think on a given prompt and bills you for every token of it. It makes unit economics impossible to predict.
FYI: newer LLM hosting APIs offer control over the amount of "thinking" (as well as reply length): some by token count, others by an enum (low, medium, high, etc.).
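As a sketch of what that control looks like in practice: parameter names vary by provider, and the `reasoning_effort` / `max_completion_tokens` names and the model name below are illustrative, not any specific vendor's API.

```python
# Illustrative request payload for a hosted reasoning model.
# Field names (reasoning_effort, max_completion_tokens) mirror one
# common API shape; check your own provider's docs for the real ones.
request = {
    "model": "some-reasoning-model",  # placeholder model name
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    "reasoning_effort": "low",        # enum-style control: low / medium / high
    "max_completion_tokens": 1024,    # hard cap on billed output tokens
}

print(request["reasoning_effort"], request["max_completion_tokens"])
```

With both knobs set, the spend per request is bounded instead of open-ended.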
Doesn't ClosedAI do the same? Their thinking models bill for reasoning tokens, but the thinking steps themselves are encrypted.
Destroying unit economics is a bit dramatic... you can choose the thinking effort on modern models/APIs and add guidance in the system prompt.
You just have to plan for the worst case.
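Worst-case planning is just arithmetic once the caps are set. A toy model, with made-up prices and token caps purely for illustration:

```python
# Toy worst-case cost per request. All numbers are hypothetical
# placeholders; plug in your provider's real prices and your own caps.
price_per_1k_tokens = 0.01   # $/1K output tokens (made up)
prompt_tokens = 1_500        # known input size for your use case
max_thinking_tokens = 8_000  # reasoning budget you cap via the API
max_reply_tokens = 2_000     # cap on the visible reply

worst_case_tokens = prompt_tokens + max_thinking_tokens + max_reply_tokens
worst_case_cost = worst_case_tokens / 1_000 * price_per_1k_tokens
print(f"worst case: ${worst_case_cost:.4f} per request")
```

That upper bound, not the average, is what goes into the margin model.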