Comment by zozbot234
11 hours ago
The weights likely won't be available for this model, since it's part of the Max series, which has always been closed. The most "open" you get is the API.
The closed nature is one thing, but the opaque billing of reasoning tokens is the real dealbreaker for integration. If you are bootstrapping a service, I don't see how you can model your margins when the API arbitrarily decides how long to think about a prompt and bills you for it. It makes unit economics impossible to predict.
FYI: Newer LLM hosting APIs offer control over the amount of "thinking" (as well as the length of the reply): some by token count, others by an enum (high, medium, low, etc.).
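To make the two control styles concrete, here is a minimal, provider-agnostic sketch of how either one can be turned into an upper bound on billed output tokens. Every name here (`Effort`, `TokenBudget`, `ASSUMED_EFFORT_CEILINGS`, `max_billed_output_tokens`) is hypothetical and not any vendor's API. The per-level ceilings are made-up numbers, because an effort enum does not by itself promise a hard token limit. The sketch also assumes reasoning tokens are billed as output tokens.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Effort(Enum):
    # Hypothetical enum-style control ("low/medium/high"), not a real SDK type.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class TokenBudget:
    # Hypothetical explicit cap on reasoning tokens, for APIs that take a count.
    max_reasoning_tokens: int


ThinkingControl = Union[TokenBudget, Effort]

# ASSUMPTION: made-up ceilings per effort level. An enum gives no hard limit,
# so in practice these would come from your own measurements or from a
# separate max-output-tokens setting.
ASSUMED_EFFORT_CEILINGS = {
    Effort.LOW: 1_000,
    Effort.MEDIUM: 4_000,
    Effort.HIGH: 16_000,
}


def reasoning_ceiling(control: ThinkingControl) -> int:
    """Upper bound on reasoning tokens implied by a thinking control."""
    if isinstance(control, TokenBudget):
        return control.max_reasoning_tokens
    return ASSUMED_EFFORT_CEILINGS[control]


def max_billed_output_tokens(control: ThinkingControl, max_reply_tokens: int) -> int:
    """Worst-case output tokens, assuming reasoning is billed as output."""
    return reasoning_ceiling(control) + max_reply_tokens


print(max_billed_output_tokens(TokenBudget(2_000), 500))  # 2500
print(max_billed_output_tokens(Effort.HIGH, 500))  # 16500
```

Adding the reasoning cap and the reply cap is deliberately conservative. Some APIs count reasoning inside a single overall output limit, in which case the real bound is lower.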
Doesn't ClosedAI do the same? Thinking models bill for their reasoning tokens, but the thinking steps are encrypted.
Destroying unit economics is a bit dramatic... you can choose the thinking effort for modern models/APIs and add guidance to the system prompts.
You just have to plan for the worst case.
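"Plan for the worst case" reduces to simple arithmetic once output is capped. The sketch below prices a request as if the model used every output token it was allowed, then finds the lowest price per request that still meets a target margin. The prices and token counts are hypothetical placeholders. It also assumes reasoning tokens are billed at the output rate and are included in `max_output_tokens`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Hypothetical per-million-token prices in USD; substitute real rates.
    input_per_million: float
    output_per_million: float


def worst_case_cost(pricing: Pricing, input_tokens: int, max_output_tokens: int) -> float:
    """Cost of one request if the model spends its entire output allowance.

    Assumes reasoning tokens are billed at the output rate and count
    toward max_output_tokens.
    """
    return (input_tokens * pricing.input_per_million
            + max_output_tokens * pricing.output_per_million) / 1_000_000


def worst_case_margin(price_charged: float, pricing: Pricing,
                      input_tokens: int, max_output_tokens: int) -> float:
    """Fraction of revenue left when a request hits the worst case."""
    cost = worst_case_cost(pricing, input_tokens, max_output_tokens)
    return (price_charged - cost) / price_charged


def min_price_for_margin(target_margin: float, pricing: Pricing,
                         input_tokens: int, max_output_tokens: int) -> float:
    """Lowest per-request price that still meets target_margin in the worst case."""
    cost = worst_case_cost(pricing, input_tokens, max_output_tokens)
    return cost / (1 - target_margin)


pricing = Pricing(input_per_million=1.0, output_per_million=10.0)
cost = worst_case_cost(pricing, input_tokens=2_000, max_output_tokens=16_500)
price = min_price_for_margin(0.5, pricing, 2_000, 16_500)
print(f"worst-case cost: ${cost:.4f}, price for 50% margin: ${price:.4f}")
```

With these placeholder numbers the worst case costs $0.167, so charging about $0.334 per request keeps a 50% margin even if the model thinks up to its cap. Typical requests then simply earn more.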