Comment by iwontberude
1 day ago
Your point would make sense, except that the amount of inference per request is going up faster than costs are coming down.
The parent said: "Of course, by then we'll have much more capable models. So if you want SOTA, you might see the jump to $10-12. But that's a different value proposition entirely: you're getting significantly more for your money, not just paying more for the same thing."
SOTA improvements have been coming from additional inference (reasoning tokens), not just from increasing model size. Their comment makes plenty of sense.
Is it? Recent models tend to need fewer tokens to achieve the same outcome. The days of ultrathink are coming to an end; Opus is perfectly usable without it.