Comment by iwontberude
1 day ago
Your point would make sense, except that the amount of inference per request is going up faster than costs are coming down.
The parent said: "Of course, by then we'll have much more capable models. So if you want SOTA, you might see the jump to $10-12. But that's a different value proposition entirely: you're getting significantly more for your money, not just paying more for the same thing."
SOTA improvements have been coming from additional inference (reasoning tokens), not just from increasing model size. Their comment makes plenty of sense.
Is it? Recent models tend to need fewer tokens to achieve the same outcome. The days of ultrathink are coming to an end; Opus is perfectly usable without it.