Comment by OutOfHere

3 hours ago

In my estimation, OpenAI has a habit of dropping a model's quality after its introduction. I definitely recall the web version of ChatGPT 5.2 being a lot better when it was introduced. A week or two later, its quality suddenly dropped. The initial high seemed intended to fool journalists and benchmarks. As such, nothing OpenAI says about model speed can be trusted. All they have to do is lower the average reasoning effort, and boom, it becomes 40% faster. I hope I am wrong, because if I am right, it's a con game.

It's good to be skeptical, but I'm happy to share that we don't pull shenanigans like this. We actually take quite a bit of care to report evals fairly, keep API model behavior constant, and track down reports of degraded performance in case we've accidentally introduced bugs. If we were degrading model behavior, it would be pretty easy to catch us with evals against our API.

In this particular case, I'm happy to report that the speedup is time per token, so it's not a gimmick from outputting fewer tokens at lower reasoning effort. Model weights and quality remain the same.

  • It looks like you do pull shenanigans like these [0]. The person you're replying to even mentioned "ChatGPT 5.2", but you're talking specifically about the API while making it sound like it applies across the board. I also appreciate the attempt to further hide this degradation of a paid product from users by blocking the prompt used to figure it out.

    Happy to retract if you can state [0] is false.

    [0] https://x.com/btibor91/status/2018754586123890717

  • I've seen Sam Altman make similar claims in interviews, and I now interpret every statement from an OpenAI employee (and especially Sam) as if an Aes Sedai had said it.

    I.e.: "keep API model behavior constant" says nothing about the consumer ChatGPT web app, mobile apps, third-party integrations, etc.

    Similarly, it might mean very specifically that a "certain model timestamp" remains constant but the generic "-latest" or whatever model name auto-updates "for your convenience" to the new faster performance achieved through quantisation or reduced thinking time.

    You might be telling the full, unvarnished truth, but after many similar claims from OpenAI that turned out to be only technically true, I remain sceptical.

  • Hey Ted, can you confirm whether this 40% improvement is specific to API customers or if that's just a wording thing because this is the OpenAI Developers account posting?

  • So what actually happens if it isn't shenanigans?

    It's worth you guys doing some analysis on your end of why customers are getting worse results a week or two later, and putting out some guidelines about what context is poisonous and the like.

  • Starting off the ChatGPT Plus web users with the Pro model, then later swapping it for the Standard model -- would meet the claims of model behavior consistency, while still qualifying as shenanigans.