Comment by kristofferR

4 days ago

Using that criticism for this release isn't really fair, as these models will replace the old ones (o3 will replace o1, o4-mini will replace o3-mini).

On a more general level, sure, but they aren't using this release to grow the number of models; it's just that deprecating/killing the old ones can't be done overnight.

As someone who doesn't use anything OpenAI (for all the reasons), I have to agree with the GP. It's all baffling. Why is there an o3-mini and an o4-mini? Why on earth are there so many models?

Once you get to this point you're putting the paradox of choice on the user. I used to use a particular brand of toothpaste for years, until I'd find myself in the supermarket looking at a wall of toothpaste, all by the same brand, with no discernible difference between the products. Why is one of them called "whitening"? Do the others not do that? Why is this one called "complete" and that one called "complete ultra"? That would suggest that the "complete" one wasn't actually complete. I stopped using that brand of toothpaste because it became impossible to know which was the right product within the brand.

If I were assessing the AI landscape today, where the leading models are largely indistinguishable in day-to-day use, I'd look at OpenAI's wall of toothpaste and immediately discount them.

  • (I work at OpenAI.)

    In ChatGPT, o4-mini is replacing o3-mini. It's a straight 1-to-1 upgrade.

    In the API, o4-mini is a new model option. We continue to support o3-mini so that anyone who built a product atop o3-mini keeps getting stable behavior. Offering both lets developers test each model and switch whenever they like. The alternative would be to shut developers off without warning and risk breaking production apps every time we launch a new model.
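
    Concretely, the model name is just a string in the request, so a pinned value keeps production behavior stable until a developer chooses to re-test and switch. A minimal sketch with the OpenAI Python SDK (the prompt and variable names here are illustrative):

      from openai import OpenAI

      client = OpenAI()  # reads OPENAI_API_KEY from the environment

      MODEL = "o3-mini"  # pinned today; flip to "o4-mini" after re-testing

      resp = client.chat.completions.create(
          model=MODEL,
          messages=[{"role": "user", "content": "Summarize this changelog."}],
      )
      print(resp.choices[0].message.content)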

    I don't think it's too different from what other companies do. Consider Apple: they support dozens of iPhone models with their software updates and developer docs. If you're an app developer, you probably want to be aware of all those models and docs as you develop your app (not an exact analogy). But if you're a regular person and you go into an Apple Store, you only see a few options, which you can personalize to what you want.

    If you have concrete suggestions on how we can improve our naming or our product offering, happy to consider them. Genuinely trying to do the best we can, and we'll clean some things up later this year.

    Fun fact: before GPT-4, we had a unified naming scheme for models that went {modality}-{size}-{version}, which resulted in names like text-davinci-002. We considered launching GPT-4 as something like text-earhart-001, but since everyone was calling it GPT-4 anyway, we abandoned that system to use the name GPT-4 that everyone had already latched onto. Kind of funny how our unified naming scheme originally made room for 999 versions, but we didn't make it past 3.

    • Have any of the models been deprecated? It seems like a deprecation plan, with defined timelines, would be extraordinarily helpful.

      I have not seen any sort of "If you're using X.122, upgrade to X.123 before 202X. If you're using X.120, upgrade to anything before April 2026, because the model will no longer be available on that date." ... like all operating system and hardware manufacturers have been doing for decades.

      Side note: it's amusing that stable behavior is only available on a particular model with a sufficiently low temperature setting. As near-AGI, shouldn't these models be smart enough to stay consistent, or improve, from version to version?
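
      To be concrete, the "stable behavior" recipe I'm alluding to looks roughly like this (a sketch with the OpenAI Python SDK; the dated snapshot name is illustrative, and even with everything pinned the API only promises best-effort determinism):

        from openai import OpenAI

        client = OpenAI()
        resp = client.chat.completions.create(
            model="gpt-4o-2024-08-06",  # a dated snapshot, not a moving alias
            temperature=0,              # minimize sampling randomness
            seed=42,                    # best-effort reproducible sampling
            messages=[{"role": "user", "content": "Same prompt, same answer?"}],
        )
        print(resp.choices[0].message.content)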

  • > Why is there an o3-mini and an o4-mini? Why on earth are there so many models?

    Because if they removed access to o3-mini — which I have tested, costed, and built around — I would be very angry. I will probably switch to o4-mini when the time is right.

  • They keep a lot of models around for backward compatibility for API users. This is confusing, but not inherently a bad idea.

  • You could develop an AI model to help pick the correct AI model.

    Now you’ve got 18 problems.

    • I think you're trying to re-contextualize the old Standards joke, but I actually think you're right. If a front-end model could dispatch each prompt to the best back-end model, turning everything into a high-level mixture of models, that would be great, and a great simplifying step. Then they can specialize and optimize all they want: CPU usage goes down, responses get better, and we only see one interface.
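
      A toy sketch of that dispatch idea (the routing rules, model names, and classify() helper are all hypothetical stand-ins, not anyone's actual router):

        from openai import OpenAI

        client = OpenAI()

        ROUTES = {
            "code": "o4-mini",  # cheaper reasoning model for code questions
            "math": "o3",       # heavier reasoning model
            "chat": "gpt-4o",   # general conversation
        }

        def classify(prompt: str) -> str:
            # Crude keyword router; a real system might use a small model here.
            p = prompt.lower()
            if any(w in p for w in ("function", "bug", "compile", "stack trace")):
                return "code"
            if any(w in p for w in ("prove", "integral", "solve")):
                return "math"
            return "chat"

        def ask(prompt: str) -> str:
            model = ROUTES[classify(prompt)]
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content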
