← Back to context

Comment by energy123

6 days ago

Is it o3 (low), o3 (medium) or o3 (high)? Different model names have crept into the various benchmarks over the last few months.

o3 is a model, and reasoning effort (high/medium/low) is a parameter that goes into the model.

o3 pro is a different thing - it’s not just o3 with maximum remaining effort.

  • Why's it called o3 then if it's a different thing? There's already a rather extreme amount of confusion with the model names and it's not clear _at all_ which model would be "the best" in terms of response quality.

    Here's the current state with version numbers as far as I can piece it together (using my best guess at naming of each component of the version identifier. Might be totally wrong tho):

    1) prefix (optional): "gpt-", "chatgpt-"

    2) family (required): o1, o3, o4, 4o, 3.5, 4, 4.1, 4.5,

    3) quality? (optional): "nano", "mini", "pro", "turbo"

    4) type (optional): "audio", "search"

    5) lifecycle (optional): "preview", "latest"

    6) date (optional): 2025-04-14, 2024-05-13, 1106, 0613, 0125, etc (I assume the last ones are a date without a year for 2024?)

    7) size (optional): "16k"

    Some final combinations of these version number components are as small as 1 ("o3") or as large as 6 ("gpt-4o-mini-search-preview-2024-12-17").

    Given this mess, I can't blame people assuming that the "best" model is the one with the "biggest" number, which would rank the model families as: 4.5 (best) > 4.1 > 4 > 4o > o4 > 3.5 > o3 > o1 (worst).

    • o3 pro is based on o3 and its style and outputs will be quite similar to o3.

      As an analogy, think of it like this:

      o3-low ~ Ford Mustang with the accelerator gently pressed

      o3-medium ~ Ford Mustang with the accelerator pressed

      o3-high ~ Ford Mustang with the accelerator heavily pressed

      o3 pro ~ Ford Mustang GT

      Even though a Mustang GT is a different car than a Mustang, you don’t give it a totally different name (eg Palomino). The similarity in name signals it has a lot of the same characteristics but a souped up engine. Same for o3 pro.

      Fun fact: before GPT-4, we had a unified naming scheme for models that went {modality}-{size}-{version}, which resulted in names like text-davinci-002. We considered launching GPT-4 as something like text-earhart-001, but since everyone was calling it GPT-4 anyway, we abandoned that system to use the name GPT-4 that everyone had already latched onto. Kind of funny how our original unified naming scheme made room for 999 versions, but we didn't make it past 3.

      Edit: When I say the Mustang GT is a different car than a Mustang - I mean it literally. If you bought a Mustang GT and someone delivered a Mustang with a different trim, you wouldn't say "great, this is just what I ordered, with the same features/behavior/value." That we call it a different trim is a linguistic choice to signal to consumers that it's very similar, and built on the same production line, but comes with a different engine or different features. Similar to o3 pro.

      3 replies →

    • My guess is this comes from an org structure where you have multiple "pods" working on different research. Who comes up with the next shippable model and when that happens is kind of random and the chaotic naming system comes from that. It's just my speculation and could be wildly wrong.

  • Could someone there maybe possibly use, oh I dunno, ChatGPT and come up with some better product names?