Comment by mmsc

6 days ago

I understand that things are moving fast and all, but surely the... 8? models which are currently available are a bit... overwhelming for users who just want to get answers to their questions of life? What's the end goal with having so many models available?

I just can’t believe nobody at the company has enough courage to tell their leadership that their naming scheme is completely stupid and insane. Four is greater than three, and so four should be better than three. The point of a name is to describe something so that you don’t confuse your users, not to be cute.

  • The reason their naming scheme is so bad is that their initial attempts at GPT-5 failed in training. It was supposed to be done ~1 year ago. Because they'd promised that GPT-5 would be vastly more intelligent than GPT-4, they couldn't just name any random model "GPT-5", so they suddenly had to start naming things differently. So now there's GPT-4.5, GPT-4.1, the o-series, ...

    • Surely there's a less stupid way than naming two very different models o4 and 4o.

  • What’s worse is that the app doesn’t even have descriptions. As if I’m supposed to memorize the use case for each based on:

    GPT-4o

    o3

    o4-mini

    o4-mini-high

    GPT-4.5

    GPT-4.1

    GPT-4.1-mini

    • Even o3 can't figure out the naming scheme. When asked to generate names that would be easier for non-technical people to use, it assumed o3 and o4 are the smallest models:

      ```
      Below is one straightforward, user-friendly approach you could adopt. It keeps two
      dimensions only—generation and tier—and reserves an optional “optimisation” suffix
      for special-purpose variants (e.g. vision, coding, long-context).

      1. Core conventions

      Element            | Purpose                                                         | Example values
      Generation         | Major architectural release. Keep a whole number; use “.1”,    | 4, 4.1, 4.5
                         | “.2”… for mid-cycle improvements.                               |
      Tier               | Rough capability / cost band, easy to interpret.                | Lite, Standard, Pro, Ultra
      Suffix (optional)  | Special optimisation or domain specialisation.                  | -LongCtx, -Vision, -Code

      Why this works
      • No ambiguous letters or numerics – “o3” can be read as “03” or “oz”; avoid that entirely.
      • Self-explanatory language – non-technical users recognise “Lite” versus “Pro” instantly.
      • Scalable – new minor rev? bump the generation (4.2). Need a cheaper size? add a
        Nano tier without disturbing the rest.

      2. Applying it to your current list

      Current name  | Proposed new name                                         | Rationale
      GPT-4o        | GPT-4 Standard                                            | Baseline flagship of the 4-series.
      o3            | GPT-4 Lite                                                | Same generation, lowest tier.
      o4-mini       | GPT-4 Lite+ (or GPT-4 Lite LongCtx if that’s the point)   | Indicates “Lite” family but a bit more capable; “+” or a suffix clarifies how.
      o4-mini-high  | GPT-4 Standard LongCtx (or GPT-4 Lite Pro)                | Pick one dimension: either it’s still “Lite” but higher context, or it has moved into “Standard”.
      GPT-4.5       | GPT-4.5 Standard                                          | Mid-cycle architectural upgrade, default tier.
      GPT-4.1       | GPT-4.1 Standard                                          | Ditto.
      GPT-4.1-mini  | GPT-4.1 Lite                                              | Same generation, smaller/cheaper option.

      3. Quick style guide for future models
      1. Stick to two words (or two words + optional suffix) – GPT-5 Pro, GPT-5 Lite-Vision – still readable at a glance.
      2. Reserve extra punctuation for special cases only – hyphens or the “+” symbol should signal meaning, not be decorative.
      3. Publish a public matrix – a small table in docs or the dashboard that maps Generation × Tier → context length, cost, latency eliminates guesswork.

      One-line summary

      GPT-<Generation> <Tier> [-Specialisation] keeps names short, descriptive and
      future-proof—so even non-technical users can tell instantly which model suits their needs.
      ```

  • If you obfuscate the naming, you obfuscate the value proposition, and people become easier to mislead into choosing an overly expensive model. Same as with Intel CPUs, or many, many other hardware products.

  • At TechCrunch AI last week, the OpenAI guy started his presentation by acknowledging that OpenAI knows their naming is a problem and that they're working on it, but it won't be fixed immediately.

  • Came here to say this: the naming scheme is ridiculous and gets harder to follow by the day.

    For example, the other day they released a supposedly better model with a lower number..

    • I’d honestly prefer they just have 3 personas of varying cost/intelligence: Sam, Elmo and Einstein or something, then tack on the date (elmo-2025-1) and silently delete the old ones.

There's a humorous version of Poe's law that says "any sufficiently genuine attempt to explain the differences between OpenAI's models is indistinguishable from parody".

This is a much more expensive model to run and is only available to users who pay the most. I don't see an issue.

However, the "plus" plan absolutely could use some trimming.

Free users don't have this model selector, and probably don't care which model they get, so 4o is good enough. Paid users at $20/month get more models that are better, like o3. Paid users at $200/month get the best models, which also cost OpenAI the most to run, like o3-pro. I think they plan to unify them with GPT-5.

  • That doesn't help much when we're asymptotically approaching GPT-5. We're probably going to be at GPT-4.9999 soon.

    • Not necessarily true. GPT-4.1 was released after GPT-4.5-preview. The next model might be GPT-3.7.

  • I'd be curious what proportion of paid users ever switch models. I'd guess < 10%

    • I switch to o1-pro on occasion, but it is slow enough that I don't use it as much as some of the others. It is a reasonably effective last resort when I'm not getting the answer quality that I think should be achievable. It's the best available reasoning model from any provider by a noticeable margin.

      Sounds like o3-pro is even slower, which is fine as long as it's better.

      o4-mini-high is my usual go-to model if I need something better than the default GPT-4 du jour. I don't see much point in the others and don't understand why they remain available. If o3-pro really is consistently better, it will move o1-pro into that category for me.

I'd like one to do my test use case:

Port unix-sed from C to Java with a full test suite and all options supported.

Somewhere between "it answers questions of life" and "it beats PhDs at math questions", I'd like to see one LLM take this, IMO, rather "pure" language task and succeed.

It is complicated, but it isn't complex. It's string operations with a deep but not that deep expression system and flag set.

It is well described and documented on the internet, and presumably in training sets. It can be described succinctly, and virtually any programmer would understand what it entailed if it were assigned to them. It is drudge work, which is exactly the kind of opportunity for LLMs to show how they would improve true productivity.

GPT fails to do anything other than the most basic substitute operations. Claude was only slightly better, but to its detriment it hallucinated massively and wrote fake passing test cases that didn't even test the code.
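To be concrete about what "the most basic substitute operations" means, here is roughly the level the attempts reached. This is only an illustrative sketch (the class and helper names are invented for this comment, not taken from either model's output): it handles a bare s/// with the g and i flags, and everything a real port needs beyond that is noted in the comments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of the s/// subset of sed. A real port would also have
// to translate POSIX BRE syntax to Java regex syntax, map \1 backreferences
// to $1, and support addresses (e.g. "3,/foo/"), the hold space (h, H, g, G,
// x), branching (b, t), numeric flags like s///2, in-place editing, and more.
public class BasicSubstitute {

    // Apply a single "s/pattern/replacement/flags" command to one input line.
    static String applySubstitute(String line, String pattern,
                                  String replacement, String flags) {
        int opts = flags.contains("i") ? Pattern.CASE_INSENSITIVE : 0;
        Matcher m = Pattern.compile(pattern, opts).matcher(line);
        return flags.contains("g") ? m.replaceAll(replacement)
                                   : m.replaceFirst(replacement);
    }

    public static void main(String[] args) {
        // echo "hello world world" | sed 's/world/there/'   -> hello there world
        System.out.println(applySubstitute("hello world world", "world", "there", ""));
        // echo "hello world world" | sed 's/world/there/g'  -> hello there there
        System.out.println(applySubstitute("hello world world", "world", "there", "g"));
    }
}
```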

The reaction I get to this test is ambivalence, but IMO if LLMs could help port entire software packages between languages with similar feature sets (similar in more than just being Turing complete), then software cross-use would explode, and maybe we could port "vulnerable" code to "safe" Rust en masse.

I get it; it's not what they are chasing customer-wise. They want to write (in n-gate terms) webcrap.

  • I have a very simple question, maybe 5 lines at best, that basically no model, reasoning or not, could grasp. For obvious reasons I'm not disclosing it here (I fear data contamination in the long run), but it reliably breaks the "reasoning" of these things. Unfortunately, I still can't try o3-pro because the API version is not easily available, and I'm certainly not willing to pay for it on the Pro plan, but when it comes to the Plus plan (if it comes) I'll try it. To this date, because of this question (and similar ones), I remain very unimpressed with these models; the marketing is a thousand times larger than the reality, and I suspect people in general are much less capable of detecting intelligence than they think.

    The normal o3 also managed to break 3 isolated Linux installations I was trying it on a few days ago. The task was very simple: set up Ubuntu with btrfs, Timeshift and grub-btrfs. It managed to fail every single time (even when searching the web), so it was not impressive either.

  • The massive real market here is enterprises that need to rewrite legacy code to modern platforms, retaining the business logic as-is but modernising the style.

    .NET Framework 4.x to .NET 10, Python 2 to 3, Java 8 to <current version>, etc...

    The advantage the LLMs have here is that staying within the same programming language and its paradigm is dramatically simpler than converting a "procedural" language like C to an object-oriented language like Java that has a wildly different standard library.

  • How does the latest Gemini 2.5 Pro Ultra Flash Max Hemi XLT release do on that task? It obviously demands a massive context window.

    • I'll check once the nitrous tanks and the aftermarket turbos I overnighted from Japan arrive.

Models are used for actual tasks where predictable behavior is a benefit. Models are also used on cutting-edge tasks where smarter/better outputs are highly valued. Some applications value speed and so a new, smaller/cheaper model can be just right.

I think the naming scheme is just fine and is very straightforward to anyone who pays the slightest bit of attention.

> users that just want to get answers to their questions of life

Those users go to chat.openai.com (or download the app), type text in the box and click send.