
Comment by _fat_santa

4 days ago

So at this point OpenAI has 6 reasoning models, 4 flagship chat models, and 7 cost-optimized models. That's 17 models in total, and that's not even counting their older models and more specialized ones. Compare this with Anthropic, which has 7 models in total and 2 main ones that they promote.

This is just getting to be a bit much; it seems like they are trying to cover for the fact that they haven't actually done much. All these models feel like they took the exact same base model, tweaked a few things, and released it as an entirely new model rather than updating the existing ones. In fact, based on some of the other comments here, it sounds like these are just updates to their existing models, but they release them as new models to create more media buzz.

I'm old enough to remember the mystery and hype before o*/o1/strawberry, which was supposed to be essentially AGI. We had serious news outlets write about senior people at OpenAI quitting because o1 was Skynet.

Now we're up to o4, and AGI is still nowhere in sight (depending on your definition, I know). And OpenAI is up to about 5,000 employees. I'd think even before AGI, a new model would be able to cover for at least 4,500 of those employees being fired. Is that not the case?

  • Remember that Docusign has 7,000 employees. I think OpenAI is pretty lean for what they're accomplishing.

    • I don't think these comparisons are useful. Every time you look at companies like LinkedIn or Docusign, yeah - they have a lot of staff, but a significant proportion of those are in functions like sales, customer support, and regulatory compliance across a bazillion different markets, along with all the internal tooling and processes you need to support that.

      OpenAI is at a much earlier stage in their adventures and probably doesn't have that much baggage. Given their age and revenue streams, their headcount is quite substantial.

    • If we're making comparisons, it's more like someone selling a $10,000 course on how to be a millionaire.

      Not directly from OpenAI - but people in the industry are advertising how these advanced models can replace employees, yet they keep going on hiring tears (including OpenAI). Let's see the first company stand behind their models and replace 50% of their existing headcount with agents. That to me would be a sign these things are going to replace people's jobs. Until I see that: if OpenAI can't figure out how to replace humans with models, then no one will.

      I mean, could you imagine if today's announcement was that the chatgpt.com webdev team has been laid off, and all new features and fixes will be completed by Codex CLI + o4-mini? That would mean they believe in the product they're advertising. Until they do something like that, they'll keep trusting those human engineers and try selling other people on the dream.


    • Yes and Amazon has 1.52 million employees. How many developers could they possibly need?

      Or maybe it’s just nonsensical to compare the number of employees across companies - especially when they don’t do nearly the same thing.

      On a related note, wait until you find out how many more employees Apple has than Google, since Apple has tens of thousands of retail employees.


  • True.

    Deep learning models will continue to improve as we feed them more data and use more compute, but they will still fail at even very simple tasks whenever the input data fall outside their training distribution. The numerous examples of ChatGPT (even the latest, most powerful versions) failing at basic questions or tasks illustrate this well. Learning from data is not enough; there is a need for the kind of System 2 thinking we humans develop as we grow. It is difficult to see how deep learning and backpropagation alone will help us model that. https://medium.com/thoughts-on-machine-learning/why-sam-altm...

  • > I'm old enough to remember the mystery and hype before o*/o1/strawberry

    So at least two years old?

    • Honestly, sometimes I wonder if most people these days kinda aren't at least that age, you know? Or less inhibited about acting it than I believe I recall people being last decade. Even compared to just a few years back, people seem more often to struggle to carry a thought, and resort much more quickly to emotional belligerence.

      Oh, not that I haven't been as knocked about in the interim, of course. I'm not really claiming I'm better, and these are frightening times; I hope I'm neither projecting nor judging too harshly. But even trying to discount for the possibility, there still seems something new left to explain.


    • I think people expected reasoning to be more than just trained chain of thought (which was known already at the time). On the other hand, it is impressive that CoT can achieve so much.

  • Yeah, I don't know exactly what an AGI model will look like, but I think it would have more than a 200k context window.

    • Do you have a 200k context window? I don't. Most humans can only keep 6 or 7 things in short-term memory. Beyond those 6 or 7, you are pulling data from your latent space, or replacing one of the short-term slots with new content.


    • I'm not quite AGI, but I work quite adequately with a much, much smaller memory. Maybe AGI just needs to know how to use other computers and work with storage a bit better.

    • I'd think it would be able to at least suggest which model to use rather than just having 6 for you to choose from.

  • I’m not an AI researcher but I’m not convinced these contemporary artificial neural networks will get us to AGI, even assuming an acceleration to current scaling pace. Maybe my definition of AGI is off but I’m thinking what that means is a machine that can think, learn and behave in the world in ways very close to human. I think we need a fundamentally different paradigm for that. Not something that is just trained and deployed like current models, but something that is constantly observing, constantly learning and constantly interacting with the real world like we do. AHI, not AGI. True AGI may not exist because there are always compromises of some kind.

    But, we don’t need AGI/AHI to transform large parts of our civilization. And I’m not seeing this happen either.

  • > Now we're up to o4, AGI is still not even in near site (depending on your definition, I know)

    It's not only definition. Some googler was sure their model was conscious.

  • Meanwhile even the highest ranked models can’t do simple logic tasks. GothamChess on YouTube did some tests where he played against a bunch of the best models and every single one of them failed spectacularly.

    They’d happily lose a queen to take a pawn. They failed to understand how pieces are even allowed to move, hallucinated the existence of new pieces, repeatedly declared checkmate when it wasn’t, etc.

    I tried it last night with Gemini 2.5 Pro, and it made it 6 turns before it started making illegal moves, and 8 turns before it got so confused about the state of the board that it refused to play with me any longer.

    I was in the chess club in 3rd grade. One of the top ranked LLMs in the world is vastly dumber than I was in 3rd grade. But we’re going to pour hundreds of billions into this in the hope that it can end my career? Good luck with that, guys.

    • Chess is not exactly a simple logic task. It requires you to keep track of 32 things in a 2d space.

      I remember being extremely surprised when I could ask GPT-3 to rotate a 3D model of a car in its head and ask it about what I would see when sitting inside, or which doors would refuse to open because they're in contact with the ground.

      It really depends on how much you want to shift the goalposts on what constitutes "simple".


    • I'm not sure why people are expecting a language model to be great at chess. Remember they are trained on text, which is not the best medium for representing things like a chess board. They are also "general models", with limited training on pretty much everything apart from human language.

      An AlphaStar-type model would wipe the floor at chess.


  • > We had serious news outlets write about senior people at OpenAI quitting because o1 was SkyNet

    I wonder if any of the people that quit regret doing so.

    Seems a lot like Chicken Little behavior - "Oh no, the sky is falling!"

    How anyone with technical acumen thinks current AI models are conscious, let alone capable of writing new features and expanding their abilities, is beyond me. Might as well be afraid of calculators revolting and taking over the world.

"haven't actually done much" being popularizing the chat llm and absolutely dwarfing the competition in paid usage

  • Relative to the hype they've been spinning to attract investment, casting the launch and commercialization of ChatGPT as their greatest achievement really is quite a significant downgrade, especially given that they really only got there first because they were the first entity reckless enough to deploy such a tool to the public.

    It's easy to forget what smart, connected people were saying about how AI would evolve by <current date> ~a year ago, when in fact what we've gotten since then is a whole bunch of diminishing returns and increasingly sketchy benchmark shenanigans. I have no idea when a real AGI breakthrough will happen, but if you're a person who wants it to happen (I am not), you have to admit to yourself that the last year or so has been disappointing, even if you won't admit it to anybody else.

  • ChatGPT was released two and a half years ago though. Pretty sure that at some point Sam Altman had promised us AGI by now.

    The person you're responding to is correct that OpenAI feels a lot more stagnant than other players (like Google, which was nowhere to be seen even a year and a half ago and now has the leading model on pretty much every metric, but also DeepSeek, who built a competitive model in a year that runs much cheaper).

    • > Google has the leading model on pretty much every metric

      Correction: Google had the leading model for three weeks. Today it’s back to the second place.


  • ChatGPT was released in 2022, so OP's point stands perfectly well.

    • They're rumored to be working on a social network to rival X with the focus being on image generations.

      https://techcrunch.com/2025/04/15/openai-is-reportedly-devel...

      The play now seems to be less AGI, more "too big to fail" / use all the capital to morph into a FAANG bigtech.

      My bet is that they'll develop a suite of office tools that leverage their model, chat/communication tools, a browser, and perhaps a device.

      They're going to try to turn into Google (with maybe a bit of Apple and Meta) before Google turns into them.

      Near-term, I don't see late-stage investors recouping their investment. But in time, this may work out well for them. There's a tremendous amount of inefficiency and lack of competition amongst the big tech players. They've been so large that nobody else could effectively challenge them. Now there's a "startup" with enough capital to start eating into big tech's more profitable business lines.


  • Seriously. The level of arrogance combined with ignorance is awe-inspiring.

    • True. They've blown their absolutely massive lead with power users to Anthropic and Google. So they definitely haven't done nothing.

Research by METR suggests that frontier LLMs can complete software tasks that take human engineers exponentially longer amounts of time, with the task-length horizon doubling roughly every 7 months. o3 is above the trend line.

https://x.com/METR_Evals/status/1912594122176958939
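
As a rough illustration of what a ~7-month doubling time implies (the one-hour starting horizon is a made-up number for illustration, not METR's figure):

  # hypothetical illustration of a ~7-month doubling time for task horizons
  start_horizon_hours = 1.0      # assumed starting point, for illustration only
  doubling_time_months = 7.0

  for months in (0, 7, 14, 21, 28):
      horizon = start_horizon_hours * 2 ** (months / doubling_time_months)
      print(f"after {months:2d} months: ~{horizon:.0f} hour(s)")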

—-

The AlexNet paper, which kickstarted the deep learning era in 2012, was ahead of the 2nd-best entry by 11 percentage points. Many published AI papers at the time advanced SOTA by just a couple of percentage points.

o3 high is about 9 percentage points ahead of o1 high on livebench.ai, and there are also quite a few testimonials about their differences.

Yes, AlexNet made major strides in other aspects as well, but it's been just 7 months since o1-preview, the first publicly available reasoning model, which was a seminal advance beyond previous LLMs.

It seems some people have become desensitized to how rapidly things are moving in AI, despite its largely unprecedented pace of progress.

Ref:

- https://proceedings.neurips.cc/paper_files/paper/2012/file/c...

- https://livebench.ai/#/

  • AlexNet had improved the ImageNet error rate by 100*11/25 = 44%.

    From o1 to o3, the error rate went from 28 to 19, so 100*9/28 ≈ 32%.

    But these are meaningless comparisons because it’s typically harder to improve already good results.
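
    As a quick sketch, the arithmetic being compared above is:

      # relative error-rate reduction, using the figures quoted above
      def relative_reduction(before, after):
          return 100 * (before - after) / before

      print(relative_reduction(25, 14))  # AlexNet on ImageNet: ~44%
      print(relative_reduction(28, 19))  # o1 -> o3 on livebench.ai: ~32%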

OpenAI isn't selling GPT-4 or o1 or o4-mini or turbo or whatever else to the general public. These announcements may as well be them releasing GPT v12.582.599385. No one outside of a small group of nerds cares. The end consumer is going to chatgpt.com and typing things in the box.

  • They have an enterprise business too. I think it's relevant for that.

    • And that’s exactly why their model naming and release process looks like this right now.

  • The $20 Plus subscription gives access to o1 and Deep Research (10 uses/month). I'm pretty sure the general public can get access through the API as well.

    • Right, and most people are not going to spend $200+/mo on ChatGPT. Maybe businesses will, but at this point they have too many choices.

> This is just getting to be a bit much, seems like they are trying to cover for the fact that they haven't actually done much

Or perhaps they're trying to make some important customers happy by showing movement on areas the customers care about. Subjectively, customers get locked in by feeling they have the inside track, and these small tweaks prove that. Objectively, the small change might make a real difference to the customer's use case.

Similarly, it's important to force development teams to actually ship, and shipping more frequently reduces risk, so this could reflect internal discipline.

As for media buzz, OpenAI is probably trying to tamp that down; they have plenty of first-mover advantage. More puffery just makes their competitors seem more important, and the risk to their reputation of a flop is a lot larger than the reward of the next increment.

As for "a bit much", before 2023 I was thinking I could meaningfully track progress and trade-off's in selecting tech, but now the cat is not only out of the bag, it's had more litters than I can count. So, yeah - a bit much!

  • > Or perhaps they're trying to make some important customers happy by showing movement on areas the customers care about

    Or make important investors happy; they need to justify the latest $40 billion round.

The old Chinese strategy of having 7343 different phone models with almost the same specs to confuse the customer better

  • This sounds like recent Dell and Lenovo strategies

    • Recent? They've been doing this for decades.

      person a: "I just got an new macbook pro!"

      person b: "Nice! I just got a Lenovo YogaPilates Flipfold XR 3299 T92 Thinkbookpad model number SRE44939293X3321"

      ...

      person a: "does that have oled?"

      person b: "Lol no silly that is model SRE44939293XB3321". Notice the B in the middle?!?! That is for OLED.


  • Not only that: filling search listings on eBay with your products is an old seller tactic. Try searching for a used Dell workstation or server and you will see pages and pages from the same seller.

To use that criticism for this release ain't really fair, as these will replace the old models (o3 will replace o1, o4-mini will replace o3-mini).

On a more general level - sure, but they aren't planning to use this release to add a larger number of models; it's just that deprecating/killing the old models can't be done overnight.

  • As someone who doesn't use anything OpenAI (for all the reasons), I have to agree with the GP. It's all baffling. Why is there an o3-mini and an o4-mini? Why on earth are there so many models?

    Once you get to this point you're putting the paradox of choice on the user - I used to use a particular brand of toothpaste for years, until it got to the point where I'd be in the supermarket looking at a wall of toothpaste, all by the same brand, with no discernible difference between the products. Why is one of them called "whitening"? Do the others not do that? Why is this one called "complete" and that one called "complete ultra"? That would suggest that the "complete" one wasn't actually complete. I stopped using that brand of toothpaste as it became impossible to know which was the right product within the brand.

    If I was assessing the AI landscape today, where the leading models are largely indistinguishable in day to day use, I'd look at OpenAI's wall of toothpaste and immediately discount them.

    • (I work at OpenAI.)

      In ChatGPT, o4-mini is replacing o3-mini. It's a straight 1-to-1 upgrade.

      In the API, o4-mini is a new model option. We continue to support o3-mini so that anyone who built a product atop o3-mini can continue to get stable behavior. By offering both, developers can test both and switch when they like. The alternative would be to risk breaking production apps whenever we launch a new model and shut off developers without warning.

      I don't think it's too different from what other companies do. Like, consider Apple. They support dozens of iPhone models with their software updates and developer docs. And if you're an app developer, you probably want to be aware of all those models and docs as you develop your app (not an exact analogy). But if you're a regular person and you go into an Apple store, you only see a few options, which you can personalize to what you want.

      If you have concrete suggestions on how we can improve our naming or our product offering, happy to consider them. Genuinely trying to do the best we can, and we'll clean some things up later this year.

      Fun fact: before GPT-4, we had a unified naming scheme for models that went {modality}-{size}-{version}, which resulted in names like text-davinci-002. We considered launching GPT-4 as something like text-earhart-001, but since everyone was calling it GPT-4 anyway, we abandoned that system to use the name GPT-4 that everyone had already latched onto. Kind of funny how our unified naming scheme originally made room for 999 versions, but we didn't make it past 3.


    • > Why is there an o3-mini and an o4-mini? Why on earth are there so many models?

      Because if they removed access to o3-mini — which I have tested, costed, and built around — I would be very angry. I will probably switch to o4-mini when the time is right.


    • They keep a lot of models around for backward compatibility for API users. This is confusing, but not inherently a bad idea.

Well, in fairness, Anthropic has fewer because 1) they started later, 2) they could learn from competitors' mistakes, 3) they focused on enterprise and not consumer, and 4) they have fewer resources.

The point is taken — and OpenAI agrees. They have said they are actively working on simplifying the offering. I just think it's a bit unfair. We have perfect hindsight today here on Hacker News, and we also did zero of the work to produce the product.

Model fatigue is a real thing, particularly with their billing model, which is wildly different from model to model and gives you more headroom as you spend more. We spend a lot of time and effort running tests across many models to balance that cost/performance ratio. When you can run 300k tokens per min on a shittier model, or 10k tokens per min on a better model, you want to use the cheaper model, but if the performance isn't there then you gotta pivot. Can I use tools here? Can I use function calling here? Do I use the chat API, the chat completions API, or the responses API? Do any of those work with the model I want to use, or only with other models?
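
To make that cost/performance balancing concrete, here's a minimal sketch of the kind of check you end up writing; the model names, rate limits, prices, and scores below are made-up placeholders, not OpenAI's actual numbers:

  # hypothetical cost/throughput/quality tradeoff; all numbers are placeholders
  models = {
      "cheap-model":  {"tokens_per_min": 300_000, "usd_per_1m_tokens": 0.5, "eval_score": 0.71},
      "better-model": {"tokens_per_min": 10_000,  "usd_per_1m_tokens": 8.0, "eval_score": 0.86},
  }

  def pick_model(required_score, tokens_per_min_needed):
      ok = [name for name, m in models.items()
            if m["eval_score"] >= required_score
            and m["tokens_per_min"] >= tokens_per_min_needed]
      # among the models that clear the bar, take the cheapest
      return min(ok, key=lambda n: models[n]["usd_per_1m_tokens"]) if ok else None

  print(pick_model(0.70, 50_000))  # cheap-model
  print(pick_model(0.80, 50_000))  # None: the better model can't keep up with the throughput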

I almost wonder if this is intentional ... because when you create a quagmire of insane inter-dependent billing scenarios you end up with a product like AWS that can generate substantial amounts of revenue from sheer ignorance or confusion. Then you can hire special consultants to come in and offer solutions to your customers in order to wade through the muck on your behalf.

Dealing with OpenAI's APIs is a straight-up nightmare.

Most industries or categories go through cycles of fragmentation and consolidation.

AI is currently in a high-growth expansion phase. This leads to rapid iteration and fragmentation, because getting things released is the most important thing.

When the models start to plateau or the demands on the industry are for profit you will see consolidation start.

  • Having many models from the same company in some haphazard strategy doesn't equate to "industry fragmentation". It's just confusion.

    • OpenAI's continued growth and press coverage relative to their peers leads to me to believe it isn't *just* confusion, even if it is confusing.


They do this because people like to have predictability. A new model may behave quite differently on something that’s important for a use case.

Also, there are a lot of cases where very small models are just fine and others where they are not. It would always make sense to have the smallest, highest-performing models available.

  • I have *no idea* why you're being downvoted on this.

    If I want to take advantage of a new model, I must validate that the structured queries I've made to the older models still work on the new models.

    The last time I did a validation and update. Their Responses. Had. Changed.

    API users need dependability, which means they need older models to keep being usable.

    • > I have no idea why you're being downvoted on this.

      I probably offended someone at YC and my account is being punished.

I cannot believe that this is what we feel is most worth talking about here (by visibility). At this point I truly wonder if AI is what will make HN side with the Luddites.

This seems like a perfect use case for "agentic" AI. OpenAI can enrich the context window with the strengths and weakness of each model, and when a user prompts for something the model can say "Hey, I'm gonna switch to another model that is better at answering this sort of question." and the user can accept or reject.
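
A minimal sketch of what that could look like; the model names and strength descriptions are hypothetical placeholders, and this is not an existing OpenAI feature:

  # hypothetical router: put each model's strengths in the context and let the
  # assistant propose a switch that the user can accept or reject
  MODEL_NOTES = {
      "fast-mini":   "cheap, low latency, fine for short factual questions",
      "deep-reason": "slow and expensive, best for multi-step math and code",
  }

  def routing_prompt(user_message: str) -> str:
      notes = "\n".join(f"- {name}: {desc}" for name, desc in MODEL_NOTES.items())
      return ("You may answer directly or suggest switching models.\n"
              f"Available models:\n{notes}\n\n"
              f"User message: {user_message}\n"
              "Reply with either ANSWER or SWITCH:<model-name>.")

  print(routing_prompt("Prove that sqrt(2) is irrational."))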

> This is just getting to be a bit much, seems like they are trying to cover for the fact that they haven't actually done much. All these models feel like they took the exact same base model, tweaked a few things and released it as an entirely new model

OpenAI's progress lately:

  2024 December - first reasoning model (official release)

  2025 February - deep research

  2025 March - true multi-modal image generation

  2025 April - reasoning model with tools

I'm not sure why people say they haven't done much. We couldn't even dream of stuff like this five years ago, and now releasing groundbreaking/novel features every month is considered "meh"... I think we're spoiled and can't appreciate anything anymore :)

If there are incremental gains in each release, why would they hold them back? The amount of exhaust coming off of each release is gold for the internal teams. The naming convention is bad, and the CPO just admitted as much on Lenny's podcast, but I am not sure why incremental releases are a bad thing.

There are 9 models in the ChatGPT model picker and they have stated that it's their goal to get rid of the model picker because everyone finds it annoying.

Think for 30 seconds about why they might in good faith do what they do.

Do you use any of them? Are you a developer? Just because a model is non-deterministic it doesn't mean developers don't want some level of consistency, whether it be about capabilities, cost, latency, call structure etc.

You'd think they could use AI to determine the best model for your use case so you don't even have to think about it. Run the first few API calls in parallel, grade the results, and then send the rest to whatever works best.
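
A rough sketch of that idea; call_model and grade here are hypothetical stand-ins rather than real API calls or evaluators:

  # sample the first few requests across candidate models, grade the answers,
  # then route the remaining traffic to the best performer
  import random

  CANDIDATE_MODELS = ["model-a", "model-b", "model-c"]

  def call_model(model: str, prompt: str) -> str:
      return f"{model} answer to: {prompt}"   # stand-in for a real API call

  def grade(answer: str) -> float:
      return random.random()                  # stand-in for a real evaluator

  def pick_best(sample_prompts):
      scores = {m: sum(grade(call_model(m, p)) for p in sample_prompts)
                for m in CANDIDATE_MODELS}
      return max(scores, key=scores.get)

  print("routing the rest to:", pick_best(["prompt 1", "prompt 2", "prompt 3"]))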

> All these models feel like they took the exact same base model, tweaked a few things and released it as an entirely new model rather than updating the existing ones.

That's not a problem in and of itself. It's only a problem if the models aren't good enough.

Judging by ChatGPT's adoption, people seem to think they're doing just fine.