Comment by Alconicon
13 hours ago
Because OpenAI stands for AI leader.
If Gemini can create or edit an image, ChatGPT needs to be able to do this too. Who wants to copy&paste prompts between ai agents?
Also if you want to have more semantics, you add image, video and audio to your model. It gets smarter because of it.
OpenAI is also considerably bigger than Anthropic and is known as a generic 'helper'. Anthropic probably saw the benefits of focusing more on developers, which lets it last longer in the game for the amount of money it has.
> Who wants to copy&paste prompts between ai agents?
An AI!
The specialist vs generalist debate is still open. And for complex problems, sure, having a model that runs on a small galaxy may be worth it. But for most tasks, a fleet of tailor-made smaller models being called on by an agent seems like a solidly-precedented (albeit not singularity-triggering) bet.
Not an expert by any means, but wouldn't smaller but highly refined models also output more reproducible results?
Intuitively it sounds akin to the Unix model...
But then again, the main selling point of using LLMs as part of some code that solves a certain business need is that you don't have to fine-tune a use-case-specific model (like in the mid 2010s); you just prompt engineer a bit and it often magically works.
> Also if you want to have more semantics, you add image, video and audio to your model. It gets smarter because of it.
I think you are confusing generation with analysis. As far as I am aware, your model does not need to be good at generating images to be able to decode an image.
It is, to a first approximation, the same thing. The generative part of genAI is just running the analysis model in reverse.
Now there are all sorts of tricks to get the output of this to be good, and maybe they shouldn't be spending time and resources on this. But the core capability is shared.
> The generative part of genAI is just running the analysis model in reverse.
I think that hasn't been the case since DeepDream?
I think you're partially right, but I don't think being an AI leader is the main motivation -- that's a side effect.
I think it's important to OpenAI to support as many use-cases as possible. Right now, the experience that most people have with ChatGPT is through small-revenue individual accounts. Individual subscriptions with individual needs, but modest budgets.
The bigger money is in enterprise and corporate accounts. To land these accounts, OpenAI will need to provide coverage across as many use-cases as they can so that they can operate as a one-stop AI provider. If a company needs to use OpenAI for chat, Anthropic for coding, and Google for video, what's the point? If Google's chat and coding are "good enough" and you need to have video generation, then that company is going to go with Google for everything. For the end-game I think OpenAI is playing for, they will need to be competitive in all modalities of AI.
> Because OpenAI stands for AI leader.
It'll just end up spreading itself too thin and be second or third best at everything.
The 500lb gorilla in the room is Google. They have endless money and maybe even more importantly they have endless hardware. OpenAI are going to have an increasingly hard time competing with them.
That Gemini 3 is crushing it right now isn't the problem. It's Gemini 4 or 5 that will likely leave them in the dust for the general use case, meanwhile specialist models will eat what remains of their lunch.