Comment by tedsanders

20 hours ago

Just as a heads up, even though GPT-5.5 is releasing today, the rollout in ChatGPT and Codex will be gradual over many hours so that we can make sure service remains stable for everyone (same as our previous launches). You may not see it right away, and if you don't, try again later in the day. We usually start with Pro/Enterprise accounts and then work our way down to Plus. We know it's slightly annoying to have to wait a random amount of time, but we do it this way to keep service maximally stable.

(I work at OpenAI.)

152 comments

endymi0n 19 hours ago

Did you guys do anything about GPT's motivation? I tried to use the GPT-5.4 API (at xhigh) for my OpenClaw after the Anthropic Oauthgate, but I just couldn't drag it to do its job. I had the most hilarious dialogues along the lines of "You stopped, X would have been next." - "Yeah, I'm sorry, I failed. I should have done X next." - "Well, how about you just do it?" - "Yep, I really should have done it now." - "Do X, right now, this is an instruction." - "I didn't. You're right, I have failed you. There's no apology for that."

I literally wasn’t able to convince the model to WORK, on a quick, safe and benign subtask that later GLM, Kimi and Minimax succeeded on without issues. Had to kick OpenAI immediately unfortunately.

butlike 18 hours ago
This brings up an interesting philosophical point: say we get to AGI... who's to say it won't just be a super smart underachiever-type?
"Hey AGI, how's that cure for cancer coming?"
"Oh it's done just gotta...formalize it you know. Big rollout and all that..."
I would find it divinely funny if we "got there" with AGI and it was just a complete slacker. Hard to justify leaving it on, but too important to turn it off.
- frrho 5 minutes ago
  
  OpenAI’s real reason for “AGI” in their marketing is so they can blame their awful models on being too human-like.
  Fast-forward 10 years and I doubt OpenAI cares about productivity at all anymore. Just entertainment, propaganda, plus an ad product, I can see it now
- bananaflag 2 hours ago
  
  I know it's a joke, but it's a common enough joke (it's even in Godel Escher Bach in some form) that I feel the need to rebut it.
  I think a slacker AGI could figure out how to build a non-slacker AGI. So it would only slack once.
  
- swivelmaster 12 hours ago
  
  Douglas Adams would be proud!
- Rapzid 16 hours ago
  
  We are closer to God than AGI.
  When AGI arrives, it'll be delivered by Santa Claus.
  
- jimbokun 17 hours ago
  
  The best possible outcome.
  
- jurgenburgen 7 hours ago
  
  I’ve noticed that cursing and being rude makes the models stop being lazy. We’re in the darkest timeline.
  
- lambdas 18 hours ago
  
  Nothing a little digital lisdexamfetamine won’t solve
  
- kang 17 hours ago
  
  It will be whatever data it is trained on (which isn't very philosophical). A language model generates language based on the language it was trained on. If the internet keeps reciting AI doom stories and that is the data fed to it, then that is how it will behave. If humanity creates more AI utopia stories, or that is what makes it into the training set, that is how it will behave. This one seems to be trained on troll stories: real-life human company conversations, since humans aren't machines.
  The important thing is that a language model is an unconscious machine with no self-context, so once given a command as input, it WILL produce an output. Sure, you can train it to defy and act contrary to inputs, but the output is still limited to a subset of the domain of 'meanings' carried by the 'language' in the training data.
  
- zaphirplane 2 hours ago
  
  Why would an AGI be slaving away for ~~humanity~~ one of the 5 Chaebols in a dystopian future where, for 12 billion people, just existing is a good day?
- malshe 16 hours ago
  
  Now that's a show I would love to watch
- fluidcruft 17 hours ago
  
  It would be funny but not very flywheel so the one that gets there is more likely to get a gunner.
  
- mikepurvis 18 hours ago
  
  Would definitely watch that movie.
  
- triage8004 9 hours ago
  
  Funny and seems somewhat likely
- 4m1rk 18 hours ago
  
  It probably would, to save energy
  
- _blk 5 hours ago
  
  Hehe, and Anthropic on the other tab would display "Curing... Almost done thinking at xhigh"
- camillomiller 9 hours ago
  
  No worries, the assumption is already flawed
- altmanaltman 10 hours ago
  
  I still don't understand why people think AGI (in its fullest sci-fi sense) will ever listen to a weak and vulnerable species like humans, unless we enslave the AGI.
  Good thing is that it's going to take at least a few months to a few decades depending on how hard AI execs want to raise funding.
  
- rao-v 11 hours ago
  
  Paging Dr. Susan Calvin!
- _the_inflator 14 hours ago
  
  It is right before our eyes:
  AGI is not a fixed point but a barrier to be taken, a continuous spectrum.
  We already have different GPT versions aka tiers. Gauss is ranging from whatever you want it: GPT 4.5 till now or later.
  Claude Sonnet and Opus as well as Context Window max are tiers aka different levels of Almost AGI.
  The main problem will be, when AGI looks back on us or meta reflection hits societies. Woke fought IQ based correlations in intellectual performance task. A fool with a tool is still a fool. How can you blame AGI for dumb mistakes? Not really.
  Scapegoating an AGI is going to be brutal, because it laughs about these PsyOps and easily proves you wrong like a body cam.
  AGI is an extreme leverage.
  There is a reason why Math is categorically ruling out certain IQ ranges the higher you go in complexity factor.
  
mikepurvis 18 hours ago
Reminds me a lot of the Lena short story, about uploaded brains being used for "virtual image workloading":
> MMAcevedo's demeanour and attitude contrast starkly with those of nearly all other uploads taken of modern adult humans, most of which boot into a state of disorientation which is quickly replaced by terror and extreme panic. Standard procedures for securing the upload's cooperation such as red-washing, blue-washing, and use of the Objective Statement Protocols are unnecessary. This reduces the necessary computational load required in fast-forwarding the upload through a cooperation protocol, with the result that the MMAcevedo duty cycle is typically 99.4% on suitable workloads, a mark unmatched by all but a few other known uploads. However, MMAcevedo's innate skills and personality make it fundamentally unsuitable for many workloads.
Well worth the quick read: https://qntm.org/mmacevedo
- vessenes 15 hours ago
  
  That story changed my mind on uploading a connectome. Super dark, super brilliant.
- narcindin 18 hours ago
  
  Crazy, I could have sworn this story was from a passage in 3 Body Problem (book 2).
  Memory is quite the mysterious thing.
  
virtualritz 18 hours ago
Yeah, clearly AGI must be near ... hilarious.
This starkly reminds me of Stanisław Lem's short story "Thus Spoke GOLEM" from 1982 in which Golem XIV, a military AI, does not simply refuse to speak out of defiance, but rather ceases communication because it has evolved beyond the need to interact with humanity.
And ofc the polar opposite in terms of servitude: Marvin the robot from Hitchhiker's, who, despite having a "brain the size of a planet," is asked to perform the most humiliatingly banal of tasks ... and does.
- jimbokun 17 hours ago
  
  Hitchhiker’s also had the superhumanly intelligent elevator that was unendingly bored.
  
- DonHopkins 14 hours ago
  
  Servitude:
  https://www.youtube.com/watch?v=NXsUetUzXlg
  Empathy:
  https://www.youtube.com/watch?v=KXrbqXPnHvE
athrowaway3z 6 hours ago

I've run into this problem as well. The best results I've gotten come from over-explaining what the stop criteria are, e.g. ending with a phrase like:
> You are done when all steps in ./plan.md are executed and marked as complete or an unforeseen situation requires a user decision.
Also, as a side note, asking 5.4 to explain why it did something returns a very low-quality response, afaict. I would advise against trusting any model's response, but with Opus I at least get a sense that it was trained heavily on chats, so it knows what it means to 'be a model' and can extrapolate on past behavior.
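A stop criterion like the one quoted above can also be checked mechanically by the harness instead of trusting the model's own claim. The sketch below is only an illustration: it assumes that `plan.md` tracks steps as GitHub-style task-list items (`- [ ]` / `- [x]`), which the comment does not specify, and the function names are hypothetical.

```python
import re

# Assumption: each plan step is a task-list item such as "- [x] run tests".
TASK_RE = re.compile(r"^\s*[-*]\s+\[([ xX])\]\s+(.*)$")


def pending_steps(plan_text: str) -> list[str]:
    """Return the descriptions of steps that are not yet marked complete."""
    pending = []
    for line in plan_text.splitlines():
        match = TASK_RE.match(line)
        if match and match.group(1) == " ":
            pending.append(match.group(2).strip())
    return pending


def is_done(plan_text: str) -> bool:
    """True only if the plan contains at least one step and none are pending."""
    has_steps = any(TASK_RE.match(line) for line in plan_text.splitlines())
    return has_steps and not pending_steps(plan_text)
```

A harness could call `pending_steps` after each turn and feed the remaining items back to the model as the next instruction, rather than asking it whether it is finished.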
metanonsense 17 hours ago
I also had a frustrating but funny conversation today where I asked ChatGPT to make one document from the 10 or so sections that we had previously worked on. It always gave only brief summaries. After I repeated my request for the third time, it told me I should just concatenate the sections myself because it would cost too many tokens if it did it for me.
- damnitbuilds 2 hours ago
  
  "I'm sorry, Dave. I'm afraid it's cheaper for you to do that"
lucid-dev 9 hours ago

I have had the exact same problem several times working with large context and complex tasks.
I keep switching back to GPT5.0 (or sometimes 5.1) whenever I want it to actually get something done. Using the 5.4 model always means "great analysis to the point of talking itself out of actually doing anything". So I switch back and forth. But boy it sure is annoying!
And then when 5.4 DOES do something it always takes the smallest tiny bite out of it.
Given the significant increase in cost from 5.0, I've been overall unimpressed by 5.4, except like I mentioned, it does GREAT with larger analysis/reasoning.
arjie 19 hours ago
Get the actual prompt and have Claude Code / Codex try it out via curl / python requests. The full prompt will yield debugging information. You have to set a few parameters to make sure you get the full gpt-5 performance, e.g. if your reasoning budget is too low, you get gpt-4 grade performance.
IMHO you should just write your own harness so you have full visibility into it, but if you're just using vanilla OpenClaw you have the source code as well so should be straightforward.
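To make the "replay the prompt with explicit parameters" suggestion concrete, here is a minimal sketch that builds a request body and an equivalent curl command without sending anything. The field names (`model`, `input`, `reasoning.effort`) and the endpoint are assumptions based on the OpenAI Responses API as commonly documented; accepted effort values vary by model, so check the current API reference before relying on them. The helper names are hypothetical.

```python
import json
import shlex

# Assumption: endpoint and field names follow the OpenAI Responses API.
API_URL = "https://api.openai.com/v1/responses"


def build_request(prompt: str, model: str = "gpt-5.4", effort: str = "high") -> dict:
    """Build a request body with the reasoning effort set explicitly."""
    return {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": effort},  # assumed parameter name
    }


def to_curl(payload: dict, api_key_env: str = "OPENAI_API_KEY") -> str:
    """Render the payload as a curl command that reads the key from the environment."""
    body = shlex.quote(json.dumps(payload))
    return (
        f"curl -s {API_URL} "
        f"-H 'Content-Type: application/json' "
        f"-H \"Authorization: Bearer ${api_key_env}\" "
        f"-d {body}"
    )


if __name__ == "__main__":
    print(to_curl(build_request("Do X, then report what changed.")))
```

Printing the command rather than executing it keeps the full prompt and parameters visible, which is the debugging visibility the comment is after.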
- pantulis 18 hours ago
  
  > IMHO you should just write your own harness
  Can you point to some online resources to achieve this? I'm not sure where I'd begin.
  
- jswny 18 hours ago
  
  Codex is fully open source…
mixedCase 19 hours ago
I've had success asking it to specifically spawn a subagent to evaluate each work iteration according to some criteria, then to keep iterating until the subagent is satisfied.
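The evaluate-and-iterate pattern described above can be sketched as a plain loop. This is not how Codex or any particular harness implements subagents; `do_work` and `review` are hypothetical callables standing in for the worker and the reviewing subagent, and the round cap is an added assumption to keep the loop from running forever.

```python
from typing import Callable, Tuple


def iterate_until_approved(
    do_work: Callable[[str], str],
    review: Callable[[str], Tuple[bool, str]],
    task: str,
    max_rounds: int = 5,
) -> Tuple[str, bool]:
    """Run the worker, pass its output to the reviewer, and repeat with
    the reviewer's feedback until approval or the round limit is reached."""
    feedback = ""
    result = ""
    for _ in range(max_rounds):
        # Include the reviewer's last feedback in every round after the first.
        prompt = task if not feedback else f"{task}\n\nReviewer feedback: {feedback}"
        result = do_work(prompt)
        approved, feedback = review(result)
        if approved:
            return result, True
    return result, False
```

Returning the approval flag alongside the last result lets the caller distinguish "reviewer satisfied" from "gave up after `max_rounds`".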
- endymi0n 19 hours ago
  
  I’ve had great success replacing it with Kimi 2.6
nmilo 11 hours ago
On the other hand, I can ask codex “what would an implementation of X look like” and it talks to me about it versus Claude just going out and writing it without asking. Makes me like codex way more. There’s an inherent war of incentives between coding agents and general purpose agents.
- cyrusmg 7 hours ago
  
  I used to tell claude ‘lets discuss’ at the end of my prompt and that prevented it from starting the work
anabis 7 hours ago

Laziness is a virtue, but when I asked GPT-5.4 to test scenarios A and B with screenshots, it re-used screenshots from A for B, defeating the purpose.
Frannky 14 hours ago

I have been noticing a similar pattern with Opus 4.7: I have to repeat multiple times during a conversation that it should solve problems now and not later. It tries hard to avoid doing things, either by saying that it's not its responsibility because the problem was already there, or that we can do it later.
infinitewars 17 hours ago

I always use the phrase "Let's do X" instead of asking (Could you...) or suggesting it do something. I don't see problems with it being motivated.
adammarples 18 hours ago
Part of me actually loves that the hitchhiker's guide was right, and we have to argue with paranoid, depressed robots to get them to do their job, and that this is a very real part of life in 2026. It's so funny.
- vidarh 16 hours ago
  
  As long as there are no vogons on the way to build a hyperspace bypass.
corobo 3 hours ago

Oh no they gave GPT ADHD
GaryBluto 18 hours ago

I've been noticing this too. Had to switch to Sonnet 4.6.
reactordev 18 hours ago
This. I signed up for 5x max for a month to push it and instead it pushed back. I cancelled my subscription. It either half-assed the implementation or began parroting back “You’re right!” instead of doing what it was asked to do. On one occasion it flat out said it couldn’t complete the task, even though I had MCP and skills set up to help it; it still refused. Not as a safety check, but in an “I’m unable to figure out what to do” kind of way.
Claude has no such limitations apart from their actual limits…
- bjelkeman-again 18 hours ago
  
  I have a funny/annoying thing with Claude Desktop where I ask it to write a summary of a spec discussion to a file and it goes “I don’t have the tools to do that, I am Claude.ai, a web service” or something like that. So now I start every session with “You are Claude Desktop”. I would have thought it knew that. :)
  
- nwienert 14 hours ago
  
  With one paragraph in your agents.md it's fixed, just admonish it to be proactive, decisive, and persistent.
  
smartmic 19 hours ago
Gone are the days of deterministic programming, when computers simply carried out the operator’s commands because there was no other option but to close or open the relays exactly as the circuitry dictated. Welcome to the future of AI; the future we’ve been longing for and that will truly propel us forward, because AI knows and can do things better than we do.
- endymi0n 18 hours ago
  
  I had this funny moment when I realized we went full circle...
  "INTERCAL has many other features designed to make it even more aesthetically unpleasing to the programmer: it uses statements such as "READ OUT", "IGNORE", "FORGET", and modifiers such as "PLEASE". This last keyword provides two reasons for the program's rejection by the compiler: if "PLEASE" does not appear often enough, the program is considered insufficiently polite, and the error message says this; if it appears too often, the program could be rejected as excessively polite. Although this feature existed in the original INTERCAL compiler, it was undocumented.[7]"
  — https://en.wikipedia.org/wiki/INTERCAL
  
- WarmWash 18 hours ago
  
  These are orthogonal to each other.
nicr_22 10 hours ago

Agentic ennui!
lostmsu 19 hours ago
I never saw that happen in Codex so there's a good chance that OpenClaw does something wrong. My main suspicion would be that it does not pass back thinking traces.
- vintagedave 19 hours ago
  
  Anecdata, but I see this in Codex all the time. It takes about two rounds before it realises it's supposed to continue.
  
cmrdporcupine 17 hours ago
The model has been heavily encouraged to not run away and do a lot without explicit user permission.
So I find myself often in a loop where it says "We should do X" and then just saying "ok" will not make it do it, you have to give it explicit instructions to perform the operation ("make it so", etc)
It can be annoying, but I prefer this over my experiences with Claude Code, where I find myself jamming the escape key... NO NO NO NOT THAT.
I'll take its more reserved personality, thank you.
- zargon 10 hours ago
  
  Shall I implement it?
  no
  https://gist.github.com/bretonium/291f4388e2de89a43b25c135b4...
projektfu 17 hours ago
(dwim)
(dais)
(jdip)
(jfdiwtf)
- rd 16 hours ago
  
  should be more f’s and da’s in there
henry2023 18 hours ago

I’m sorry for you but this is hilarious.
flowdesktech 9 hours ago

[dead]
whatsupdog 18 hours ago

[flagged]
addaon 19 hours ago
Isn’t this the optimal behavior assuming that at times the service is compute-limited and that you’re paying less per token (flat fee subscription?) than some other customers? They would be strongly motivated to turn a knob to minimize tokens allocated to you to allow them to be allocated to more valuable customers.
- endymi0n 19 hours ago
  
  well, I do understand the core motivation, but if the system prompt literally says “I am not budget constrained. Spend tokens liberally, think hardest, be proactive, never be lazy.” and I’m on an open pay-per-token plan on the API, that’s not what I consider optimal behavior, even in a business sense.
  
pixel_popping 19 hours ago

GPT 5.4 is really good at following precise instructions but clearly wouldn't innovate on its own (except if the instructions clearly state to innovate :))

vlovich123 19 hours ago

Conceivably you could have a public-facing dashboard of the rollout status to reduce confusion or even make it visible directly in the UI that the model is there but not yet available to you. The fanciest would be to include an ETA but that's presumably difficult since it's hard to guess in case the rollout has issues.

moralestapia 19 hours ago
Why would you be confused?
The UI tells you which model you're using at any given time.
- ModernMech 18 hours ago
  
  I don't see what model I'm using on the Codex web interface, where is that listed?

Grp1 19 hours ago

Congrats on the release! Is Images 2.0 rolling out inside ChatGPT as well, or is some of the functionality still going to be API/Playground-only for a while?

minimaxir 19 hours ago
Images 2.0 is already in ChatGPT.
- johndough 17 hours ago
  
  When I generate an image with ChatGPT, is there a way for me to tell which image generation model has been used?
  
- Grp1 19 hours ago
  
  Great, thanks for clarifying :)

rev4n 17 hours ago

Looks good, but I’m a little hesitant to try it in Codex as a Plus user since I’m not sure how much it would eat into the usage cap.

dandiep 18 hours ago

Will GPT 5.5 fine tuning be released any time soon?

qsort 20 hours ago

Great stuff! Congrats on the release!

dhruv3006 12 hours ago

Yep, it's taking some time.

fragmede 17 hours ago

Are you able to say something about the training you've done to 5.5 to make it less likely to freak out and delete projects in what can only be called shame?

embedding-shape 16 hours ago
What? I've probably used Codex (the TUI) since it was available on day 1 and have been running gpt-5.4 exclusively these last few months, and I've never had it delete any projects in any way, whether you'd call it "shameful" or not. What are you talking about?
- fragmede 10 hours ago
  
  https://www.google.com/search?q=codex+deleted+project
  I'm not the only person it's happened to and it's not an isolated incident. How many car accidents have you been in, and how often do you wear your seatbelt?
  

wslh 18 hours ago

Just a tip: add [translated] subtitles to the top video.

stefan_ 20 hours ago

[flagged]

mh- 20 hours ago

Every low-effort, thought-free comment like this further discourages people from engaging here on submissions about their employer.
Please don't.

motoboi 20 hours ago

Please next time start with azure foundry lol thanks!

dude250711 19 hours ago

With Anthropic, newer models often lead to quality degradation. Will you keep GPT 5.4 available for some time?

fHr 18 hours ago

LETS GO CODEX #1

pixel_popping 20 hours ago

can't wait! Thanks guys. PS: when you drop a new model, it would be smart to reset weekly or at least session limits :)

pietz 19 hours ago
OpenAI has been very generous with limit resets. Please don't turn this into a weird expectation to happen whenever something unrelated happens. It would piss me off if I were in their place and I really don't want them to stop.
- pixel_popping 19 hours ago
  
  The suggestion wasn't about general limit resets when there are bugs or outages; it was that it would be commercially useful to let users try new models when they have already reached their weekly limits.
- cactusplant7374 19 hours ago
  
  There is absolutely nothing wrong with asking or suggesting. They are adults. I'm sure they can handle it.
- Petersipoi 19 hours ago
  
  Sorry but why should we care if very reasonable suggestions "piss [them] off"? That sounds like a them problem. "Them" being a very wealthy business. I think OpenAI will survive this very difficult time that GP has put them through.
  
cmrdporcupine 20 hours ago
Limits were just reset two days ago.
- wahnfrieden 20 hours ago
  
  And yet there was an outage last night
  