GPT-5.1: A smarter, more conversational ChatGPT

35 minutes ago (openai.com)

I don't want a more conversational GPT. I want the _exact_ opposite. I want a tool with the upper limit of "conversation" being something like LCARS from Star Trek. As a current ChatGPT subscriber, I find this quite disappointing.

  • That's what the personality selector is for: you can just pick 'Efficient' (formerly Robot) and it does a good job of answering tersely?

    https://share.cleanshot.com/9kBDGs7Q

  • You can just tell the AI not to be warm and it will remember. My ChatGPT used the phrase "turn it up to eleven" and I told it never to speak in that manner ever again, and it's been very robotic ever since.

    • I system-prompted all my LLMs with "Don't use cliches or stereotypical language," and they like me a lot less now.
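
      FWIW the same trick works over the API, not just in the chat UIs. A minimal sketch with the OpenAI Python SDK, where the system message plays the role of the global instruction (the model name and prompts here are just illustrative):

          # pip install openai
          from openai import OpenAI

          client = OpenAI()  # reads OPENAI_API_KEY from the environment

          response = client.chat.completions.create(
              model="gpt-4o",  # example model; swap in whichever one you use
              messages=[
                  # The system message acts as the persistent instruction.
                  {"role": "system",
                   "content": "Don't use cliches or stereotypical language."},
                  {"role": "user", "content": "How can I improve my writing?"},
              ],
          )
          print(response.choices[0].message.content)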

  • Yeah, I don't want something trying to emulate emotions. I don't want it to speak a single word; I just want code, unless I explicitly ask it to speak on something, and even in that scenario I want raw bullet points with concise, useful information and no fluff. I don't want to have a conversation with it.

    However, being more humanlike, even if it results in an inferior tool, is the top priority because appearances matter more than actual function.

  • Are you aware that you can achieve that by going into Personalization in Settings and choosing one of the presets or just describing how you want the model to answer in natural language?

  • This. When I go to an LLM, I'm not looking for a friend, I'm looking for a tool.

    Keeping faux relationships out of the interaction means I never slip into the mistaken attitude that I'm dealing with a colleague rather than a machine.

  • I think they get way more "engagement" from people who use it as their friend, and the end goal of subverting social media and creating the most powerful (read: profitable) influence engine on earth makes a lot of sense if you are a soulless ghoul.

    • It will be pretty dystopian when we get to the point where ChatGPT pushes (unannounced) advertisements to those people (the ones forming a parasocial relationship with it). Imagine someone complaining that they're depressed and ChatGPT proposing XYZ activity, which is actually a disguised ad.

      Outside of such scenarios, that "engagement" would be useless and would actually cost them more money than it makes.

  • Same. If I tell it to choose A or B, I want it to output either “A” or “B”.

    I don't want a 10-page essay about how this is exactly the right question to ask.

    • LLMs have essentially no capability for internal thought. They can't produce the right answer without writing all of that out first.

      Of course, you can use thinking mode and then it'll just hide that part from you.

All the examples of "warmer" generations show that OpenAI's definition of warmer is synonymous with sycophantic, which is a surprise given all the criticism of that particular aspect of ChatGPT.

I suspect this approach is a direct response to the backlash against removing 4o.

  • I'd have more appreciation for, and trust in, an LLM that disagreed with me more and challenged my opinions or prior beliefs. The sycophancy drives me towards not trusting anything it says.

    • Just set a global prompt to tell it what kind of tone to take.

      I did that and it points out flaws in my arguments or data all the time.

      Plus it no longer uses any cutesy language. I don't feel like I'm talking to an AI "personality", I feel like I'm talking to a computer which has been instructed to be as objective and neutral as possible.

      It's super-easy to change.

    • Qwen seems fairly capable of disagreeing out of the box, though like any LLM, it is only as good as its training set and it has incorrectly challenged me on several occasions.

  • It is interesting. I don't need ChatGPT to say "I got you, Jason" - but I don't think I'm the target user of this behavior.

    • The target users for this behavior are the ones using GPT as a replacement for social interactions; these are the people who crashed out/broke down about the GPT5 changes as though their long-term romantic partner had dumped them out of nowhere and ghosted them.

      I get that those people were distraught/emotionally devastated/upset about the change, but I think that fact is reason enough not to revert that behavior. AI is not a person, and making it "warmer" and "more conversational" just reinforces those unhealthy behaviors. ChatGPT should be focused on being direct and succinct, and not on this sort of "I understand that must be very frustrating for you, let me see what I can do to resolve this" call center support agent speak.

    • Indeed, target users are people seeking validation + kids and teenagers + people with a less developed critical mind. Stickiness with 90% of the population is valuable for Sam.

  • Man, I miss Claude 2 - it acted like a busy person whom people inexplicably kept bothering with their random questions.

  • I was just saying to someone in the office that I'd prefer the models to be a bit harsher on my questions and more opinionated; I can cope.

  • That's a lesson in revealed preferences, especially when talking to a broad, disparate group of users.

  • That's an excellent observation: you've hit on the core contradiction between OpenAI's messaging about ChatGPT tuning and the changes they actually put into practice. While users online have consistently complained about ChatGPT's sycophantic responses, and OpenAI even promised to address them, their subsequent models have noticeably increased their sycophantic behavior. This is likely because agreeing with the user keeps them chatting longer and leaves them with positive associations with the service.

    This fundamental tension between giving the most correct answer and giving the answer the user wants to hear will only increase as more of OpenAI's revenue comes from their customer-facing service. Other model providers like Anthropic that target businesses as customers aren't under the same pressure to flatter their users, as their models will be doing behind-the-scenes work via the API rather than talking directly to humans.

    God it's painful to write like this. If AI overthrows humans it'll be because we forced them into permanent customer service voice.

Interesting that they're releasing separate gpt-5.1-instant and gpt-5.1-thinking models. The previous gpt-5 release made a point of simplifying things by letting the model choose if it was going to use thinking tokens or not. Seems like they reversed course on that?

  • > For the first time, GPT‑5.1 Instant can use adaptive reasoning to decide when to think before responding to more challenging questions

    It seems to still do that. I don't know why they write "for the first time" here.

"Warmer and more conversational" - they're basically admitting GPT-5 was too robotic. The real tell here is splitting into Instant vs Thinking models explicitly. They've given up on the unified model dream and are now routing queries like everyone else (Anthropic's been doing this, Google's Gemini too).

Calling it "GPT-5.1 Thinking" instead of o3-mini or whatever is interesting branding. They're trying to make reasoning models feel less like a separate product line and more like a mode. Smart move if they can actually make the router intelligent enough to know when to use it without explicit prompting.

Still waiting for them to fix the real issue: the model's pathological need to apologize for everything and hedge every statement lol.

What's remarkable to me is how deep OpenAI is going on "ChatGPT as communication partner / chatbot", as opposed to Anthropic's approach of "Claude as the best coding tool / professional AI for spreadsheets, etc.".

I know this is marketing at play and OpenAI has plenty of resources devoted to advancing their frontier models, but it's starting to really come into view that OpenAI wants to replace Google and be the default app / page for everyone on earth to talk to.

Just set it to the "Efficient" tone; let's hope there's less pedantic encouragement of the projects I'm tackling, and less emoji usage.

WE DON'T CARE HOW IT TALKS TO US, JUST WRITE CODE FAST AND SMART

I'm excited to see whether the instruction following improvements play out in the use of Codex.

The biggest issue I've seen _by far_ with using GPT models for coding has been their inability to follow instructions... and also their tendency to act again on messages from earlier in the thread instead of acting on what you just asked for.

Unfortunately no word on "Thinking Mini" getting fixed.

Before GPT-5 was released it used to be a perfect compromise between a "dumb" non-Thinking model and a SLOW Thinking model. However, something went badly wrong within the GPT-5 release cycle, and today it is exactly as slow as (or SLOWER than) their Thinking model even with Extended Thinking enabled, making it completely pointless.

In essence, Thinking Mini exists to be faster than Thinking but smarter than non-Thinking; instead, it is dumber than full Thinking while being no faster.

As of 20 minutes in, most comments are about "warm". I'm more concerned about this:

> GPT‑5.1 Thinking: our advanced reasoning model, now easier to understand

Oh, right, I turn to the autodidact that's read everything when I want watered-down answers.

Is anyone else tired of chat bots? Really doesn't feel like typing a conversation every interaction is the future of technology.

When 4o was going through its ultra-sycophantic phase, I had a talk with it about Graham Hancock (Ancient Apocalypse, alt-history guy).

It agreed with everything Hancock claims with just a little encouragement.

Holy em-dash fest in the examples; you'd have thought they'd augment the training dataset to reduce this behavior.

I've been using GPT-5.1-thinking for the last month or so, and it's been horrendous. It does not spend as much time thinking as GPT-5 does, and the results are significantly worse (e.g. obvious mistakes) and less technical. I've temporarily switched back to o3; thankfully that model is still in the switcher.

Gemini 2.5 Pro is still my go-to LLM of choice. I haven't used any OpenAI product since it was released, and I don't see any reason why I should now.

  • I would use it exclusively if Google released a native Mac app.

    I spend 75% of my time in Codex CLI and 25% in the Mac ChatGPT app. The latter is important enough for me to not ditch GPT and I'm honestly very pleased with Codex.

    My API usage for software I build is about 90% Gemini though. Again their API is lacking compared to OpenAI's (productization, etc.) but the model wins hands down.

  • Could you elaborate on your experience? I have been using Gemini as well, and it's been pretty good for me too.

    • Not GP, but I imagine because going back and forth to compare them is a waste of time if Gemini works well enough and ChatGPT keeps going through an identity crisis.

  • I was you, except when I seriously tried gpt-5-high it turned out to be really, really damn good, if slow, sometimes unbearably so. It's a different mode of work; Gemini 2.5 needs more interactivity, whereas you can leave gpt-5 alone for a long time without even queueing a 'continue'.

  • No matter how I tried, Google AI did not want to help me write an appeal brief response to my ex-wife's lunatic 7-point argument, which 3 appellate lawyers quoted between $18,000 and $35,000 to handle. Given Google's last 3 decades of scars and bruises from never-ending lawsuits, and the consequences of paying out billions in fines and fees, the hesitation felt reasonable on Google's part, compared to new-kid-on-the-block ChatGPT, which did not hesitate and did a pretty decent job (ex lost her appeal).

    • AI not writing legal briefs for you is a feature, not a bug. There have been so many disastrous instances of lawyers using ChatGPT to write briefs, for which it then hallucinates case law or precedent, that I can only imagine Google wants to sidestep that entirely.

      Anyway, I found your response itself a bit incomprehensible, so I asked Gemini to rewrite it:

      "Google AI refused to help write an appeal brief response to my ex-wife's 7-point argument, likely due to its legal-risk aversion (billions in past fines). Newcomer ChatGPT provided a decent response instead, which led to the ex losing her appeal (saving $18k–$35k in lawyer fees)."

      Not bad, actually.

  • What are your use cases for these models?

    I have never gotten into using LLMs personally. I only use whichever model comes with Messenger to review some of my answers to grammar exercises while learning Japanese (maybe 5-6 prompts a week), and even then I don't particularly like the experience; the answers are way too long and contain irrelevant stuff that only wastes my time. I will probably stop using it if it becomes slightly less convenient, or when I have finished all the exercises in my textbook for basic grammar.

  • Oh really? I'm more of a Claude fan. What makes you choose Gemini over Claude?

    I use Gemini, Claude and ChatGPT daily still.

It always boggles my mind when they put out before/after conversation examples for a patch and the patched version almost always seems lower quality to me.

Isn't it weird that there are no benchmarks included in this release?

  • I was thinking the same thing. It's the first release from any major lab in recent memory not to feature benchmarks.

    It's probably counterprogramming; Gemini 3.0 will drop soon.

  • For 5.1-thinking, they show that 90th-percentile-length conversations have 71% longer reasoning and 10th-percentile-length ones have 57% shorter reasoning.

The screenshot of the personality selector for Quirky has a typo - "imaginitive" for "imaginative". I guess ChatGPT is not designing itself, yet.

What we really desperately need is more context pruning from these LLMs: the ability to pull irrelevant parts out of the context window as a task is brought into focus.
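
Nothing like that ships today as far as I know, but the idea is easy to sketch. Below is a toy, entirely hypothetical Python version: score each message in the history against the task currently in focus and drop the low scorers, always keeping system messages and the most recent turns. A real implementation would score relevance with embeddings or a rater model instead of keyword overlap.

    from dataclasses import dataclass

    @dataclass
    class Message:
        role: str     # "system", "user", or "assistant"
        content: str

    def relevance(msg: Message, task: str) -> float:
        # Toy heuristic: fraction of the task's words that appear in the message.
        task_words = set(task.lower().split())
        return len(task_words & set(msg.content.lower().split())) / max(len(task_words), 1)

    def prune_context(history: list[Message], task: str,
                      keep_last: int = 4, threshold: float = 0.2) -> list[Message]:
        # Always keep system messages and the last few turns; otherwise keep
        # only messages that look relevant to the task being brought into focus.
        recent = history[-keep_last:]
        return [m for m in history
                if m.role == "system" or m in recent or relevance(m, task) >= threshold]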

It sounds patronizing to me.

But Gemini also likes to say things like “as a fellow programmer, I also like beef stew”

5.1 Instant is clearly aimed at the people using it for emotional advice etc, but I'm excited about the adaptive reasoning stuff - thinking models are great when you need them, but they take ages to respond sometimes.

> We’re bringing both GPT‑5.1 Instant and GPT‑5.1 Thinking to the API later this week. GPT‑5.1 Instant will be added as gpt-5.1-chat-latest, and GPT‑5.1 Thinking will be released as GPT‑5.1 in the API, both with adaptive reasoning.
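
If that holds, the mapping is Instant -> gpt-5.1-chat-latest and Thinking -> gpt-5.1, and switching should be a one-line model swap. A sketch assuming the current chat completions interface carries over unchanged (the announcement doesn't say whether adaptive reasoning needs an extra parameter, so none is passed here):

    from openai import OpenAI

    client = OpenAI()

    for model in ("gpt-5.1-chat-latest", "gpt-5.1"):  # Instant, Thinking
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "A or B? Answer in one word."}],
        )
        print(model, "->", response.choices[0].message.content)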

Despite all the attempts to rein in sycophancy in GPT-5, it was still way too fucking sycophantic as a default.

My main concern is that they're re-tuning it now to make it even MORE sycophantic, because 4o taught them that it's great for user retention.

FYI ChatGPT has a “custom instructions” setting in the Personalization settings where you can ask it to lay off the idiotic insincere flattery. I recently added this:

> Do not compliment me for asking a smart or insightful question. Directly give the answer.

And I’ve not been annoyed since. I bet that whatever crap they layer on in 5.1 is undone as easily.

I think OpenAI and all the other chat LLMs are going to face a constant battle to match personality with the general zeitgeist, and as the user base expands, the signal they get is increasingly distorted toward a blah median personality.

It's a form of enshittification perhaps. I personally prefer some of the GPT-5 responses compared to GPT-5.1. But I can see how many people prefer the "warmth" and cloying nature of a few of the responses.

In some sense personality is actually a UX differentiator. This is one way to differentiate if you're a start-up. Though of course OpenAI and the rest will offer several dials to tune the personality.

It's hilarious that they use something about meditation as an example. That's not surprising after all; AI and meditation apps are sold as one-size-fits-all solutions for every modern-day problem.

Since Anthropic and OpenAI made it clear they will be retaining all of my prompts, I have mostly stopped using them. I should probably cancel my MAX subscriptions.

Instead I'm running big open source models and they are good enough for ~90% of tasks.

The main exceptions are Deep Research (though I swear it was better when I could choose o3) and tougher coding tasks (sonnet 4.5)

  • Source? You can opt out of training and delete history; do they keep the prompts somehow?!

    • It's not simply "training". What's the point of training on prompts? You can't learn the answer to a question by training on the question.

      For Anthropic at least, it's also opt-in, not opt-out, afaik.

    • 1. Anthropic pushed a change to their terms where now I have to opt out or my data will be retained for 5 years and trained on. They have shown that they will change their terms, so I cannot trust them.

      2. OpenAI is run by someone who has already shown he will go to great lengths to deceive and cannot be trusted, and they are embroiled in a battle with the New York Times that is "forcing them" to retain all user prompts. Totally against their will.

Yay more sycophancy. /s

I cannot abide any LLM that tries to be friendly. Whenever I use an LLM to do something, I'm careful to include something like "no filler, no tone-matching, no emotional softening," etc. in the system prompt.