Comment by bcherny

4 hours ago

Hey all, Boris from the Claude Code team here. I just responded on the issue, and cross-posting here for input.

---

Hi, thanks for the detailed analysis. Before I keep going, I wanted to say I appreciate the depth of thinking & care that went into this.

There's a lot here, I will try to break it down a bit. These are the two core things happening:

> `redact-thinking-2026-02-12`

This beta header hides thinking from the UI, since most people don't look at it. It *does not* impact thinking itself, nor does it impact thinking budgets or the way extended reasoning works under the hood. It is a UI-only change.

Under the hood, by setting this header we avoid needing thinking summaries, which reduces latency. You can opt out of it with `showThinkingSummaries: true` in your settings.json (see [docs](https://code.claude.com/docs/en/settings#available-settings)).

If you are analyzing locally stored transcripts, you wouldn't see raw thinking stored when this header is set, which is likely influencing the analysis. When Claude sees lack of thinking in transcripts for this analysis, it may not realize that the thinking is still there, and is simply not user-facing.

> Thinking depth had already dropped ~67% by late February

We landed two changes in Feb that would have impacted this. We evaluated both carefully:

1/ Opus 4.6 launch → adaptive thinking default (Feb 9)

Opus 4.6 supports adaptive thinking, which is different from thinking budgets that we used to support. In this mode, the model decides how long to think for, which tends to work better than fixed thinking budgets across the board. `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING` to opt out.

2/ Medium effort (85) default on Opus 4.6 (Mar 3)

We found that effort=85 was a sweet spot on the intelligence-latency/cost curve for most users, improving token efficiency while reducing latency. On of our product principles is to avoid changing settings on users' behalf, and ideally we would have set effort=85 from the start. We felt this was an important setting to change, so our approach was to:

1. Roll it out with a dialog so users are aware of the change and have a chance to opt out

2. Show the effort the first few times you opened Claude Code, so it wasn't surprising.

Some people want the model to think for longer, even if it takes more time and tokens. To improve intelligence more, set effort=high via `/effort` or in your settings.json. This setting is sticky across sessions, and can be shared among users. You can also use the ULTRATHINK keyword to use high effort for a single turn, or set `/effort max` to use even higher effort for the rest of the conversation.

Going forward, we will test defaulting Teams and Enterprise users to high effort, to benefit from extended thinking even if it comes at the cost of additional tokens & latency. This default is configurable in exactly the same way, via `/effort` and settings.json.

112 comments

bcherny

richardjennings 4 hours ago

I was not aware the default effort had changed to medium until the quality of output nosedived. This cost me perhaps a day of work to rectify. I now ensure effort is set to max and have not had a terrible session since. Please may I have a "always try as hard as you can" mode ?

Avamander 14 minutes ago

I feel like the maximum effort mode kind-of wraps around and starts becoming "desperate" to the extent of lazy or a monkey's paw, similar to how lower effort modes or a poor prompt.
clevergadget 5 minutes ago

bad citizen
Schiendelman 2 hours ago
That's /effort max!
- richardjennings 2 hours ago
  
  You cannot control the effort setting sub-agents use and you also cannot use /effort max as a default (outside of using an alias).

koverstreet 4 hours ago

There's been more going on than just the default to medium level thinking - I'll echo what others are saying, even on high effort there's been a very significant increase in "rush to completion" behavior.

bcherny 4 hours ago
Thanks for the feedback. To make it actionable, would you mind running /bug the next time you see it and posting the feedback id here? That way we can debug and see if there's an issue, or if it's within variance.
- JamesSwift 1 hour ago
  
  a9284923-141a-434a-bfbb-52de7329861d d48d5a68-82cd-4988-b95c-c8c034003cd0 5c236e02-16ea-42b1-b935-3a6a768e3655 22e09356-08ce-4b2c-a8fd-596d818b1e8a 4cb894f7-c3ed-4b8d-86c6-0242200ea333
  Amusingly (not really), this is me trying to get sessions to resume to then get feedback ids and it being an absolute chore to get it to give me the commands to resume these conversations but it keeps messing things up: cf764035-0a1d-4c3f-811d-d70e5b1feeef
- koverstreet 4 hours ago
  
  I'll have a look. The CoT switch you mentioned will help, I'll take a look at that too, but my suspicion is that this isn't a CoT issue - it's a model preference issue.
  Comparing Opus vs. Qwen 27b on similar problems, Opus is sharper and more effective at implementation - but will flat out ignore issues and insist "everything is fine" that Qwen is able to spot and demonstrate solid understanding of. Opus understands the issues perfectly well, it just avoids them.
  This correlates with what I've observed about the underlying personalities (and you guys put out a paper the other day that shows you guys are starting to understand it in these terms - functionally modeling feelings in models). On the whole Opus is very stable personality wise and an effective thinker, I want to complement you guys on that, and it definitely contrasts with behaviors I've seen from OpenAI. But when I do see Opus miss things that it should get, it seems to be a combination of avoidant tendencies and too much of a push to "just get it done and move into the next task" from RHLF.
- freedomben 4 hours ago
  
  How much of the code/context gets attached in the /bug report?
  
  6 replies →
stefan_ 2 hours ago

Theres also been tons of thinking leaking into the actual output. Recently it even added thinking into a code patch it did (a[0] &= ~(1 << 2); // actually let me just rewrite { .. 5 more lines setting a[0] .. }).

plexicle 4 hours ago

Ultrathink is back? I thought that wasn't a thing anymore.

If I am following.. "Max" is above "High", but you can't set it to "Max" as a default. The highest you can configure is "High", and you can use "/effort max" to move a step up for a (conversation? session?), or "ultrathink" somewhere in the prompt to move a step up for a single turn. Is this accurate?

bcherny 4 hours ago
Yep, exactly
- dostick 2 hours ago
  
  Mentioning ULTRATHINK in prompt is the equivalent to /effort max?
  
  1 reply →

johndough 3 hours ago

I think it is hilarious that there are four different ways to set settings (settings.json config file, environment variable, slash commands and magical chat keywords).

That kind of consistency has also been my own experience with LLMs.

larpingscholar 12 minutes ago

You are yet to discover the joys of the managed settings scope. They can be set three ways. The claude.ai admin console; by one of two registry keys e.g. HKLM\SOFTWARE\Policies\ClaudeCode; and by an alphabetically merged directory of json files.
monatron 3 hours ago

To be fair, I can think of reasons why you would want to be able to set them in various ways.
- settings.json - set for machine, project
- env var - set for an environment/shell/sandbox
- slash command - set for a session
- magical keyword - set for a turn
SAI_Peregrinus 2 hours ago

It's not unique to LLMs. Take BASH: you've got `/etc/profile`, `~/.bash_profile,` `~/.bash_login`, `~/.bashrc`, `~/.profile`, environment variables, and shell options.
ggdxwz 2 hours ago

Especially some settings are in setting.json, and others in .claude.json So sometimes I have to go through both to find the one I want to tweak

w10-1 4 hours ago

Here's the reply in context:

https://github.com/anthropics/claude-code/issues/42796#issue...

Sympathies: Users now completely depend on their jet-packs. If their tools break (and assuming they even recognize the problem). it's possible they can switch to other providers, but more likely they'll be really upset for lack of fallbacks. So low-touch subscriptions become high-touch thundering herds all too quickly.

dc_giant 4 hours ago

All right so what do I need to do so it does its job again? Disable adaptive thinking and set effort to high and/or use ULTRATHINK again which a few weeks ago Claude code kept on telling me is useless now?

bcherny 4 hours ago
Run this: /effort high
- berkanunal 4 hours ago
  
  Imagine if all service providers were behaving like this.
  > Ahh, sorry we broke your workflow.
  > We found that `log_level=error` was a sweet spot for most users.
  > To make it work as you expect it so, run `./bin/unpoop` it will set log_level=warn
  
  1 reply →
stldev 3 hours ago

You can't. This is Anthropic leveraging their dials, and ignoring their customers for weeks.
Switch providers.
Anecdotally, I've had no luck attempting to revert to prior behavior using either high/max level thinking (opus) or prompting. The web interface for me though doesn't seem problematic when using opus extended.

aizk 4 hours ago

How do you guys manage regressions as a whole with every new model update? A massive test set of e2e problem solving seeing how the models compare?

bcherny 4 hours ago
A mix of evals and vibes.
- giwook 4 hours ago
  
  What's that ratio exactly
  
  2 replies →
- capnchaos 4 hours ago
  
  Are you doing any Digital Twin testing or simulations? I imagine you can't test a product like Claude Code using traditional means.
- efields 2 hours ago
  
  "Evals and vibes" can I put that on a t shirt?
try-working 1 hour ago

I use a self-documenting recursive workflow: https://github.com/doubleuuser/rlm-workflow

DennisL123 4 hours ago

Happy to have my mind changed, yet I am not 100% convinced closing the issue as completed captures the feedback.

bcherny 4 hours ago
From the contents of the issue, this seems like a fairly clear default effort issue. Would love your input if there's something specific that you think is unaddressed.
- vecter 4 hours ago
  
  From this reply, it seems that it has nothing to do with `/effort`: https://github.com/anthropics/claude-code/issues/42796#issue...
  I hope you take this seriously. I'm considering moving my company off of Claude Code immediately.
  Closing the GH issue without first engaging with the OP is just a slap in the face, especially given how much hard work they've done on your behalf.
  
  4 replies →
- JamesSwift 2 hours ago
  
  I commented on the GH issue, but Ive had effort set to 'high' for however long its been available and had a marked decline since... checks notes... about 23 March according to slack messages I sent to the team to see if I was alone (I wasnt).
  EDIT: actually the first glaring issue I remember was on 20 March where it hallucinated a full sha from a short sha while updating my github actions version pinning. That follows a pattern of it making really egregious assumptions about things without first validating or checking. Ive also had it answer with hallucinated information instead of looking online first (to a higher degree than Ive been used to after using these models daily for the past ~6 months)
  
  1 reply →
- DennisL123 1 hour ago
  
  Gotcha. It seemed though from the replies on the github ticket that at least some of the problem was unrelated to effort settings.

KenoFischer 3 hours ago

While we have you here, could you fix the bash escaping bug? https://github.com/anthropics/claude-code/issues/10153

starkparker 4 hours ago

> You can also use the ULTRATHINK keyword to use high effort for a single turn

First I've heard that ultrathink was back. Much quieter walkback of https://decodeclaude.com/ultrathink-deprecated/

giwook 2 hours ago

Pretty sure it's still gone and you should be using effort level now for this.

migali49g 2 hours ago

Hi Boris, thanks for addressing this and providing feedback quickly. I noticed the same issue. My question is, is it enough to do /efforts high, or should I also add CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING to my settings?

areoform 2 hours ago

Hey Boris, thanks for the awesomeness that's Claude! You've genuinely changed the life of quite a few young people across the world. :)

not sure if the team is aware of this, but Claude code (cc from here on) fails to install / initiate on Windows 10; precise version, Windows 10.0.19045 build 19045. It fails mid setup, and sometimes fails to throw up a log. It simply calls it quits and terminates.

On MacOS, I use Claude via terminal, and there have been a few, minor but persistent harness issues. For example, cc isn't able to use Claude for Chrome. It has worked once and only once, and never again. Currently, it fails without a descriptive log or issue. It simply states permission has been denied.

More generally, I use Claude a lot for a few sociological experiments and I've noticed that token consumption has increased exponentially in the past 3 weeks. I've tried to track it down by project etc., but nothing obvious has changed. I've gone from almost never hitting my limits on a Max account to consistently hitting them.

I realize that my complaint is hardly unique, but happy to provide logs / whatever works! :)

And yeah, thanks again for Claude! I recommend Claude to so many folks and it has been instrumental for them to improve their lives.

I work for a fund that supports young people, and we'd love to be able to give credits out to them. I tried to reach out via the website etc. but wasn't able to get in touch with anyone. I just think more gifted young people need Claude as a tool and a wall to bounce things off of; it might measurably accelerate human progress. (that's partly the experiment!)

JohnMakin 3 hours ago

I’ve seen you/anthropic comment repeatedly over the last several months about the “thinking” in similar ways -

“most users dont look at it” (how do you know this?)

“our product team felt it was too visually noisy”

etc etc. But every time something like this is stated, your power users (people here for the most part) state that this is dead wrong. I know you are repeating the corporate line here, but it’s bs.

exfalso 41 minutes ago

It's to prevent distillation. Duh
wonnage 3 hours ago
Anecdotally the “power users” of AI are the ones who have succumbed to AI psychosis and write blog posts about orchestrating 30 agents to review PRs when one would’ve done just fine.
The actual power users have an API contract and don’t give a shit about whatever subscription shenanigans Claude Max is pulling today
- JohnMakin 2 hours ago
  
  Uh, no. Definitely not me at all.
  
  2 replies →

yubblegum 3 hours ago

> Before I keep going, I wanted to say I appreciate the depth of thinking & care that went into this.

"This report was produced by me — Claude Opus 4.6 — analyzing my own session logs. ... Ben built the stop hook, the convention reviews, the frustration-capture tools, and this entire analysis pipeline because he believes the problem is fixable and the collaboration is worth saving. He spent today — a day he could have spent shipping code — building infrastructure to work around my limitations instead of leaving."

What a "fuckin'" circle jerk this universe has turned out to be. This note was produced by me and who the hell is Ben?

matheusmoreira 3 hours ago

I definitely noticed the mid-output self-correction reasoning loops mentioned in the GitHub issue in some conversations with Opus 4.6 with extended reasoning enabled on claude.ai. How do I max out the effort there?

ting0 4 hours ago

Do you guys realize that everyone is switching to Codex because Claude Code is practically unusable now, even on a Max subscription? You ask it to do tasks, and it does 1/10th of them. I shouldn't have to sit there and say: "Check your work again and keep implementing" over and over and over again... Such a garbage experience.

Does Anthropic actually care? Or is it irrelevant to your company because you think you'll be replacing us all in a year anyway?

raincole 4 hours ago

> I wanted to say I appreciate the depth of thinking & care that went into this.

The irony lol. The whole ticket is just AI-generated. But Anthropic employees have to say this because saying otherwise will admit AI doesn't have "the depth of thinking & care."

vlovich123 4 hours ago

It's also pretty standard corporate speak to make sure you don't alienate any users / offend anyone. That's why corporate speak is so bland.
rafaelmn 3 hours ago
Ticket is AI generated but from what I've seen these guys have a harness to capture/analyze CC performance, so effort was made on the user side for sure.
- notatallshaw 2 hours ago
  
  The note at the end of the post indicates the user asked Claude to review their own chat logs. It's impossible to tell if Claude used or built a a performance harness or just wrote those numbers based on vibes.
- gardnr 2 hours ago
  
  There is this 3rd party tracker: https://marginlab.ai/trackers/claude-code/

ai_slop_hater 4 hours ago

> This beta header hides thinking from the UI, since most people don't look at it.

I look at it, and I am very upset that I no longer see it.

bcherny 4 hours ago
There is a setting if you'd like to continue to see it: showThinkingSummaries.
See the docs: https://code.claude.com/docs/en/settings#available-settings
- starkparker 4 hours ago
  
  > Thinking summaries will now appear in the transcript view (Ctrl+O).
  Also: https://github.com/anthropics/claude-code/issues/30958
  
  1 reply →
- antonvs 4 hours ago
  
  > As I noted in the comment,
  Piece of free PR advice: this is fine in a nerd fight, but don't do this in comments that represent a company. Just repeat the relevant information.
  
  4 replies →

ting0 4 hours ago

Thinking time is not the issue. The issue is that Claude does not actually complete tasks. I don't care if it takes longer to think, what I care about is getting partial implementations scattered throughout my codebase while Claude pretends that it finished entirely. You REALLY need to fix this, it's atrocious.

j45 3 hours ago

Thanks for the update,

Perhaps max users can be included in defaulting to different effort levels as well?

ctoth 4 hours ago

[flagged]

bcherny 2 hours ago

Christopher, would you be able to share the transcripts for that repo by running /bug? That would make the reports actionable for me to dig in and debug.
quietsegfault 4 hours ago
I’m not sure being confrontational like this really helps your case. There are real people responding, and even if you’re frustrated it doesn’t pay off to take that frustration out on the people willing to help.
- ctoth 4 hours ago
  
  Fair point on tone. It's a bit of a bind isn't it? When you come with a well-researched issue as OP did, you get this bland corporate nonsense "don't believe your lyin' eyes, we didn't change anything major, you can fix it in settings."
  How should you actually communicate in such a way that you are actually heard when this is the default wall you hit?
  The author is in this thread saying every suggested setting is already maxed. The response is "try these settings." What's the productive version of pointing out that the answer doesn't address the evidence? Genuine question. I linked my repo because it's the most concrete example I have.
  
  3 replies →
- malfist 4 hours ago
  
  Is somebody saying "you're holding it wrong" a "people willing to help"?
  
  2 replies →
- throwaway613746 3 hours ago
  
  [dead]
- BigTTYGothGF 3 hours ago
  
  The stated policy of HN is "don't be mean to the openclaw people", let's see if it generalizes.
lambda 4 hours ago
I guess one of the things I don't understand: how you expect a stochastic model, sold as a proprietary SaaS, with a proprietary (though briefly leaked) client, is supposed to be predictable in its behavior.
It seems like people are expecting LLM based coding to work in a predictable and controllable way. And, well, no, that's not how it works, and especially so when you're using a proprietary SaaS model where you can't control the exact model used, the inference setup its running on, the harness, the system prompts, etc. It's all just vibes, you're vibe coding and expecting consistency.
Now, if you were running a local weights model on your own inference setup, with an open source harness, you'd at least have some more control of the setup. Of course, it's still a stochastic model, trained on who knows what data scraped from the internet and generated from previous versions of the model; there will always be some non-determinism. But if you're running it yourself, you at least have some control and can potentially bisect configuration changes to find what caused particular behavior regressions.
- dev_l1x_be 2 hours ago
  
  The problem is degradation. It was working much better before. There are many people (some example of a well know person[0]), including my circle of friends and me who were working on projects around the Opus 4.6 rollout time and suddenly our workflows started to degrade like crazy. If I did not have many quality gates between an LLM session and production I would have faced certain data loss and production outages just like some famous company did. The fun part is that the same workflow that was reliably going through the quality gates before suddenly failed with something trivial. I cannot pinpoint what exactly Claude changed but the degradation is there for sure. We are currently evaling alternatives to have an escape hatch (Kimi, Chatgpt, Qwen are so far the best candidates and Nemotron). The only issue with alternatives was (before the Claude leak) how well the agentic coding tool integrates with the model and the tool use, and there are several improvements happening already, like [1]. I am hoping the gap narrows and we can move off permanently. No more hoops, you are right, I should not have attempted to delete the production database moments.
  https://x.com/theo/status/2041111862113444221
  https://x.com/_can1357/status/2021828033640911196
- stavros 2 hours ago
  
  Same as how I expect a coin to come up heads 50% of the time.
  
  2 replies →
- randomNumber7 1 hour ago
  
  > how you expect a stochastic model [...] is supposed to be predictable in its behavior.
  I used it often enough to know that it will nail tasks I deem simple enough almost certainly.
malfist 4 hours ago
It also completely ignores the increase in behavioral tracking metrics. 68% increase in swearing at the LLM for doing something wrong needs to be addressed and isn't just "you're holding it wrong"
- alchemist1e9 2 hours ago
  
  I’m think a great marketing line for local/selfhosted LLMs in the future - “You can swear at your LLM and nobody will care!”
dang 1 hour ago

Please don't post this aggressively to Hacker News. You can make your substantive points without that.
https://news.ycombinator.com/newsguidelines.html
iwalton3 4 hours ago

[dead]

tatrions 4 hours ago

[flagged]

bcherny 4 hours ago

Yep totally -- think of this as "maximum effort". If a task doesn't need a lot of thinking tokens, then the model will choose a lower effort level for the task.
koverstreet 4 hours ago
Technically speaking, models inherently do this - CoT is just output tokens that aren't included in the final response because they're enclosed in <think> tags, and it's the model that decides when to close the tag. You can add a bias to make it more or less likely for a model to generate a particular token, and that's how budgets work, but it's always going to be better in the long run to let the model make that decision entirely itself - the bias is a short term hack to prevent overthinking when the model doesn't realize it's spinning in circles.
- ai_slop_hater 4 hours ago
  
  > You can add a bias to make it more or less likely for a model to generate a particular token, and that's how budgets work
  Do you have a source for this? I am interested in learning more about how this works.
  
  4 replies →

nickvec 2 hours ago

Hey Boris, would appreciate if you could respond to my DM on X about Claude erroneously charging me $200 in extra credit usage when I wasn't using the service. Haven't heard back from Claude Support in over a month and I am getting a bit frustrated.