Comment by bauerd

19 hours ago

>On March 4, we changed Claude Code's default reasoning effort from high to medium to reduce the very long latency—enough to make the UI appear frozen—some users were seeing in high mode

Instead of fixing the UI they lowered the default reasoning effort parameter from high to medium? And they "traced this back" because they "take reports about degradation very seriously"? Extremely hard to give them the benefit of the doubt here.

Hey, Boris from the team here.

We did both -- we did a number of UI iterations (e.g. improving thinking loading states, making it clearer how many tokens are being downloaded, etc.). But we also reduced the default effort level after evals and dogfooding. The latter was not the right decision, so we rolled it back after finding that UX iterations were insufficient (people didn't understand to use /effort to increase intelligence, and often stuck with the default -- we should have anticipated this).

  • Having a "Recovery Mode"/"Safe Boot" flag to disable our configurations (or progressively enable them) to see how Claude Code responds would be nice. Sometimes I get worried that some old flag I set is breaking things. Maybe the flag already exists? I tried `claude doctor`, but it wasn't quite the solution.

    For instance:

    Is Haiku supposed to hit a warm system-prompt cache in a default Claude Code setup?

    I had `DISABLE_TELEMETRY=1` in my env and found that the Haiku requests would not hit a warm-cached system prompt. E.g. on the first request just now with the most recent version (v2.1.118, but it happened on others too):

    ```
    w/ telemetry off - input_tokens:10 cache_read:0     cache_write:28897 out:249
    w/ telemetry on  - input_tokens:10 cache_read:24344 cache_write:7237  out:243
    ```

    I used to think that having so many users meant people were hitting a lot of edge cases -- 3 million users is 3 million different problems, and everyone can't be on the happy path. But then I started hitting weird edge cases myself and began to think the permutations might not be under control.

  • > people didn't understand to use /effort to increase intelligence, and often stuck with the default -- we should have anticipated this

    UI is UI. It is naive to build some UI and expect users to "just magically" find out that they should use it, especially in a terminal.

  • You didn’t anticipate that most people stick with defaults?

    • We anticipated the default would be the best option for most people. We were wrong, so we reverted the default.

  • “After evals and dogfooding”? Couldn’t you have done this before releasing the model? We are paying $200/month to beta test the software for you.
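The warm/cold cache question raised in the bullet above can be checked mechanically from the API's usage numbers. A minimal sketch, assuming the Anthropic API's `cache_read_input_tokens` and `cache_creation_input_tokens` usage fields (the `cache_read`/`cache_write` shorthand in the comment), and using the two runs the commenter reported as sample data:

```python
# Hypothetical sketch (not Claude Code internals): classify a response's
# prompt-cache behavior from its usage record.

def cache_status(usage: dict) -> str:
    read = usage.get("cache_read_input_tokens", 0)
    write = usage.get("cache_creation_input_tokens", 0)
    if read > 0:
        return "warm"      # hit an existing cached prompt prefix
    if write > 0:
        return "cold"      # the prefix was written to cache on this request
    return "uncached"      # no prompt caching in play

# The two runs reported above:
telemetry_off = {"input_tokens": 10, "cache_read_input_tokens": 0,
                 "cache_creation_input_tokens": 28897, "output_tokens": 249}
telemetry_on = {"input_tokens": 10, "cache_read_input_tokens": 24344,
                "cache_creation_input_tokens": 7237, "output_tokens": 243}

print(cache_status(telemetry_off))  # cold
print(cache_status(telemetry_on))   # warm
```

With this framing, the commenter's complaint is that a default setup plus one common env var flips the same first request from "warm" to "cold", which is the kind of permutation a "Safe Boot" mode would help isolate.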

Yeah, this is so silly.

Anthropic: removes thinking output

Users: see long pauses, complain

Anthropic: better reduce thinking time

Users: wtf

To me it really, really seems like Anthropic is trying to undo the transparency they always had around reasoning chains, and a lot of issues are due to that.

Removing thinking blocks from the conversation after an hour of inactivity, without any notice, is just the icing on the cake. Whoever thought that was a good idea? How about making “the cache is hot” vs. “the cache is cold” a clear visual indicator instead, so you slowly shape user behavior rather than making these kinds of drastic changes?

> Instead of fixing the UI they lowered the default reasoning effort parameter from high to medium? And they "traced this back" because they "take reports about degradation very seriously"? Extremely hard to give them the benefit of the doubt here.

They had droves of Claude devs vehemently defending the changes and gaslighting users when this started happening.