Comment by d4rkp4ttern

2 months ago

I dug into this more. It's disabled by default, and it's a cost/token-usage optimization.

  The logic is:

  1. Anthropic's API has a server-side prompt cache with a 1-hour TTL
  2. When you're actively using a session, each API call reuses the cached prefix — you only pay
  for new tokens
  3. After 1 hour idle, that cache is guaranteed expired
  4. Your next message will re-send and re-process the entire conversation from scratch — every
  token, full price
  5. So if you have 150K tokens of old Grep/Read/Bash outputs sitting in the conversation, you're
  paying to re-ingest all of that even though it's stale context the model probably doesn't need

  The microcompact says: "since we're paying full price anyway, let's shrink the bill by clearing
  the bulky stuff."

  What's preserved vs lost:
  - The tool_use blocks (what tool was called, with what arguments) — kept
  - The tool_result content (the actual output) — replaced with [Old tool result content cleared]
  - The most recent 5 tool results — kept

  So Claude can still see "I ran Grep for foo in src/" but not the 500-line grep output from 2
  hours ago.

  Does it affect quality? Yes, somewhat — but the tradeoff is that without it, you're paying
  potentially tens of thousands of tokens to re-ingest stale tool outputs that the model already
  acted on. And remember, if the conversation is long enough, full compaction would have summarized
   those messages anyway.

  And critically: this is disabled by default (enabled: false in timeBasedMCConfig.ts:31). It's
  behind a GrowthBook feature flag that Anthropic controls server-side. So unless they've flipped
  it on for your account, it's not happening to you.