Comment by d4rkp4ttern
2 months ago
I dug into this more. It's disabled by default, and it's a cost/token-usage optimization.
The logic is:
1. Anthropic's API has a server-side prompt cache with a 1-hour TTL
2. When you're actively using a session, each API call reuses the cached prefix, so the prefix is
billed at the discounted cache-read rate and only the new tokens cost full price
3. After 1 hour idle, that cache is guaranteed expired
4. Your next message will re-send and re-process the entire conversation from scratch — every
token, full price
5. So if you have 150K tokens of old Grep/Read/Bash outputs sitting in the conversation, you're
paying to re-ingest all of that even though it's stale context the model probably doesn't need
The microcompact step's reasoning is: "since we're paying full price anyway, let's shrink the
bill by clearing the bulky stuff."
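To put rough numbers on that reasoning, here is a minimal sketch of the cost difference between a warm (cached) call, a cold call, and a cold call after trimming. The price and the cache-read multiplier are placeholder assumptions, not quoted rates, and cache-write surcharges are ignored to keep it simple.

```typescript
// Hypothetical pricing parameters for illustration only; not actual published rates.
interface Pricing {
  inputPerMTok: number;        // USD per million uncached input tokens (assumed)
  cacheReadMultiplier: number; // fraction of the input price charged on a cache hit (assumed)
}

// Cost of sending a conversation prefix once, as either a cache hit or a cache miss.
function prefixCost(prefixTokens: number, cacheHit: boolean, p: Pricing): number {
  const rate = cacheHit ? p.inputPerMTok * p.cacheReadMultiplier : p.inputPerMTok;
  return (prefixTokens / 1000000) * rate;
}

const pricing: Pricing = { inputPerMTok: 3, cacheReadMultiplier: 0.1 };

// Active session: the 150K-token prefix is still cached.
const warm = prefixCost(150000, true, pricing);
// After the idle gap: the cache has expired, so the whole prefix is re-processed.
const cold = prefixCost(150000, false, pricing);
// Same cold call, assuming 120K tokens of old tool output were cleared first.
const coldTrimmed = prefixCost(30000, false, pricing);

console.log({ warm, cold, coldTrimmed });
```

With these made-up numbers the cold call costs ten times the warm one, and trimming before the cold call recovers most of that difference.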
What's preserved vs lost:
- The tool_use blocks (what tool was called, with what arguments) — kept
- The tool_result content (the actual output) — replaced with [Old tool result content cleared]
- The most recent 5 tool results — kept
So Claude can still see "I ran Grep for foo in src/" but not the 500-line grep output from 2
hours ago.
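A sketch of that clearing rule, under stated assumptions: the `Block` shapes and the `clearOldToolResults` name are hypothetical stand-ins I made up for illustration, not the actual source. Only the placeholder string and the keep-the-last-5 behavior come from the description above.

```typescript
// Hypothetical message shapes, loosely modeled on tool_use / tool_result blocks.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; toolUseId: string; content: string };

const CLEARED = "[Old tool result content cleared]";

// Returns a copy of the conversation in which every tool result except the
// most recent `keepRecent` has its content replaced by the placeholder.
// tool_use and text blocks pass through untouched.
function clearOldToolResults(blocks: Block[], keepRecent: number = 5): Block[] {
  const total = blocks.filter((b) => b.type === "tool_result").length;
  const clearCount = Math.max(0, total - keepRecent);
  let seen = 0;
  return blocks.map((b): Block => {
    if (b.type !== "tool_result") return b;
    seen += 1;
    // The first `clearCount` results (the oldest ones) are blanked.
    return seen <= clearCount ? { ...b, content: CLEARED } : b;
  });
}
```

Returning a new array instead of mutating in place keeps the original transcript available, which seems like the safer choice if the trimmed copy is only meant for the outgoing request.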
Does it affect quality? Yes, somewhat. But the tradeoff is that without it, you're paying for
potentially tens of thousands of tokens to re-ingest stale tool outputs that the model already
acted on. And remember, if the conversation is long enough, full compaction would have summarized
those messages anyway.
And critically: this is disabled by default (enabled: false in timeBasedMCConfig.ts:31). It's
behind a GrowthBook feature flag that Anthropic controls server-side. So unless they've flipped
it on for your account, it's not happening to you.
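For illustration, the gating could look roughly like the sketch below. Everything except `enabled: false` is my assumption: the field names, the 60-minute threshold (chosen to match the cache TTL above), and the `shouldMicrocompact` helper are hypothetical, and in the real client the flag value would come from the server-side feature flag rather than a local constant.

```typescript
// Hypothetical config shape; only `enabled: false` is taken from the comment above.
interface TimeBasedConfig {
  enabled: boolean;        // off unless the feature flag turns it on
  idleThresholdMs: number; // idle gap after which the prompt cache is assumed expired
  keepRecent: number;      // number of most recent tool results to leave intact
}

const defaultConfig: TimeBasedConfig = {
  enabled: false,
  idleThresholdMs: 60 * 60 * 1000, // 1 hour, assumed to mirror the cache TTL
  keepRecent: 5,
};

// True only when the feature is enabled AND the session has been idle long
// enough that the next call would miss the cache anyway.
function shouldMicrocompact(lastCallAtMs: number, nowMs: number, config: TimeBasedConfig): boolean {
  return config.enabled && nowMs - lastCallAtMs >= config.idleThresholdMs;
}
```

With the default config this never fires, no matter how long the session sits idle.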