Comment by tontinton

21 hours ago

Is it similar to rtk, where the output of tool calls is compressed? Or does it actively compress your history once in a while?

If it's the latter, then users will pay uncached rates for every token in the history from the point where it changed: https://platform.claude.com/docs/en/build-with-claude/prompt...

How is this better?
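To make the cost concern concrete: prompt caching reuses an exact prompt prefix, so appending to the history keeps earlier tokens cacheable, while rewriting an earlier message makes everything after it uncached. Below is a minimal sketch under a simplified model. It assumes cache hits are counted element by element up to the first difference, with list elements standing in for tokens. Real providers cache at explicit breakpoints with minimum block sizes, so treat the numbers as illustrative only. The helper names are hypothetical.

```python
def cached_prefix_len(old: list[str], new: list[str]) -> int:
    """Number of leading elements the new prompt shares with the previous one."""
    n = 0
    for a, b in zip(old, new):
        if a != b:
            break
        n += 1
    return n


def split_billing(old: list[str], new: list[str]) -> tuple[int, int]:
    """(cache-read count, uncached count) under the simplified prefix model."""
    hit = cached_prefix_len(old, new)
    return hit, len(new) - hit


history = ["system", "user_1", "tool_output_1", "assistant_1"]

# Appending leaves the prefix intact: only the new turn is uncached.
print(split_billing(history, history + ["user_2"]))  # (4, 1)

# Rewriting an earlier tool output invalidates everything after it.
rewritten = ["system", "user_1", "tool_output_1_compressed", "assistant_1", "user_2"]
print(split_billing(history, rewritten))  # (2, 3)
```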

We do both:

We compress tool outputs at each step, so the cache isn't broken during the run. Once usage reaches 85% of the context window, we preemptively trigger a summarization step and load that summary when the context window fills up.
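A hypothetical sketch of the two mechanisms described in the reply, not the poster's actual code. Tool outputs are compressed once before they are appended, so earlier messages never change and the cached prefix stays valid. Crossing 85% of the window precomputes a summary, but the full history is kept until the next message would overflow; only then is the summary swapped in. Assumptions: `count_tokens` is a word-count stand-in for a real tokenizer, and `compress`/`summarize` are placeholder callables (in practice `summarize` would be an LLM call). Messages added after the summary was prepared are kept behind it.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

SUMMARIZE_AT = 0.85  # fraction of the window that triggers the preemptive summary


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): one token per whitespace-separated word.
    return len(text.split())


@dataclass
class Context:
    window: int                            # max tokens the model accepts
    compress: Callable[[str], str]         # hypothetical tool-output compressor
    summarize: Callable[[list[str]], str]  # hypothetical summarizer
    messages: list[str] = field(default_factory=list)
    # (summary text, number of leading messages it covers), if prepared
    pending: Optional[tuple[str, int]] = None

    def used(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def add_tool_output(self, output: str) -> None:
        # Compress once, before appending: the stored history is append-only,
        # so the cached prefix is never invalidated mid-run.
        self.add(self.compress(output))

    def add(self, message: str) -> None:
        if self.used() + count_tokens(message) > self.window:
            self._swap_in_summary()
        self.messages.append(message)
        if self.pending is None and self.used() >= SUMMARIZE_AT * self.window:
            # Preemptive step: summarize now, but keep the full history
            # (and its cache) until the window actually fills up.
            self.pending = (self.summarize(self.messages), len(self.messages))

    def _swap_in_summary(self) -> None:
        if self.pending is None:
            self.pending = (self.summarize(self.messages), len(self.messages))
        summary, covered = self.pending
        # Replace the summarized prefix; keep messages added after it was made.
        self.messages = [summary] + self.messages[covered:]
        self.pending = None


ctx = Context(
    window=10,
    compress=lambda s: " ".join(s.split()[:3]),  # toy compressor: keep 3 words
    summarize=lambda msgs: "SUMMARY",            # toy summarizer
)
ctx.add("a b c")
ctx.add_tool_output("t1 t2 t3 t4 t5 t6")  # stored as "t1 t2 t3"
ctx.add("d e f")  # 9/10 tokens used: crosses 85%, summary prepared
ctx.add("x")      # still fits, full history kept
ctx.add("g h")    # would overflow: summary swapped in
print(ctx.messages)  # ['SUMMARY', 'x', 'g h']
```

The swap itself still invalidates the cache once, but in this sketch that happens only when the window is actually full rather than at every step.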

  • > we preemptively trigger a summarization step and load that when the context-window fills up.

    How does this differ from auto compact? Also, how do you prove that yours is better than using auto compact?