Comment by ivzak
9 hours ago
I think we should draw distinction between two compression "stages"
1. Tool output compression: vanilla Claude Code doesn't do this at all - it just dumps entire tool outputs into the context, bloating it. We add under 0.5s of compression latency, but you gain that back on the target model's prefill, since a shorter context speeds it up.
2. /compact once the context window is full - the step that is painfully slow in Claude Code. We make it effectively instant: the trick is to run /compact proactively when the context window is 80% full, so when the window actually fills we just fetch the precomputed compaction (our context gateway handles that).
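To make point 2 concrete, here's a rough sketch of the 80%-threshold trick. All names here (`ContextGateway`, `on_turn`, etc.) are hypothetical illustrations, not our actual API, and the real compaction would run in the background rather than inline:

```python
COMPACT_THRESHOLD = 0.80  # start compacting at 80% context fullness

class ContextGateway:
    """Hypothetical sketch: precompute /compact early so fetching it later is instant."""

    def __init__(self, window_limit):
        self.window_limit = window_limit
        self.precompacted = None

    def _compact(self, messages):
        # Stand-in for the real (slow) summarization call.
        return f"<summary of {len(messages)} messages>"

    def on_turn(self, tokens_used, messages):
        # Called after each agent turn with the current token usage.
        usage = tokens_used / self.window_limit
        if usage >= COMPACT_THRESHOLD and self.precompacted is None:
            # In practice this runs asynchronously so the agent keeps going.
            self.precompacted = self._compact(messages)

    def fetch_precompaction(self):
        # When the window actually fills, this returns immediately:
        # no slow /compact call on the critical path.
        return self.precompacted

gw = ContextGateway(window_limit=100_000)
gw.on_turn(50_000, ["m1", "m2"])        # below threshold: nothing happens
gw.on_turn(85_000, ["m1", "m2", "m3"])  # crosses 80%: precompaction is built
print(gw.fetch_precompaction())         # -> <summary of 3 messages>
```

The point is just that the expensive summarization happens before the window is full, so the user-visible /compact is a cache fetch.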
Please try it out and let us know your feedback, thanks a lot!