
Comment by Jgrubb

12 hours ago

The tokens are still being burnt, they're just doing so in a parallel dimension from the user's main context window.

It's true that the initial tool response still contains the same number of tokens, but it doesn't keep getting dragged along in the longer-lived top-level context.
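A rough back-of-the-envelope sketch of that point follows. It assumes every later turn re-sends the whole main context and that no prompt caching applies. It also ignores the subagent's own system prompt and reasoning. All token counts are made-up illustrative numbers, and `compare` is a hypothetical helper, not part of any tool's API.

```python
# Hypothetical token counts, chosen only for illustration.
TOOL_OUTPUT_TOKENS = 20_000  # raw tool response (e.g. a large file or search dump)
SUMMARY_TOKENS = 500         # what a subagent hands back to the main context
LATER_TURNS = 10             # main-context turns that follow the tool call


def compare(tool_tokens: int, summary_tokens: int, turns: int) -> tuple[int, int]:
    """Return (inline, subagent) total input tokens under the simplified model.

    Assumes each later turn re-sends everything already in the main context.
    """
    # Inline: the tool output lands in the main context once and is then
    # re-sent on every later turn.
    inline = tool_tokens * (1 + turns)
    # Subagent: the tool output is read once in the side context; only the
    # short summary is carried (and re-sent) in the main context.
    subagent = tool_tokens + summary_tokens * (1 + turns)
    return inline, subagent


inline, subagent = compare(TOOL_OUTPUT_TOKENS, SUMMARY_TOKENS, LATER_TURNS)
print(f"inline: {inline:,} tokens, subagent: {subagent:,} tokens")
# prints "inline: 220,000 tokens, subagent: 25,500 tokens"
```

The initial read costs the same either way. The saving comes entirely from not re-sending the raw output on later turns. With zero later turns the subagent route is slightly more expensive, because the summary is extra.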

The real benefit is being able to use a cheaper, but good enough, model with a specific system prompt dedicated to that task.