Comment by Jgrubb

12 hours ago

The tokens are still being burnt; they're just being burnt in a parallel dimension, separate from the user's main context window.

2 comments

Jgrubb

It's true that the initial tool response still costs the same number of tokens, but it doesn't keep getting dragged along in the longer-lived top-level context.

The real benefit is being able to use a cheaper, but good enough, model with a specific system prompt dedicated to that task.
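To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes a simplified model where every request resends the full context and nothing is cached. The function names and all the token counts are made up for illustration, not measurements from any real tool.

```python
def main_context_input_tokens(base, carried, growth, turns):
    """Total input tokens over `turns` requests when the whole context
    is resent each time (simplifying assumption: no prompt caching)."""
    return sum(base + carried + t * growth for t in range(turns))


def compare(tool_tokens=20_000, summary_tokens=500, base=2_000,
            growth=300, turns=10, sub_prompt=400):
    """Compare keeping a big tool response inline vs. delegating it.

    All defaults are hypothetical numbers chosen for illustration.
    """
    # Inline: the full tool response rides along on every later request.
    inline = main_context_input_tokens(base, tool_tokens, growth, turns)
    # Delegated: the main context only carries the short summary...
    delegated_main = main_context_input_tokens(base, summary_tokens, growth, turns)
    # ...while the subagent reads the full response once, with its own prompt.
    delegated_sub = sub_prompt + tool_tokens
    return {
        "inline": inline,
        "delegated_main": delegated_main,
        "delegated_sub": delegated_sub,
    }


if __name__ == "__main__":
    print(compare())
```

With these made-up defaults, the inline approach sends 233,500 input tokens through the main model over ten turns, while delegating sends 38,500 through the main model plus 20,400 through the subagent. The subagent still pays for the full tool response once, but those tokens can go to the cheaper model.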