Comment by jgilias
15 hours ago
The cost is far from linear, though, because of prompt caching and the fact that output tokens are generally a lot more expensive than input tokens.
Agreed that it is not linear.
I wrote my own agent, and it sends data to the LLM in this order: "General Prompts (How to write good code)" + "The Code" + "The Feature Request". Because the only part that changes comes last, the KV cache for the first two parts can be reused even when the feature request changes.
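A minimal sketch of that ordering, to make the caching argument concrete. The function names are hypothetical and this is not the actual agent's code; it only assumes the provider caches on an exact shared prefix, and it counts characters where a real cache would count tokens.

```python
def build_prompt(general_prompts: str, code: str, feature_request: str) -> str:
    """Put the stable parts first and the changing part last."""
    return "\n\n".join([general_prompts, code, feature_request])


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common prefix of two prompts (characters, not tokens)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical inputs, for illustration only.
general = "How to write good code: keep functions small."
code = "def add(a, b):\n    return a + b"

first = build_prompt(general, code, "Add a subtract function.")
second = build_prompt(general, code, "Rename add to plus.")

# Everything up to the feature request is identical, so a prefix cache
# could in principle reuse it for the second request.
reusable = shared_prefix_len(first, second)
```

If the feature request were placed first instead, the two prompts would diverge almost immediately and there would be little shared prefix to reuse.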
And there are usually far fewer output tokens than input tokens.
So I think that my approach is very lightweight on token usage compared to an interactive session.
It would be interesting to measure this for the other agents out there: sending a feature request twice vs. running an interactive session.
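As a starting point for that kind of measurement, here is a rough cost model for two one-shot requests that share a cached prefix. All prices and token counts are made-up placeholders, not any provider's real rates, and the model assumes the whole stable prefix is billed at a cheaper cached rate on the second request.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    # Hypothetical per-token prices in arbitrary units.
    input_per_token: float
    cached_input_per_token: float
    output_per_token: float


def request_cost(prices: Prices, prefix_tokens: int, new_input_tokens: int,
                 output_tokens: int, prefix_cached: bool) -> float:
    """Cost of one request: stable prefix + changing input + output."""
    prefix_rate = (prices.cached_input_per_token if prefix_cached
                   else prices.input_per_token)
    return (prefix_tokens * prefix_rate
            + new_input_tokens * prices.input_per_token
            + output_tokens * prices.output_per_token)


# Placeholder numbers chosen only to show the shape of the calculation.
prices = Prices(input_per_token=10.0, cached_input_per_token=1.0,
                output_per_token=50.0)

first_cost = request_cost(prices, prefix_tokens=1000, new_input_tokens=50,
                          output_tokens=200, prefix_cached=False)
second_cost = request_cost(prices, prefix_tokens=1000, new_input_tokens=50,
                           output_tokens=200, prefix_cached=True)

total = first_cost + second_cost
```

Under these placeholder numbers the second request costs noticeably less than the first, so two requests cost less than twice one request. Comparing against an interactive session would need real token counts from an actual run.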