Comment by jgilias

15 hours ago

The cost is far from linear, though, because of prompt caching and because output tokens are generally a lot more expensive than input tokens.



mg 14 hours ago

Agreed that it is not linear.

I wrote my own agent, and it sends data to LLMs in this order: "General Prompts (How to write good code)" + "The Code" + "The Feature Request". Because only the last part changes between requests, the cached prefix (the KV cache) can be reused even when the feature request changes.
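To make the ordering argument concrete, here is a minimal sketch of why "stable parts first, feature request last" maximizes the reusable prefix. `GENERAL_PROMPTS`, `build_prompt` and the sample requests are hypothetical stand-ins, not the actual agent; it also compares characters rather than tokens, whereas real prefix caches match on tokens (often in fixed-size blocks).

```python
# Hypothetical stand-in for the "General Prompts" block.
GENERAL_PROMPTS = "How to write good code: ...\n"


def build_prompt(code: str, feature_request: str) -> str:
    """Stable parts first, volatile part last (the order described above)."""
    return GENERAL_PROMPTS + code + "\n" + feature_request


def build_prompt_request_first(code: str, feature_request: str) -> str:
    """Counter-example: the volatile part first."""
    return feature_request + "\n" + GENERAL_PROMPTS + code


def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading characters two prompts have in common.

    Real prefix caches compare tokens, not characters; characters
    keep this sketch dependency-free.
    """
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


code = "def add(a, b):\n    return a + b\n"
req1 = "Add a subtract function."
req2 = "Add type hints."

p1, p2 = build_prompt(code, req1), build_prompt(code, req2)
q1, q2 = build_prompt_request_first(code, req1), build_prompt_request_first(code, req2)

print("stable-first, reusable prefix:", shared_prefix_len(p1, p2), "of", len(p2))
print("request-first, reusable prefix:", shared_prefix_len(q1, q2), "of", len(q2))
```

With the stable-first order, everything except the request itself is shared; with the request first, the two prompts diverge almost immediately, so almost nothing could come from the cache.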

And there are usually far fewer output tokens than input tokens.

So I think my approach is very light on token usage compared to an interactive session.
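A back-of-the-envelope cost model can make the comparison explicit. This is a sketch under loudly stated assumptions: the prices are made up (not any provider's list price), cache-write surcharges are ignored, and the interactive session is modeled as each turn re-sending the full history, with the previous request's prompt cached but the previous reply and the new user message billed as fresh input. All names and token counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """USD per million tokens. Illustrative numbers only."""
    input: float
    cached_input: float
    output: float


def request_cost(p: Prices, fresh_in: int, cached_in: int, out: int) -> float:
    return (fresh_in * p.input + cached_in * p.cached_input + out * p.output) / 1_000_000


def repeated_one_shot(p: Prices, prefix: int, request: int, output: int, runs: int) -> float:
    """Send the same prompt `runs` times; after the first run the prefix is cached."""
    total = 0.0
    for i in range(runs):
        cached = prefix if i > 0 else 0
        fresh = request + (prefix - cached)
        total += request_cost(p, fresh, cached, output)
    return total


def interactive_session(p: Prices, prefix: int, user_msgs: list[int], replies: list[int]) -> float:
    """Each turn re-sends the whole history. The previous request's prompt is
    assumed cached; the previous reply and the new message are not."""
    total = 0.0
    cached = 0
    uncached = prefix
    for user, reply in zip(user_msgs, replies):
        fresh = uncached + user
        total += request_cost(p, fresh, cached, reply)
        cached += fresh      # this whole prompt is cached for the next turn
        uncached = reply     # the reply becomes uncached input next turn
    return total


prices = Prices(input=3.0, cached_input=0.30, output=15.0)  # assumed prices
one_shot_cost = repeated_one_shot(prices, prefix=50_000, request=200, output=2_000, runs=2)
session_cost = interactive_session(prices, prefix=50_000, user_msgs=[200] * 10, replies=[2_000] * 10)
print(f"feature request sent twice: ${one_shot_cost:.4f}")
print(f"10-turn interactive session: ${session_cost:.4f}")
```

Under these assumptions the second one-shot run is dominated by its output tokens, since the large prefix is billed at the cached rate; the session's cost grows with the number of turns because every turn adds output tokens and a longer cached history. Different turn counts or prices can change the ratio, which is why measuring real agents would be more convincing than this model.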

It would be interesting to measure this for the other agents out there: sending a feature request twice vs. running an interactive session.