Comment by bob1029
10 hours ago
The target codebase is very large. A million tokens is a drop in the proverbial bucket.
I still don't understand how caching helps me very much. I must be misunderstanding it, because I thought the user's prompt (which is the biggest variable) necessarily sits prior to all of these token-intensive tool calls. How can we cache the reading of the codebase if the prefix is always moving?
If an agent makes a tool call, the LLM provider will receive the full context again once the result of the tool call becomes available, in order to decide the next move. Everything up to the point where the tool call was made will no longer change and could thus, in theory, be cached. If the agent makes a ton of tool calls, then it should be hitting the cache once for every one of them.
A new instruction by the user will be appended at the end if it is done in the same conversation. It thus only has influence on the cacheability of the original agent prompt, but not of subsequent tool calls.
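To make that concrete, here is a toy sketch of the append-only argument. It is not any provider's actual cache: it assumes (my assumption) that a cache hit is simply the longest run of leading messages shared with the previous request, and it counts whole messages rather than tokens. Real caches work at token granularity and typically have expiry and minimum-size rules that this ignores. The names `common_prefix` and `replay` are made up for the illustration.

```python
def common_prefix(prev, cur):
    """Number of leading messages that are identical in both requests."""
    n = 0
    for a, b in zip(prev, cur):
        if a != b:
            break
        n += 1
    return n


def replay(requests):
    """For each request, return (cached_messages, fresh_messages).

    Assumption: only the immediately preceding request is cached.
    """
    stats = []
    prev = []
    for ctx in requests:
        hit = common_prefix(prev, ctx)
        stats.append((hit, len(ctx) - hit))
        prev = ctx
    return stats


# The conversation only ever grows at the end.
context = ["system prompt", "user: fix the bug"]
requests = [list(context)]  # first request, nothing cached yet

for i in range(3):
    # Each tool call and its result are appended, then the full context is re-sent.
    context += [f"tool_call {i}", f"tool_result {i}"]
    requests.append(list(context))

# A follow-up instruction in the same conversation also lands at the end.
context.append("user: also add a test")
requests.append(list(context))

stats = replay(requests)
for cached, fresh in stats:
    print(f"cached={cached} fresh={fresh}")
```

In this model the user's first prompt sits near the start of the prefix and never moves, so every later request re-reads only what was appended since the previous one.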
Often to me it seems like using MA is like letting a million monkeys loose.
Has AI forgotten about high-level design? Surely all it needs to know is what the methods, objects or functions in the codebase actually do, plus the actual code it is meant to be fixing?
I wonder if half the issue is that the LLM tries to change too much?
> The target codebase is very large.
But does every prompt need the entire codebase?
How could it not? Can you ever guarantee accurate answers about a book you haven't entirely read?