Comment by stuxf

7 days ago

> Some coding agents (Shelley included!) refuse to return a large tool output back to the agent after some threshold. This is a mistake: it's going to read the whole file, and it may as well do it in one call rather than five.

I disagree with this: IMO the primary reason these limits still need to exist is for when the agent messes up (e.g. reads a file that is too large, like a bundle file), or when you run a grep command in a large codebase, hit way too many files, and overload the context.

Otherwise, lots of interesting stuff in this article! Having a precise calculator for how many things we should be putting into an agent loop to reach a cost optimum (and not just a performance optimum) for our tasks was very useful; that question has been pretty underserved.

I think that's reasonable, but then the agent should have the ability to override the limit on the next call, even if that requires the agent to have read the file once or something.

In the absence of that you end up with what several of the harnesses ended up doing, where an agent will use a million tool calls to very slowly read a file in 200-line chunks. I think they _might_ have fixed it now (or my agent harness might be fixing it for me), but Codex used to do this and it made it unbelievably slow.
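Concretely, the override I have in mind is tiny. This is a sketch with hypothetical names (`read_file`, `allow_large`, the 20k threshold), not any real harness's API:

```python
# Sketch of a read tool whose size limit the agent can override.
# All names and the threshold here are hypothetical.

MAX_CHARS = 20_000  # default truncation threshold

def read_file(path: str, allow_large: bool = False) -> str:
    """Return file contents, truncated unless the agent opts in."""
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    if len(text) <= MAX_CHARS or allow_large:
        return text
    # Tell the agent how to get the rest in ONE follow-up call,
    # instead of forcing it to page through 200-line chunks.
    return (
        text[:MAX_CHARS]
        + f"\n[truncated: file is {len(text)} chars; "
          "call read_file(path, allow_large=True) to read it all]"
    )
```

The point is the truncation notice itself advertises the override, so the second call reads the whole file rather than starting a paging loop.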

  • You’re describing peek.

    An agent needs to be able to peek before determining "Can I one-shot this, or does it need paging?"

    • Yep, I previously implemented it under that name in my own harness. That being said, there is value in actually performing a normal read, because you do often complete it on that first glance.

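A peek tool in that sense can be very small: return the head of the file plus a size report, so the model can decide between a one-shot full read and paging. A sketch with hypothetical names:

```python
import os

def peek(path: str, n_lines: int = 50) -> str:
    """Return the first n_lines plus totals, so the agent can decide
    whether a full read is safe. Hypothetical tool shape, not any
    real harness's API."""
    size = os.path.getsize(path)
    with open(path, encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    head = "".join(lines[:n_lines])
    return f"{path}: {len(lines)} lines, {size} bytes\n{head}"
```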

> when you run a grep command in a large codebase and end up hitting way too many files, overloading context.

On the other hand, I despise that it automatically pipes output through limiting commands like `grep` with a filter, `head`, `tail`, etc. I would much rather it try to read the full grep output and then decide to filter down from there if it's too large -- that's exactly what I do when I perform the same workflow I told it to do.

Why? Because piping through output-limiting commands can hide the scope of the "problem" I'm looking at. I'd rather see that scope first so I can decide whether I need to change from a tactical view/approach to a strategic one. It would be handy if the agents could do the same thing -- and I suppose they could if I were a little more explicit about it in my tool/prompt.
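What I'm describing, roughly: run the search unfiltered first, report the total scope, and only then truncate. A toy sketch (the `search` helper and its parameters are made up for illustration):

```python
import re

def search(pattern: str, paths: list[str], max_hits: int = 200) -> str:
    """Grep-like search that reports the full scope of matches
    before limiting the output. Toy sketch with made-up names."""
    hits = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, 1):
                if re.search(pattern, line):
                    hits.append(f"{path}:{lineno}:{line.rstrip()}")
    if len(hits) <= max_hits:
        return "\n".join(hits)
    # Surface the scope of the problem instead of silently head-ing it.
    return (
        f"{len(hits)} matches across {len(paths)} files "
        f"(showing first {max_hits}; narrow the pattern to see the rest)\n"
        + "\n".join(hits[:max_hits])
    )
```

Even when the output is truncated, the first line tells you (and the agent) whether you're looking at 250 matches or 25,000, which is the tactical-vs-strategic signal I want.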

  • In my experience this is what Claude 4.5 (and 4.6) basically does, depending on why it's grepping in the first place. It'll sample the header, do a line count, etc. This is because the agent can't backtrack mid-'try to read full file'. If you put the 50,000 lines into the context, they are now in the context.

    • > If you put the 50,000 lines into the context, they are now in the context.

      And you can't revert to a previous context, then add a new entry summarizing the result, something like "the file is too large", or "there are too many unrelated lines matching '...', so use grep to filter"?

      Using output-limiting stuff first won't tell you if you've limited too much. You should search again after changing something; and if you do search again then you need to remember which page you're on and how many there are. That's a bit more complex in my opinion, and agents don't handle that kind of complexity very well afaik.