Comment by danmaz74
3 days ago
When a task is bigger than I trust the agent to handle on its own, or than I can reasonably review in one go, I ask it to create a plan with steps, then create an md file for each step. I review the steps and ask the agent to implement the first one. I review that one, fix it, then ask it to update the remaining steps and implement the next one. And so on, until it's finished.
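For concreteness, a hypothetical layout for that kind of plan might look like this (the file names and step contents are invented for illustration, not taken from the comment above):

    plan.md      overall goal, constraints, and the list of steps
    step-01.md   e.g. "add the new database column and migration"
    step-02.md   e.g. "expose the new field through the API"
    step-03.md   e.g. "update the UI and the tests"

Each step file gets reviewed before implementation, and the later files are updated after each step lands, so corrections made in step 1 carry forward into steps 2 and 3.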
Have you tried scoped context packages? Basically, for each task I create a .md file that includes relevant file paths, the purpose of the task, key dependencies, a clear plan of action, and a test strategy. It's like a mini local design doc. I've found that it helps ground the implementation and stabilize the agents' output.
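A minimal sketch of what one of these "scoped context package" files might contain, using just the sections listed above (the paths and test command are invented placeholders):

    # Task: <short task name>

    ## Relevant files
    - src/billing/invoice.py            (hypothetical path)
    - src/billing/tests/test_invoice.py (hypothetical path)

    ## Purpose
    One or two sentences on what this task should achieve and why.

    ## Key dependencies
    - Modules, services, or libraries this change relies on.

    ## Plan of action
    1. First step
    2. Second step
    3. Third step

    ## Test strategy
    - Which tests to add or extend, and how to run them
      (e.g. pytest src/billing/tests, a hypothetical command).

The point, as described above, is that the agent starts every task from the same small, self-contained brief instead of having to rediscover context.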
I read this suggestion a lot: "Make clear steps, a clear plan of action." Which I get. But then, instead of having an LLM flail away at it, couldn't we give it to an actual developer? It seems like we've finally realized that clear specs make dev work much easier for LLMs. But the same is true for a human. The human will ask more clarifying questions and not hallucinate. The LLM will roll the dice and pick a path. Maybe we as devs would just rather talk with machines.
I'm using it to help me build what I want and learn how. It being incorrect and needing questioning isn't that bad, so long as you ARE questioning it. It has brought up so many concepts, parameters, etc. that would be difficult to find and learn on my own. Documentation can often be very difficult to parse; LLMs make it easier.
Yes, but the difference is that an LLM produces the result instantly, whereas a human might take hours or days.
So if you can get the spec right, and the LLM+agent harness is good enough, you can move much, much faster. It's not always true to the same degree, obviously.
Getting the spec right, and knowing what tasks to use it on -- that's the hard part that people are grappling with, in most contexts.
> Maybe we as devs would just rather talk with machines.
This is kind of how I feel. Chat as an interaction is mentally taxing for me.
Separately, you have to consider that "wasting tokens spinning" might be acceptable if you're able to run hundreds of thousands of these things in parallel. If even a small subset of them translates to value, then you come out far ahead of a strictly manual/human process.
> hundreds of thousands of these things in parallel
At what cost, monetary and environmental?
If the system provides value that is greater than its cost, then paying the cost to gain the value is always worthwhile - regardless of the magnitude of the cost.
As costs drop exponentially (a reasonable expectation for LLMs, etc.), increasing agent parallelism becomes more and more economically viable over time.
I do the same thing with my engineers but I keep the tasks in Jira and I label them "stories".
But in all seriousness +1 can recommend this method.
This is built into Cursor now with plan mode https://cursor.com/docs/agent/planning
How does Cursor plan mode differ from Claude Code plan mode? I've used the latter a lot (it's been there a long time), and the description seems very similar. The big difference from the workflow I described is that with plan mode you don't get to review and correct what happened between steps.
I've not used Claude Code, so my answer might not be that useful. But I would think that, because both are chat-based interfaces, you would be able to instruct the model either to continue without approval or to wait for your approval at each step. I certainly do that with Cursor. Cursor has also recently started automatically generating TODO lists in the background (with a tool call, I'm assuming) and displaying them as part of the thinking process without explicit instruction. I find that useful.
This, plus a context reset between steps, usually helps keep the context focused, in my experience.