
Comment by oblio

3 days ago

That's not (or shouldn't be) what's happening.

They write a short high-level plan (say, 200 words). That plan asks the agent to produce a more detailed implementation plan (written by the LLM, say 2,000-5,000 words).

They read this plan and adjust as needed, even sending it back to the agent for re-dos.

Once the implementation plan is done, they ask the agent to write the actual code changes.

Then they review that and ask for fixes, adjustments, etc.

This can be comparable in effort to writing the code yourself, but it also leaves a detailed trail of what was done and why, which I basically NEVER see in human-generated code.

That alone is worth gold.

And on top of that, if you're using an unknown platform or stack, it's basically a rocket ship. You bootstrap much faster. Of course, stay on top of the architecture, make controlled changes, learn about the platform as you go, etc.

I take this concept and I meta-prompt it even more.

I have a roadmap (AI-generated, of course) for a side project I'm toying around with to experiment with LLM-driven development. I read the roadmap, and I understand and approve it. Then, using some skills I found on skills.sh and slightly modified, my workflow is as follows:

1. Brainstorm the next slice

It suggests a few items from the roadmap that should be worked on, with some high-level methodology for implementing them. It asks me what the scope ought to be and what invariants ought to be considered. I ask it what the tradeoffs could be, why, and what it recommends given the product constraints. I approve a given slice of work.

NB: this is the part I learn the most from. I ask it why process X would be better than process Y given the constraints, and it either corrects itself or explains why. "Why use an outbox pattern? What other patterns could we use, and why aren't they the right fit?"
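A minimal sketch of how that tradeoff-probing question might be composed programmatically. The pattern names and constraints here are hypothetical placeholders, not actual project details:

```python
# Hypothetical sketch: composing the step-1 tradeoff question.
# All names (patterns, constraints) are illustrative assumptions.

def tradeoff_prompt(chosen: str, alternatives: list, constraints: str) -> str:
    """Build a question that forces the agent to justify its recommendation."""
    alts = ", ".join(alternatives)
    return (
        f"Why use {chosen}? What other patterns could we use ({alts}) "
        f"and why aren't they the right fit, given these constraints: {constraints}?"
    )

prompt = tradeoff_prompt(
    chosen="an outbox pattern",
    alternatives=["two-phase commit", "change data capture"],
    constraints="single Postgres instance, at-least-once delivery",
)
print(prompt)
```

The point of forcing the alternatives into the question is that the agent has to argue against concrete options rather than just restate its first suggestion.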

2. Generate slice

After I approve what to work on next, it generates a high-level overview of the slice, including the files touched, saved in a Markdown file that is persisted. I read through the slice, ensure that it is indeed working on what I expect it to be working on, and that it's neither scope-creeping nor under-scoping, and I approve it. It then makes a plan based on this.
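Persisting the slice overview might look something like this sketch; the file layout, field names, and paths are assumptions for illustration:

```python
# Hypothetical sketch of step 2: persisting the slice overview as a
# Markdown file so it survives the session and can be reviewed/approved.
from pathlib import Path
import tempfile

def write_slice(slug: str, goal: str, files_touched: list, out_dir: Path) -> Path:
    """Render the slice overview to Markdown and write it to disk."""
    lines = [f"# Slice: {slug}", "", f"**Goal:** {goal}", "", "## Files touched"]
    lines += [f"- `{f}`" for f in files_touched]
    path = out_dir / f"{slug}.md"
    path.write_text("\n".join(lines) + "\n")
    return path

out = write_slice(
    "payments-outbox",
    "Add an outbox table and relay",
    ["db/migrations/0007_outbox.sql", "app/relay.py"],
    Path(tempfile.mkdtemp()),
)
print(out.read_text())
```

Listing the files touched up front is what makes scope creep visible at review time: any file outside the list is a red flag.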

3. Generate plan

It writes a rather lengthy plan, with discrete task bullets at the top. Beneath, each step has to-dos for the LLM to follow, such as generating tests, running migrations, etc., with a commit message for each step. I glance through this for any potential red flags.
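The shape of such a plan file could be sketched like this; the structure (task checklist up top, per-step to-dos, one commit message each) follows the description above, but the concrete fields and tasks are hypothetical:

```python
# Hypothetical sketch of the step-3 plan shape: task bullets at the top,
# then per-step to-dos and a commit message for each step.
from dataclasses import dataclass

@dataclass
class PlanStep:
    title: str
    todos: list           # e.g. "generate tests", "run migrations"
    commit_message: str

def render_plan(steps: list) -> str:
    """Render the plan as Markdown: a checklist, then one section per step."""
    out = ["# Plan", "", "## Tasks"]
    out += [f"- [ ] {s.title}" for s in steps]
    for s in steps:
        out += ["", f"## {s.title}"]
        out += [f"- {t}" for t in s.todos]
        out += [f"- Commit: `{s.commit_message}`"]
    return "\n".join(out)

plan = render_plan([
    PlanStep("Add outbox table", ["write migration", "generate tests"],
             "feat: add outbox table"),
    PlanStep("Add relay worker", ["implement poller", "run migrations"],
             "feat: add outbox relay"),
])
print(plan)
```

One commit message per step keeps the execution phase honest: each checkbox maps to exactly one reviewable commit.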

4. Execute

This part is self-explanatory. It reads the plan and does its thing.

I've been extremely happy with this workflow. I'll probably write a blog post about it at some point.

  • If you want to have some fun, experiment with this: add a step (maybe between 3 and 4):

    3.5 Prove

    Have the LLM demonstrate, through our current documentation and other sources of facts, that the planned action WILL work correctly, without failure. Ask it to enumerate all risks and point out how the plan mitigates each one. On several occasions I've seen the LLM backtrack at this step and come up with clever, so-far-unforeseen error cases.
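A sketch of what that "prove" prompt could look like, assuming the plan and documentation paths shown here are placeholders:

```python
# Hypothetical sketch of the "3.5 Prove" step: ask the agent to argue from
# documented facts that the plan will work, and to map each risk to a
# mitigation. Paths and source names are illustrative assumptions.

def prove_prompt(plan_path: str, sources: list) -> str:
    """Build the prove-step prompt with an explicit risk enumeration demand."""
    srcs = "\n".join(f"- {s}" for s in sources)
    return (
        f"Using only these sources of facts:\n{srcs}\n"
        f"demonstrate that the plan in {plan_path} will work correctly, "
        "without failure. Enumerate every risk you can find and, for each, "
        "point out how the plan mitigates it. If a risk is unmitigated, "
        "stop and say so."
    )

msg = prove_prompt(
    "plans/payments-outbox.md",
    ["docs/architecture.md", "Postgres documentation on transactions"],
)
print(msg)
```

The "stop and say so" clause is what gives the model permission to backtrack instead of papering over an unhandled failure case.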

Yep, with a human in the loop to process these larger, sprawling plan docs (iteratively inflated with the designer's intent).

Some get deleted from the repo, others archived, others merged or referenced elsewhere. It's kind of organic.


  • > Haven't seen a single useful thing produced by this garbage process you describe

    By using it first-hand, or through a colleague? And useful to whom: you, or the person writing it? There are plenty of people in this thread, myself included, who have actually used this "garbage process" to produce things that we, and our colleagues, find useful.

  • What genuinely new thing have you produced?

    • Well, I'm actually producing, not having an LLM do things for me and frying my brain in the process. If you're building things with the process described above, you're not producing anything; Dario's or Altman's GPUs are. You're just a slot-machine user.

      Have fun paying for "Think-for-me SaaS".

      2025-2026: the years everyone became the mental equivalent of obese and let their brains atrophy. There are no shortcuts in life that don't come at a huge cost. Remember how everyone forgot how to navigate without a maps app? That's going to be you with writing, reading, and thinking about code.
