Comment by MarcelOlsz

2 days ago

How do you handle the retrieval aspect? So you have this set up, and now what?

>FWIW, I had an agent adding archiving of the JSONL for changesets linked to the work they're doing

Would love to know more! Sounds interesting.

I'm taking a very crude approach for now. My "improvement agent" has a few distinct stages. A goals-extraction step explores the repository, with some pointers to specific files that I consider authoritative as to the user's intent, and then builds out files in docs/goals. That feeds into a plan-ideation stage that produces one directory per plan in docs/plans. Each directory holds the plan itself, plus logs of revisions etc. So for now I'm just dumping snapshots of those JSONL files in there.
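A rough sketch of what that snapshot step could look like, in Python. None of this is the actual tool: the function name, the `sessions` subdirectory, and the timestamped file name are all hypothetical, and the path to the session's JSONL file is passed in rather than guessed.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def snapshot_session(jsonl_path: Path, plan_dir: Path) -> Path:
    """Copy a session JSONL file into <plan_dir>/sessions/ and return the copy's path.

    Assumption: plan_dir is one plan's directory under docs/plans.
    The "sessions" subdirectory name is made up for this sketch.
    """
    dest_dir = plan_dir / "sessions"
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Timestamp the copy so repeated snapshots of the same session don't overwrite each other.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest = dest_dir / f"{jsonl_path.stem}-{stamp}.jsonl"

    shutil.copy2(jsonl_path, dest)  # copy2 also preserves the file's modification time
    return dest
```

Calling `snapshot_session(Path("some-session.jsonl"), Path("docs/plans/my-plan"))` would leave a timestamped copy next to the plan and its revision logs, which is all a later review stage needs to read it.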

Putting it in a branch so it doesn't pollute your checked-out copy may well be a good idea in the longer run. For now, I keep all the plans available, as I then have a review stage that reviews all the plans, writes things like "the user got increasingly exasperated as the agent kept ignoring direction" :D, and helps propose improvements to the tool and workflow to reduce the number of those exasperated moments...

I'm thinking of packaging it up and open-sourcing it. It's all very experimental and likely to totally change every day for now, but I find it helpful. It's built me a personal dashboard, and keeps adding stuff to it with relatively minimal direction beyond "spying" on my notes and journal at this point. At one point a plan specifically called me out for procrastinating and planned for how to "work around" that with tooling (I wish it'd succeed at that).

There's nothing really fancy here, just feedback loops that ensure the wild claims the agents will sometimes make are tested and rejected.

On the original JSONL bit: the UUID you need to look it up is also the UUID you need to call "claude --resume [uuid]". So extracting it also allows for e.g. having the verification agent (which checks whether the implementation agent was truthful when ticking off the quality gates; spoiler: it very often isn't) feed its report back into the original implementation conversation if rejected, instead of having the implementation agent "start over" without the full context. I haven't tested that yet, but I'm hopeful.
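As an untested sketch of that idea in Python: pull the UUID out of the session file's name and build the resume command from it. Both helper names are hypothetical. It assumes the original session file is named `<uuid>.jsonl`, and that a trailing argument after `--resume <uuid>` is taken as the next prompt; check both against your own setup.

```python
import uuid
from pathlib import Path
from typing import Optional


def session_id_from_path(jsonl_path: Path) -> str:
    """Return the session UUID, assuming the file is named <uuid>.jsonl.

    Raises ValueError if the file name is not a valid UUID.
    """
    return str(uuid.UUID(jsonl_path.stem))


def build_resume_command(session_id: str, report: Optional[str] = None) -> list[str]:
    """Build the argument list for resuming a session by UUID.

    Assumption: a trailing positional argument is treated as the prompt,
    which is how the verification report would be fed back in.
    """
    cmd = ["claude", "--resume", session_id]
    if report is not None:
        cmd.append(report)
    return cmd
```

The resulting list can be handed to `subprocess.run(...)`. Keeping it as a list rather than a shell string avoids quoting problems when the report contains quotes or newlines.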

Though even if you don't have it restart, you can point it to the snapshot of the previous conversation as a source of additional info, as another option.

  • That sounds incredibly interesting and helpful. You should pack it up and open source it. You might even get $60 million. I'd be super interested in seeing it. I love that idea about the UUID. I've been experimenting with that myself. I'd be interested in working on this with you if you wanted to open source it.