Comment by vidarh

6 days ago

> When you push your commit, Checkpoints also pushes this metadata to a separate branch (entire/checkpoints/v1)

For Claude Code, this is literally a JSONL file in .claude/projects/[path with - instead of /]/[uuid].jsonl... You can trivially have Claude Code write a commit hook to do this for you if you find it useful.
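A minimal sketch of locating that transcript. It assumes the directory sits under the home directory (`~/.claude/projects/`) and that its name is simply the absolute repository path with every `/` replaced by `-`, as described above. The helper names are hypothetical, and the real naming may differ in edge cases such as other special characters in the path.

```python
import os
from pathlib import Path
from typing import Optional


def claude_project_dir(repo_path: str, home: Optional[Path] = None) -> Path:
    """Map a repository path to its assumed Claude Code project directory."""
    # Assumption: ~/.claude/projects/<absolute repo path with "/" replaced by "-">
    home = home if home is not None else Path.home()
    mangled = os.path.abspath(repo_path).replace("/", "-")
    return home / ".claude" / "projects" / mangled


def latest_session(project_dir: Path) -> Optional[Path]:
    """Return the most recently modified <uuid>.jsonl transcript, or None."""
    if not project_dir.is_dir():
        return None
    sessions = sorted(project_dir.glob("*.jsonl"), key=lambda p: p.stat().st_mtime)
    return sessions[-1] if sessions else None
```

From a `post-commit` hook (Git runs hooks from the top of the working tree), `latest_session(claude_project_dir(os.getcwd()))` would give you a file to copy or commit to a side branch. "Most recently modified" is a heuristic: if several sessions run in parallel on the same repo, it may not be the one that produced the commit.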

I'm sure their vision is wider than that, but they will need to iterate fast for this not to be made obsolete before they can even release something.

FWIW, I had an agent adding archiving of the JSONL for changesets linked to the work they're doing right as I started looking at this article, as when you start automating a non-interactive agent flow it's such an obviously necessary step to be able to retrospectively improve the workflows.



MarcelOlsz 6 days ago

How do you handle the retrieval aspect? So you have this set up, and now what?

>FWIW, I had an agent adding archiving of the JSONL for changesets linked to the work they're doing

Would love to know more! Sounds interesting.

vidarh 6 days ago
I'm taking a very crude approach for now. My "improvement agent" has a few distinct stages: a goals-extraction step that explores the repository, with some pointers to specific files that are considered authoritative as to the user's intent, and then builds out files in docs/goals. That then feeds into a plan-ideation stage that results in directories for plans in docs/plans. Each directory holds the plan itself, logs of revisions, etc. So for now I'm just dumping snapshots of those JSONL files in there.
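The snapshot step can be as simple as copying the transcript into the plan's directory under a timestamped name. A sketch, where the `docs/plans/<plan>/transcripts/` subdirectory and both helper names are hypothetical rather than the layout the tool actually uses:

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def snapshot_name(stem: str, when: datetime) -> str:
    """Build a timestamped file name (when is expected to be UTC)."""
    return f"{stem}-{when.strftime('%Y%m%dT%H%M%S%fZ')}.jsonl"


def snapshot_transcript(session_file: Path, plan_dir: Path,
                        when: Optional[datetime] = None) -> Path:
    """Copy a session transcript into <plan_dir>/transcripts/ (hypothetical layout)."""
    when = when if when is not None else datetime.now(timezone.utc)
    dest_dir = plan_dir / "transcripts"
    dest_dir.mkdir(parents=True, exist_ok=True)
    # The timestamp keeps repeated snapshots of one growing session from overwriting each other.
    dest = dest_dir / snapshot_name(session_file.stem, when)
    shutil.copy2(session_file, dest)  # copy2 also preserves the file's modification time
    return dest
```

Copying rather than linking means the plan keeps what the conversation looked like at that point, even as the live session file keeps growing.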
Putting it in a branch so it doesn't pollute your checked-out copy may well be a good idea in the longer run. For now, I keep all the plans available, as I then have a review stage that reviews all the plans, writes things like "the user got increasingly exasperated as the agent kept ignoring direction" :D, and helps propose improvements to the tool and workflow to reduce the number of those exasperated moments...
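For a review stage like that, the transcripts first need to be read back. JSONL is one JSON object per line, so a tolerant loader is short. The `"type": "user"` filter below is an assumption about what the transcript records look like, not a documented schema, so check your own files before relying on it.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator, List


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSONL line, skipping blank, truncated, or non-object lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # A snapshot taken mid-write may end with a partial line.
            continue
        if isinstance(record, dict):
            yield record


def user_records(path: Path) -> List[dict]:
    """Collect records assumed to be user turns (hypothetical "type" field)."""
    with path.open(encoding="utf-8") as f:
        return [r for r in iter_records(f) if r.get("type") == "user"]
```

Pulling out just the user turns gives a reviewer agent a compact view of how the user's instructions evolved, which is where patterns like repeated corrections show up.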
I'm thinking of packaging it up and open-sourcing it. It's all very experimental and likely to totally change every day for now, but I find it helpful. It's built me a personal dashboard, and keeps adding stuff to it with relatively minimal direction beyond "spying" on my notes and journal at this point. At one point a plan specifically called me out for procrastinating and planned for how to "work around" that with tooling (I wish it'd succeed at that).
There's nothing really fancy here, just feedback loops that ensure the wild claims agents sometimes make get tested and rejected.
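One way to make such a feedback loop concrete is to never take an agent's "done" at face value: re-run the check behind each quality gate it claims to have passed, and reject any claim whose check fails or doesn't exist. A minimal sketch; the gate names and commands are hypothetical placeholders, not the checks the tool uses.

```python
import subprocess
import sys
from typing import Dict, List


def verify_claims(claimed: List[str], gates: Dict[str, List[str]]) -> Dict[str, bool]:
    """Re-run the command behind every gate the agent claims to have passed."""
    results = {}
    for name in claimed:
        cmd = gates.get(name)
        if cmd is None:
            # A claim with no runnable check cannot be verified, so reject it.
            results[name] = False
            continue
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[name] = proc.returncode == 0
    return results


# Hypothetical gates; in practice these would be the project's test/lint commands.
GATES = {
    "tests_pass": [sys.executable, "-c", "pass"],
    "lint_clean": [sys.executable, "-c", "raise SystemExit(1)"],
}
```

Any `False` entry is a rejected claim, and the captured output of the failing command is the natural thing to hand back to the implementation agent.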
To the original JSONL bit, the uuid you need to look it up is also the UUID you need to call "claude --resume [uuid]", so extracting it also allows for e.g. having the verification agent (that checks if the implementation agent was truthful when ticking off the quality gates - spoiler: it very often isn't) feed its report back into the original implementation conversation if rejected, instead of having the implementation agent "start over" without the full context. I haven't tested that yet, but I'm hopeful.
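Since the transcript's file name is the session UUID, building the resume command is mostly a matter of taking the file stem. A sketch that validates the stem and returns the argv for `claude --resume <uuid>`, the only invocation mentioned here; how to pass the verifier's report into the resumed session is left out, since that depends on CLI options this comment doesn't cover. Helper names are hypothetical.

```python
import uuid
from pathlib import Path
from typing import List


def session_id(transcript: Path) -> str:
    """Return the session UUID encoded in a <uuid>.jsonl file name."""
    # uuid.UUID raises ValueError if the stem is not a well-formed UUID.
    return str(uuid.UUID(transcript.stem))


def resume_command(transcript: Path) -> List[str]:
    """Build the argv for resuming the conversation recorded in `transcript`."""
    return ["claude", "--resume", session_id(transcript)]
```

Returning an argv list rather than a shell string means it can go straight to `subprocess.run(...)` without quoting concerns.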
Even if you don't have it resume, you can, as another option, point it to the snapshot of the previous conversation as a source of additional info.
- MarcelOlsz 6 days ago
  
  That sounds incredibly interesting and helpful. You should pack it up and open source it. You might even get $60 million. I'd be super interested in seeing it. I love that idea about the UUID. I've been experimenting with that myself. I'd be interested in working on this with you if you wanted to open source it.

benterix 6 days ago

That's why several VC-funded AI companies are deliberately vague about what they are doing.

6031769 6 days ago
That is charitable of you. The alternative viewpoint is that they are vague about it because they have no idea what they are doing.
- vidarh 5 days ago
  
  I'm sure they have a vision of something, but I also think the moat is narrow here. A lot of what these companies think will be "special sauce" will be made unnecessary by frontier-model advances within 6 months, and survival at that point will come down to extremely proficient execution to stay ahead.
  (And most of these companies will not be good enough at that)