Comment by williamstein

9 days ago

> Checkpoints run as a Git-aware CLI. On every commit generated by an agent, it writes a structured checkpoint object and associates it with the commit SHA. The code stays exactly the same, we just add context as first-class metadata. When you push your commit, Checkpoints also pushes this metadata to a separate branch (entire/checkpoints/v1), giving you a complete, append-only audit log inside your repository. As a result, every change can now be traced back not only to a diff, but to the reasoning that produced it.

The context for every single turn could in theory be nearly 1MB. Since this context is being stored in the repo and constantly changing, after a thousand turns, won't it make just doing a "git checkout" start to be really heavy?

For example, codex-cli stores every single context for a given session in a jsonl file (in .codex). I've easily got that file to hit 4 GB in size, just working for a few days; amusingly, codex-cli would then take many GB of RAM at startup. I ended up writing a script that trims the jsonl history automatically periodically. The latest codex-cli has an optional sqlite store for context state.

My guess is that by "context", Checkpoints doesn't actually mean the contents of the context window, but just distilled reasoning traces, which are more manageable... but still can be pretty large.

6 comments

williamstein

otterley 9 days ago

To add to your comment, I think the next logical question is, “then what?” Surely one can’t build a sustainable business storing these records alone.

snemvalts 9 days ago
MCP server with RAG to feed it back to agents when they are working on a piece of code, and bob's your uncle
- otterley 9 days ago
  
  That sounds like word salad, not a clearly articulated solution to a well-understood customer problem.
  
  2 replies →

jiveturkey 9 days ago

> won't it make just doing a "git checkout" start to be really heavy?

not really? doesn't git checkout only retrieve the current branch? the checkpoint data is in another branch.

we can presume that the tooling for this doesn't expect you to manage the checkpoint branch directly. each checkpoint object is associated with a commit sha (in your working branch, master or whatever). the tooling presumably would just make sure you have the checkpoints for the nearby (in history) commit sha's, and system prompt for the agent will help it do its thing.

i mean all that is trivial. not worth a $60MM investment.

i suspect what is really going on is that the context makes it back to the origin server. this allows _cloud_ agents, independent of your local claude session, to pick up the context. or for developer-to-developer handoff with full context. or to pick up context from a feature branch (as you switch across branches rapidly) later, easily. yes? you'll have to excuse me, i'm not well informed on how LLM coding agents actually work in that way (where the context is kept, how easy it is to pick it back up again). this is just a bit of opining based on why this is worth 20% of $300MM.

if i look at https://chunkhound.github.io it makes me think entire is a version of that. they'll add an MCP server and you won't have to think about it.

finally, because there is a commit sha association for each checkpoint, i would be worried that history rewrites or force pushes MUST use the tooling otherwise you'd end up screwing up the historical context badly.