Comment by bredren
1 day ago
I've been exploring the internals of Claude Code and Codex via the transcripts they generate locally (these serve as the only record of your interactions with the products)[1].
Given the stance of the article, the transcript formats alone reveal what might be a surprisingly complex system once you dig in.
For Claude Code, beyond the basic user/assistant loop, there's uuid/parentUuid threading for conversation chains, queue-operation records for handling messages sent during tool execution, file-history-snapshots at every file modification, and subagent sidechains (agent-*.jsonl files) when the Task tool spawns parallel workers.
So "200 lines" captures the concept but not the production reality of what is involved. It is particularly notable that Codex has yet to ship queuing, as that product is getting plenty of attention and is still highly capable.
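To make the uuid/parentUuid threading concrete, here is a minimal sketch of rebuilding one conversation chain from JSONL records. Only the uuid and parentUuid field names come from the transcripts described above; the "type" field, its values, and the sample records are assumptions for illustration, not the actual schema.

```python
import json
from typing import Dict, List, Optional


def load_records(lines: List[str]) -> Dict[str, dict]:
    """Index JSONL records by uuid, skipping blank lines and records without one."""
    records: Dict[str, dict] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        uuid = record.get("uuid")
        if uuid is not None:
            records[uuid] = record
    return records


def chain_to_root(records: Dict[str, dict], leaf_uuid: str) -> List[dict]:
    """Follow parentUuid links from a leaf back to the root; return root-first order."""
    chain: List[dict] = []
    seen = set()  # guards against accidental cycles in malformed data
    current: Optional[str] = leaf_uuid
    while current is not None and current in records and current not in seen:
        seen.add(current)
        chain.append(records[current])
        current = records[current].get("parentUuid")
    chain.reverse()
    return chain


# Hypothetical sample records; real transcripts carry many more fields.
sample = [
    '{"uuid": "a", "parentUuid": null, "type": "user"}',
    '{"uuid": "b", "parentUuid": "a", "type": "assistant"}',
    '{"uuid": "c", "parentUuid": "b", "type": "user"}',
    '{"type": "file-history-snapshot"}',
]
records = load_records(sample)
thread = [r["type"] for r in chain_to_root(records, "c")]
```

Records without a uuid (the snapshot line in this sample) are left out of the index, so anything like queue operations or sidechain files would need its own handling.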
I have been building Contextify (https://contextify.sh), a macOS app that monitors Claude Code and Codex CLI transcripts in real-time and provides a CLI and skill called Total Recall to query your entire conversational history across both providers.
I'm about to release a Linux version and would love any feedback.
[1] With the exception of Claude Code Web, which does expose "sessions" or shared transcripts between local and hosted execution environments.
IMO these articles are akin to "Twitter in 200 lines of code!" and "Why does Uber need 1000 engineers?" type articles.
They're cool demos/POCs of real-world things (and indeed are informative to people who haven't built AI tools). The very first version of Claude Code probably even looked a lot like this 200-line loop, but things have evolved significantly from there.
> IMO these articles are akin to "Twitter in 200 lines of code!"
I don't think it serves the same purpose. Many people understand the difference between a 200-line Twitter prototype and the real deal.
But many of those may not understand what the LLM client tool does and how it relates to the LLM server. It is generally consumed as one magic black box.
This post isn't to tell us how everyone can build a production-grade claude-code; it tells us what part is done by the CLI and what part is done by the LLMs, which I think is a rather important ingredient in understanding the tools we are using and how to use them.
Nice, I have something similar [1], a super-fast Rust/Tantivy-based full-text search across Claude Code + Codex-CLI session JSONL logs, with a TUI (for humans) and a CLI/JSONL mode for agents.
For example there’s a session-search skill and corresponding agent that can do:
So you can ask Claude Code to use the searcher agent to recover arbitrary context of prior work from any of your sessions, and build on that work in a new session. This has enabled me to completely avoid compaction.
[1] https://github.com/pchalasani/claude-code-tools?tab=readme-o...
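For anyone wondering what searching session logs looks like at its simplest, here is a naive linear scan in Python. It is not the Rust/Tantivy index the tool above uses, just a substring match over every line of every .jsonl file under a directory. The function name and the directory layout are assumptions for illustration.

```python
import json
from pathlib import Path
from typing import Iterator, Tuple


def search_sessions(root: Path, query: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, raw line) for JSONL records containing the query."""
    needle = query.lower()
    for path in sorted(root.rglob("*.jsonl")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle not in line.lower():
                    continue
                try:
                    json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip partially written or corrupt lines
                yield path.name, number, line.strip()
```

A real index earns its keep once the history grows, since this version rereads every file on every query and has no ranking.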
That is a cool tool. Also, one can set "cleanupPeriodDays": in ~/.claude/settings.json to extend how long transcripts are kept before cleanup. These tools keep around so much information that we could use.
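A small sketch of changing that setting from a script, assuming the settings file is plain JSON with cleanupPeriodDays as a top-level key. The helper name is hypothetical and the value in the usage comment is an arbitrary example, not a recommendation; check the Claude Code settings docs for the accepted values.

```python
import json
from pathlib import Path


def set_cleanup_period(settings_path: Path, days: int) -> dict:
    """Set cleanupPeriodDays in a JSON settings file, preserving any other keys."""
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        if text.strip():
            settings = json.loads(text)
    settings["cleanupPeriodDays"] = days
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


# Example call (not run here); 365 is an arbitrary illustrative value:
# set_cleanup_period(Path.home() / ".claude" / "settings.json", 365)
```

Back up the file first if it has been edited by hand, since this rewrites it with fresh formatting.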
I came across this one the other day: https://github.com/kulesh/catsyphon
This is very interesting, especially if you could then use an LLM across that search to figure out what has and maybe has not been completed, and then reinject those findings into a new Claude Code session.
I haven't written the entry yet but it is pretty incredible what you can get when letting a frontier model RAG your complete CLI convo history.
You can find out not just what you did and did not do but why. It is possible to identify unexpectedly incomplete work streams, build a histogram of the times of day you get most irritated with the AI, etc.
I think it is very cool and I have a major release coming. I'd be very appreciative of any feedback.
For that you'd be better off having the LLM write TODO stubs in the codebase and search for that. In fact, most of the recent models just do this, even without prompting.
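The search side of that is tiny. A rough sketch, assuming the stubs are written as comments containing the word TODO; the function name and the list of file suffixes are made up for illustration.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

# Matches the word TODO and captures whatever note follows it on the same line.
TODO_PATTERN = re.compile(r"\bTODO\b[:\s]*(.*)")


def find_todos(
    root: Path, suffixes: Tuple[str, ...] = (".py", ".ts", ".rs", ".go")
) -> Iterator[Tuple[str, int, str]]:
    """Yield (relative path, line number, note) for each TODO found under root."""
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                match = TODO_PATTERN.search(line)
                if match:
                    yield str(path.relative_to(root)), number, match.group(1).strip()
```

The output can be pasted straight into a new session as a list of open work items.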
> So "200 lines" captures the concept but not the production reality of what is involved.
How many lines would you estimate it takes to capture that production reality of something like CC? I ask because I got downvoted for asking that question on a different story[1].
I asked because in that thread someone quoted the CC dev(s) as saying:
>> In the last thirty days, I landed 259 PRs -- 497 commits, 40k lines added, 38k lines removed.
My feeling is that a tool like this, while it won't be 200 lines, can't really be 40k lines either.
[1] If anyone is interested, https://news.ycombinator.com/item?id=46533132
My guess is <5k for a coherent and intentional expert human design. Certainly <10k.
It’s telling that they can’t fix the screen flickering issue, claiming “the problem goes deep.”