Beads – A memory upgrade for your coding agent

12 hours ago (github.com)

> Agents report that they enjoy working with Beads, and they will use it spontaneously for both recording new work and reasoning about your project in novel ways.

I’m surprised by this wording. I haven’t come across anyone talking about AI preference before.

Can a trained LLM develop a preference for a given tool within some context and reliably report on that?

Is “what AI reports enjoying” aligned with AI’s optimal performance?

  • Yegge makes stuff up and is known to say controversial things for fun, so I assume it’s trolling. Product pages often have endorsements and adding funny endorsements is an old joke.

    But I also can’t rule out that he somehow believes it, which I suppose makes it a good troll.

  • The author has a vested interest in AI, which is why its capabilities may be greatly exaggerated/anthropomorphised, as is typical for LLM start-ups. Proceed with caution.

I went through the whole readme first and kept wondering what problem the system aims to address. I understood that it is a distributed issue tracker. But how can that lead to a memory upgrade? It also hints at replacing markdown for plans.

So is the issue the format or lack of structure which a local database can bring in?

  • LLMs famously don't have a memory - every time you start a new conversation with them you are effectively resetting them to a blank slate.

    Giving them somewhere to jot down notes is a surprisingly effective way of working around this limitation.

    The simplest version of this is to let them read and write files. I often tell my coding agents "append things you figure out to notes.md as you are working" - then in future sessions I can tell them to read or search that file.

    Beads is a much more structured way of achieving the same thing. I expect it works well partly because LLM training data makes them familiar with the issue/bug tracker style of working already.
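
    If you want to script that pattern rather than rely on prompts alone, the whole trick fits in a few lines of Python - the notes.md path and the helper names here are just my own illustration, nothing Beads-specific:

        # scratch_notes.py - hypothetical helper for an agent's append-only notes file
        from datetime import date
        from pathlib import Path

        NOTES = Path("notes.md")  # the same file you'd tell the agent to append to

        def append_note(text: str) -> None:
            """Add one dated bullet to the shared notes file."""
            with NOTES.open("a", encoding="utf-8") as f:
                f.write(f"- {date.today().isoformat()}: {text}\n")

        def search_notes(term: str) -> list[str]:
            """Grep-style lookup the agent can run at the start of a session."""
            if not NOTES.exists():
                return []
            return [line for line in NOTES.read_text(encoding="utf-8").splitlines()
                    if term.lower() in line.lower()]

        if __name__ == "__main__":
            append_note("The flaky test only fails when TZ is unset.")
            print(search_notes("flaky"))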

    • I’ve been using beads for a few projects and I find it superior to spec kit or any other form of structured workflow.

      I also find it faster to use. I tell the agent the problem, ask it to write a set of tasks using beads, and it creates the tasks along with the “depends on” tree structure. Then I tell it to work on one task at a time and require my review before continuing.

      The added benefit is the agent doesn’t need to hold so much context in order to work on the tasks. I can start a new session and tell it to continue the tasks.

      Most of this can work without beads but it’s so easy to use it’s the only spec tool I’ve found that has stuck.
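
      The "one task at a time" loop falls out of that dependency tree naturally. I don't know how beads computes it internally, but conceptually "what's ready next" is just this (task names and fields made up for illustration):

          # Toy model of "ready work": a task is ready when it is open
          # and everything it depends on has already been closed.
          tasks = {
              "add-api":    {"status": "closed", "depends_on": []},
              "add-ui":     {"status": "open",   "depends_on": ["add-api"]},
              "write-docs": {"status": "open",   "depends_on": ["add-api", "add-ui"]},
          }

          def ready(tasks):
              return [name for name, t in tasks.items()
                      if t["status"] == "open"
                      and all(tasks[d]["status"] == "closed" for d in t["depends_on"])]

          print(ready(tasks))  # ['add-ui'] - 'write-docs' stays blocked until add-ui closes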


    • Using Claude Code recently, I was quite impressed by the TODO tool. It seemed like such a banal solution to the problem of keeping agents on track. But it works so well and allows even much smaller models to do well on long-horizon tasks.

      Even more impressive lately is how good the latest models are without anything keeping them on track!

    • Thanks! It is the structure that matters here, then. Just like you, I ask my agents to keep updating a markdown file locally and use it as a reference during working sessions. This mechanism has worked well for me.

      I even occasionally ask agents to move some learnings back to my Claude.md or Agents.md file.

      I'm curious whether complicating this behaviour with a database integration would further abstract the work in progress. Are we heading down a slippery slope?

    • I often have them append to notes, too, but also often ask them to deduplicate those notes, without which they can become quite redundant. Maybe redundancy doesn't matter to the AI because I've got tokens to burn, but it feels like the right thing to do. Particularly because sometimes I read the notes myself.

The Beads project uses Beads itself as an issue tracker, which means their issues data is available here as JSONL:

https://github.com/steveyegge/beads/blob/main/.beads/issues....

Here's that file opened in Datasette Lite which makes it easier to read and adds filters for things like issue type and status:

https://lite.datasette.io/?json=https://github.com/steveyegg...
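
If you'd rather poke at it locally than in the browser, the JSONL format means a few lines of Python get you roughly the same filters. The local path and the exact field names ("status", "issue_type") are assumptions based on what the Datasette view exposes:

    import json
    from collections import Counter
    from pathlib import Path

    # Assumes a local clone of the repo; adjust the path if the
    # JSONL file under .beads/ is named differently.
    issues = [json.loads(line)
              for line in Path(".beads/issues.jsonl").read_text().splitlines()
              if line.strip()]

    print(Counter(i.get("status") for i in issues))
    open_bugs = [i for i in issues
                 if i.get("status") == "open" and i.get("issue_type") == "bug"]
    print(f"{len(open_bugs)} open bugs")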

  • Does link 2 … build the whole thing from scratch, in the browser? Wth Simon, are you coding whilst sleeping these days?

It does theoretically look like a useful project. At the same time I'm starting to feel like we're slipping into the Matrix. I check a GitHub issue questioning the architecture.md doc:

> I appreciate that this is a very new project, but what’s missing is an architectural overview of the data model.

Response:

You're right to call me out on this. :)

Then I check the latest commit on architecture.md, which looks like a total rewrite in response to a beads.jsonl issue logged for this.

> JSONL for git: One entity per line means git diffs are readable and merges usually succeed automatically.

Hmm, ok. So readme says:

> .beads/beads.jsonl - Issue data in JSONL format (source of truth, synced via git)

But the beads.jsonl for that commit to fix architecture.md still has the issue to fix architecture.md in it? So I wonder, does that line get removed now that it's fixed ... so I check master, but now beads.jsonl is gone?

But the readme still references beads.jsonl as source of truth? But there is no beads.jsonl in the dogfooded repo, and there's like ~hundreds of commits in the past few days, so I'm not clear how I'm supposed to understand what's going on with the repo. beads.jsonl is the spoon, but there is no spoon.

I'll check back later, or have my beads-superpowered agent check back for me. Agents report that they enjoy this.

https://github.com/steveyegge/beads/issues/376#issuecomment-...

https://github.com/steveyegge/beads/commit/c3e4172be7b97effa...

https://github.com/steveyegge/beads/tree/main/.beads

  • lmao, agent powered development at its finest.

    Reminds me of the guy who recently spammed PRs to the OCaml compiler but this time the script is flipped and all the confusion is self inflicted.

    I wonder how long it will take us to see a vibe-coded, slop-covered OS or database or whatever (I guess the “braveness” of these slop creators will be (already is?) directly proportional to the quality of the SOTA coding LLMs).

    Do we have a term for this yet? I mean the person, not the product (slop)

Could you do the same thing with your real issue tracking software? Your agent could use an MCP to create a Jira ticket plus subtasks or tasks for your subagents. Then you don't need to clutter up your repo with these MD files and .beads directories and whatnot.
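
A rough sketch of what the tool behind that MCP call could do, using Jira's standard REST create-issue endpoint - the base URL, project key, and credentials are placeholders, and the "Sub-task" type name varies between Jira instances:

    import os
    import requests

    JIRA = "https://your-company.atlassian.net"  # placeholder instance
    AUTH = (os.environ["JIRA_USER"], os.environ["JIRA_TOKEN"])

    def create_issue(summary, issue_type="Task", parent_key=None, project="PROJ"):
        """Create a ticket, or a sub-task if parent_key is given."""
        fields = {"project": {"key": project},
                  "summary": summary,
                  "issuetype": {"name": issue_type}}
        if parent_key:
            fields["parent"] = {"key": parent_key}
        resp = requests.post(f"{JIRA}/rest/api/2/issue",
                             json={"fields": fields}, auth=AUTH)
        resp.raise_for_status()
        return resp.json()["key"]

    parent = create_issue("Migrate auth to OAuth")
    create_issue("Write migration script", issue_type="Sub-task", parent_key=parent)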

Whether this exact approach catches on or not, it's turning the corner from "teaching AIs to develop using tools that were designed for humans" to "inventing new tools and techniques that are designed specifically for AI use". This makes sense because AIs are not human; they have different strengths and limitations.

  • Absolutely. The limitations of AI (namely statelessness) require us to rethink our interfaces. It seems like there's going to be a new discipline of "UX for agents" or maybe even just Agent Experience or AX.

    Software that has great AX will become significantly more useful in the same way that good UX has been critical.

I don't understand the point of this project. We already have github/gitlab for tasks, and if you want to query the history of a chat just stuff the spans in otel.

There are a ton of interesting ideas in the README - things like the way it uses the birthday paradox to decide when to increase the length of the hash IDs.
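
I haven't checked exactly how the README does that calculation, but the standard birthday-bound estimate is easy to sketch - the 1% risk threshold and the hex alphabet here are my guesses, not necessarily what beads uses:

    import math

    def collision_probability(n_issues: int, hex_chars: int) -> float:
        """Approximate chance that any two of n random hex IDs collide."""
        space = 16 ** hex_chars
        return 1 - math.exp(-n_issues * (n_issues - 1) / (2 * space))

    def id_length_for(n_issues: int, max_risk: float = 0.01) -> int:
        """Smallest hex ID length that keeps collision risk under max_risk."""
        length = 1
        while collision_probability(n_issues, length) > max_risk:
            length += 1
        return length

    for n in (100, 10_000, 1_000_000):
        print(n, id_length_for(n))  # the required length grows slowly with issue count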

This tool works by storing JSONL in a .beads/ folder. I wonder if it could work using a separate initially-empty "beads" branch for this data instead? That way the beads data (with its noisy commit history) could travel with the repository without adding a ton of noise to the main branch history.

The downside of that is that you wouldn't be able to branch the .beads/ data or keep it synchronized with main on a per-commit basis. I haven't figured out if that would break the system.

  • The way I read it is beads steers agents to make use of the .beads/ folder to stay in sync across machines. So, my understanding is a dedicated branch for beads data will break the system.

Ha, I was working on the same problem and updating my article when this hit. My focus is on making the agent integration with the tool more seamless. Claude offers a fantastic way to do this using "skills" and now a "marketplace".

[1] Demo with Claude - https://pradeeproark.github.io/pensieve/demos/

[2] Article about it - https://pradeeproark.com/posts/agentic-scratch-memory-using-...

[3] https://github.com/cittamaya/cittamaya - Claude Code Skills Marketplace for Pensieve

[4] https://claude.com/blog/skills

Is this that Steve Yegge? A former Googler/Amazon guy with long interesting rants? I don't even remember what about anymore, but I liked to read him back in the day.

I've been trialing jj as my vcs on my latest project, but I guess this only supports git? Anyone using this with jujutsu?

  • It works fine with jj. I have a line in my Claude.md to tell it to make sure to close before committing, and I don’t use the hooks that are provided.

I use the gh CLI to make and track issues on the repo's issue tracker, then create and reference the issue in the PR. I use Claude normally, and have Gemini and Codex sitting as automated reviewers (GitHub apps), then get Claude to review their comments. Rinse and repeat. It works quite well and catches some major issues. Reading the PRs yourself (at least skimming them for sanity) is still vital.

Makes me wonder whether you can just give agents [Taskwarrior](https://taskwarrior.org/).

Set the TASKDATA to `./.task/`. Then tell the agents to use the task CLI.

The benefit is most LLMs already understand Taskwarrior. They've never heard of Beads.
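
Something like this thin wrapper is what I have in mind, with TASKDATA pointed at the repo. `task add` and `task export` are stock Taskwarrior; the wrapper itself is hypothetical and assumes Taskwarrior is installed with a .taskrc already in place, so it never prompts interactively:

    import json
    import os
    import subprocess

    ENV = {**os.environ, "TASKDATA": "./.task"}  # keep the task database inside the repo

    def task(*args: str) -> str:
        """Run the Taskwarrior CLI against the repo-local data directory."""
        return subprocess.run(["task", *args], env=ENV, check=True,
                              capture_output=True, text=True).stdout

    task("add", "Write integration tests for the auth module")
    pending = json.loads(task("status:pending", "export"))
    for t in pending:
        print(t["id"], t["description"])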

  • That's mentioned in the beads docs; it could work decently, but beads is optimizing for agent use, semantic issue relationships, conflict resolution, etc. I've had success with just using gh issues, and agents are pretty good at looking for new issues and closing them when done. I have a couple of toy projects where maintaining the code is basically filing a bug report or feature request.

    Also, when you say 'never heard of beads' --- it spits out onboarding text to tell the agent exactly what it needs to know.

    Requires a deep dive, but this is an interesting direction for agent tooling.

Neat! I am working on something similar and arriving at similar conclusions, e.g. a local SQLite index. I am not ready to give up human authoring, though. How do you tackle the quality gate problem and conformance? For programmatic checks like linting it’s reasonably clear, but what about checks that require intelligence?

If there’s any type of memory upgrade for a coding agent I would want, it’s the ability to integrate a RAG into the context.

The information being available is not the problem; the agent not realizing that it doesn’t have all the info is, though. If you put it behind an MCP server, it becomes a matter of ensuring the agent will invoke the MCP at the right moment, which is a whole challenge in itself.

Are there any coding agents out there that enable you to plug middleware in there? I’ve been thinking about MITM’ing Claude Code for this, but wouldn’t mind exploring alternative options.

  • What do you mean by a RAG here?

    I've been having a ton of success just from letting them use their default grep-style search tools.

    I have a folder called ~/dev/ with several hundred git projects checked out, and I'll tell Claude Code things like "search in ~/dev/ for relevant examples and documentation".

    (I'd actually classify what I'm doing there as RAG already.)

    • I do the same thing for libraries I’m using in a project. It’s a huge power-up for code agents.

      Like you mentioned, agents are insanely good at grep. So much so that I’ve been trying to figure out how to create an llmgrep tool because it’s so good at it. Like, I want to learn how to be that good at grep, hah.

    • What I mean is basically looking at the last (few) messages in the context, translating that to a RAG query, query your embeddings database + BM25 lookup if desired, and if you find something relevant inject that right before the last message in the context.

      It’s pretty common in a lot of agents, but I don’t see a way to do that with Claude Code.
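
      The loop itself is simple to sketch. Everything below - the message format, the vector_index and bm25_index objects - is hypothetical glue you'd have to supply yourself; the point is just where the retrieved text gets spliced into the context:

          # Hypothetical context-injection middleware: retrieve against the last
          # user message and splice the hits in just before it.
          def inject_context(messages, vector_index, bm25_index, k=5):
              query = messages[-1]["content"]                 # the most recent turn
              hits = vector_index.search(query, k) + bm25_index.search(query, k)
              seen, merged = set(), []
              for doc in hits:                                # naive de-duplication
                  if doc.id not in seen:
                      seen.add(doc.id)
                      merged.append(doc.text)
              context = "Possibly relevant context:\n" + "\n---\n".join(merged[:k])
              return messages[:-1] + [{"role": "user", "content": context}, messages[-1]]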


I've been trying `beads` out for some projects, in tandem with https://github.com/github/spec-kit with pretty good results.

I set up spec-kit first, then updated its templates to tell it to use beads to track features and all that instead of writing markdown files. If nothing else, this is a quality-of-life improvement for me, because recent LLMs seem to have an intense penchant for writing one or more markdown files per large task. Ending up with loads of markdown poop feels like the new `.DS_Store`, but harder to `.gitignore` because they'll name files whatever floats their boat.

  • I usually just use a commit agent that has as one of its instructions to review various aspects of the prospective commit, including telling it to consolidate any documentation and remove documentation of completed work except where it should be rolled into lasting documentation of architecture or features. I've not rolled it out in all my projects yet, but for the ones I do, it's gotten rid of the excess files.

  • I've found it pretty useful as well. It doesn't compete with gh issues as much as it competes with markdown specs.

    It's helpful for getting Claude code to work with tasks that will span multiple context windows.

Cool stuff. The readme is pretty lengthy, so it was a little hard to identify what core problem this tool is aiming to solve and how it is tackling it differently from existing solutions.

  • A classic issue with AI-generated READMEs: never to the point, always repetitive and verbose.

    • Funnily, AI already knows what stereotypical AI sounds like, so when I tell Claude to write a README but "make it not sound like AI, no buzzwords, to the point, no repetition, but also don't overdo it, keep it natural" it does a very decent job.

      Actually drastically improves any kind of writing by AI, even if just for my own consumption.

    • I'm not saying it is or isn't written by an LLM, but, Yegge writes a lot and usually well. It somehow seems unlikely he'd outsource the front page to AI, even if he's a regular user of AI for coding and code docs.

    • And full of marketing hyperbole. When I have an AI produce a README I always have to ask it to tone it down and keep it factual.