Comment by GodelNumbering

1 day ago

Interesting things Dirac does:

1. Uses an optimized version of Hash-Anchored edits for file editing (https://dirac.run/posts/hash-anchors-myers-diff-single-token); see the sketch after this list

2. Utilizes the language's AST to decide what to fetch into context, entirely avoiding large code-file reads

3. Batches all operations, doing a large number of reads/edits simultaneously (you can see a video demo for deepseek-v4-flash here: https://www.reddit.com/r/LocalLLaMA/comments/1suhdki/tested_...)

4. Allows the model to execute code to analyze things on the fly, so it can simply write a bash/python/perl script to accomplish things where appropriate

5. A lot of context curation and opportunistic context updates, i.e. proactively putting into context anything you are certain the model would ask for next
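
For (1), a minimal sketch of the general hash-anchor idea as I read it from the linked post (my own illustration, not Dirac's actual code): each line is addressed by a short hash of its content rather than a line number, so an edit stays valid even when the file shifts underneath it.

    import hashlib

    def anchor(line: str) -> str:
        # short content hash used as a stable per-line address
        return hashlib.sha1(line.encode()).hexdigest()[:6]

    def apply_edit(lines: list[str], target: str, new_text: str) -> list[str]:
        # replace whichever line currently hashes to the target anchor,
        # no matter where it has moved within the file
        return [new_text if anchor(l) == target else l for l in lines]

    lines = ["def f():", "    return 1"]
    lines = apply_edit(lines, anchor("    return 1"), "    return 2")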

I always wondered why ASTs weren't a bigger part of both editing and scoping changes/parsing code. I thought I read an article where they said 'grep' was just as effective. It kind of made sense for the case they were talking about.

  • I think we should use ASTs more, not for performance but for easier code review.

    Changes that are primarily code refactorings, like breaking up a large module into a bunch of smaller ones or renaming a commonly used class, are extremely tedious to review, in both LLM-generated diffs and human-written PRs. You still have to review them: LLMs have a habit of mangling comments when moving code across files, while for a human, an unassuming "rename FooAPIClient to LegacyFooAPIClient" PR is the best place to leave a backdoor when taking over a developer's account. Nevertheless, many developers just LGTM changes like this because of the tedium involved in reviewing them.

    If one could express such changes as a simple AST-wrangling script in a domain-specific language, which would then be executed in a trusted environment after being reviewed, that would decrease the review burden considerably.
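
    As a toy example (using Python's stdlib ast; a real tool would want something like libcst, since ast.unparse drops comments and formatting, and the file name here is made up), the whole rename reduces to a few reviewable lines:

        import ast

        class Rename(ast.NodeTransformer):
            # simplified: a real rename would also handle ClassDef names,
            # attributes, imports, and docstrings
            def visit_Name(self, node):
                if node.id == "FooAPIClient":
                    node.id = "LegacyFooAPIClient"
                return node

        source = open("foo_client.py").read()  # hypothetical file
        print(ast.unparse(Rename().visit(ast.parse(source))))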

    I believe that with agentic development, the most important constraint we have is human time. Making the LLM better and faster won't help us much if the human still needs to spend the majority of their time reading code. We should do what we can to reduce the amount of code we have to read, without losing confidence in the changes the LLM makes.

  • Grep is effective for the most part, except in situations where you have a huge codebase and the thing you're looking for is used in too many places, both as a symbol and in non-symbol text (comments, strings, etc.).

    Another annoying thing about plain grep: LLMs often end up pulling in bundled packages when grepping, where a single line can be large enough to ruin the context window.

    • > Grep is effective for the most part

      It's very effective in well-written and well-designed codebases, where concepts tend to be well formed enough not to share names with everything else, so grepping for symbols gives you good search results.

      Projects where the god object or core concepts carry generic names like "Tree" or "Node", or other things that are used everywhere, tend to be nigh impossible to search with grep and friends.

  • It's not intuitive to humans, even after learning parsing theory. I can do basic name refactorings. I've even written neovim plugins to do one specific thing with the AST (DFS down and delete one subtree that I understand). Those are fine.

    I would not be comfortable doing an on-the-fly "rewrite all subtrees that match this pattern" kind of edit.

    It seems like a tool that's good for LLMs, though.
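
    To be concrete, that kind of edit looks roughly like this with Python's stdlib ast (a sketch that rewrites every `x == None` comparison to `x is None`):

        import ast

        class EqNoneToIsNone(ast.NodeTransformer):
            def visit_Compare(self, node):
                self.generic_visit(node)  # rewrite any nested comparisons first
                if (len(node.ops) == 1 and isinstance(node.ops[0], ast.Eq)
                        and isinstance(node.comparators[0], ast.Constant)
                        and node.comparators[0].value is None):
                    node.ops = [ast.Is()]
                return node

        tree = EqNoneToIsNone().visit(ast.parse("if x == None: pass"))
        print(ast.unparse(tree))  # -> if x is None: ...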

    • "rewrite all subtrees that match this pattern" works really well in jetbrains, they call it structure search-and-replace.

  • I happen to have written both a tool and a blog post about the topic. It's more about the different technical approaches to solving the problem, but it might still interest you :)

    https://www.context-master.dev/blog/deterministic-semantic-c...

    Let me know what you think.

    • This is interesting - I have been working on the same thing, building contextual data, LSP-style.

      I saw the tools page; if I understand right, `get-symbol-context` is actually the main useful tool among what you provide? The others seem to be metadata that's easy to get already (?), but that tool provides the extra info.

      I had been working on exposing mine at a higher level, i.e. multiple APIs to query different kinds of metadata about symbols, types, etc. But I am still not sure of the best approach; my thinking was about not overloading the AI with too many different tools, since they accumulate quickly.

  • I just realized that the fact that LLMs work so well for me in Clojure might be partly because of the clojure-mcp tools. They provide structural browsing and editing.

  • Has anybody thought about encoding AST tokens as LLM tokens, similar to how different words can have different meanings, which is reflected in their embeddings?

    • Language keywords are almost certainly individual tokens. But I think you mean more than that: basically replacing identifiers with special tokens as well. It's worth a shot, but there are some practical problems.

      The immediate downside is that mapping a variable name to a token and back would probably require indexing the whole codebase. You'd need a 1:1 mapping for every name in scope, and you'd probably need to be clever about disambiguating names that come in and out of scope.
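
      A toy sketch of that 1:1 mapping (hypothetical, and it ignores scoping entirely, which is exactly the hard part):

          import ast

          def intern_identifiers(source: str) -> dict[str, str]:
              # assign each distinct identifier a stable placeholder token
              table = {}
              for node in ast.walk(ast.parse(source)):
                  if isinstance(node, ast.Name) and node.id not in table:
                      table[node.id] = f"<ID_{len(table)}>"
              return table

          print(intern_identifiers("x = foo(x, bar)"))
          # {'x': '<ID_0>', 'foo': '<ID_1>', 'bar': '<ID_2>'}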

Anchor-based editing requires injecting new anchors into the context, and Dirac does so via a diff. So how is this more efficient (token-wise) than search-and-replace, even at a single token per hash? Also, code is read more than it is written, so these anchors add up. I once experimented with stable anchors, albeit longer than a single token, and found them a downgrade.

My conclusion is that the efficiency Dirac sees comes mainly from showing file skeletons by default.
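
To illustrate what I mean by a skeleton (my guess at the shape of the output, not Dirac's actual format): top-level signatures instead of full file contents, which stdlib ast can produce in a few lines.

    import ast

    def skeleton(source: str) -> str:
        # keep only top-level def/class header lines, eliding the bodies
        lines = source.splitlines()
        headers = [lines[node.lineno - 1] + " ..."
                   for node in ast.parse(source).body
                   if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))]
        return "\n".join(headers)

    print(skeleton("def f(x):\n    return x\n\nclass C:\n    pass"))
    # def f(x): ...
    # class C: ...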

  • I'm not sure one way or the other, but I've been using a related tool called Tilth by another poster here. It doesn't do anchor-based editing, but it does do syntax-aware search and will, e.g., report the line range for function definitions, provide file outlines with line numbers on a file-name match, etc.

    https://github.com/jahala/tilth

  • > My conclusion is that the efficiency Dirac sees comes mainly from showing file skeletons by default

    How hard do you think it would be to bring this optimization to oh-my-pi and opencode? I'm testing Dirac and it's very cool, but the tooling isn't there yet compared to oh-my-pi in terms of UX.

> Batches all operations. Does large number of reads/edits simultaneously...

I wasn't sure what this meant, so I looked at the source. It seems to refer to tool APIs being designed to take multiple targets as a list parameter, instead of hoping the model makes appropriately parallel tool calls. (This matches my experience, by the way: models are reluctant to make a large number of parallel calls simultaneously, and this seems more pronounced with weaker models.)
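
A hypothetical tool definition in that style (my illustration of the pattern, not Dirac's actual schema) would look something like:

    # one call carries a list of targets, instead of N parallel read_file calls
    read_files_tool = {
        "name": "read_files",
        "description": "Read the contents of multiple files in a single call.",
        "parameters": {
            "type": "object",
            "properties": {
                "paths": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["paths"],
        },
    }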

  • I think Anthropic may have mentioned this first; this pattern is also something my custom agent's tools are designed around, and I'm pretty sure I picked it up from them.

Instead of burning tokens on SOTA models, why not use a dirt-cheap specialised model for file editing?

That is, the SOTA model just asks a cheaper model to make the edits, and it does so.

  • Yeah, I also believe there are plenty of efficiency gains available from using different models for different tasks. Reasoning models such as Opus should only be used for the main planning and decision flows, while sub-operations (exploring, applying edits, etc.) could be delegated to smaller and cheaper models. You also end up with a much smaller context for the main big model.
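
    Roughly the shape I have in mind (a sketch; `call_llm`, `apply_diff`, and the model names are all placeholders):

        def run_task(task: str) -> None:
            # the expensive reasoning model only plans; it never writes diffs
            plan = call_llm("big-planner-model", f"Plan the edits for: {task}")
            for step in plan:
                # the cheap model turns one explicit instruction into one edit
                diff = call_llm("small-editor-model", f"Produce the edit: {step}")
                apply_diff(diff)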

Is there a complete list of the tools somewhere? I'm interested in how you chose to expose the AST specifically. In my own harness attempts, I wanted to keep the number of tools absolutely minimal and briefly experimented with including an AST lib to use via an execute_python tool (plus some examples in the system prompt). Results were mixed, though, with most models preferring ripgrep.

It would be really cool to run an ablation study to determine which of these drives the boost and to quantify how much each matters. Who knows, they may all interact in a sum-greater-than-its-parts way that only improves the score when shipped all together.
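
Something like a leave-one-out ablation over the five features in the top comment (the feature names and `run_benchmark` are hypothetical):

    features = ["hash_anchors", "ast_context", "batched_tools", "code_exec", "context_curation"]
    baseline = run_benchmark({f: True for f in features})  # run_benchmark is a placeholder
    for off in features:
        # disable exactly one feature and measure the drop against baseline
        score = run_benchmark({f: (f != off) for f in features})
        print(off, baseline - score)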

Did you consider incorporating ast-grep or gritql?

Congratulations, great work.

  • Can't speak for OP, but I tried providing ast-grep in the execution context of an execute_bash tool, and even with pretty aggressive steering most models just don't use it much. More expensive/SOTA models or higher reasoning effort increase the chances, but they lower speed and raise cost. Maybe due to training bias for exploration tasks?

    • Yes, I've tried this passive approach too and didn't dig much further after that. I thought maybe they'd figured out something more intentional in the prompting to enable these kinds of approaches.
