Comment by bhl
4 years ago
Anyone know how to compute the diff between two ASTs if the edits _are_ known, either in general or with tree-sitter.
The method used by diffsitter is to compute the path with minimal editing distance: https://github.com/afnanenayet/diffsitter/blob/main/src/ast.....
In the case where the edits are known (because you're re-creating the AST per keystroke), you can save parsing time by passing in the old tree and a set of edits: https://tree-sitter.github.io/tree-sitter/using-parsers#edit....
Intuitively, I feel like there should be a way to reduce diffing time as well to avoid O(mn) run and space time that comes from the dynamic programming solution to compute the minimal edit.
I've made a incremental parser, and solved it by only re-parsing the changed code block. And the diff would then be the entire branch, but you could compare the old branch with the new branch and get a more fine detailed diff. It's hard to know beforehand what will get better performance, I found that for small files < 100 LOC it was faster to re-parse the entire file.
You are correct. It is indeed possible to use dynamic programming. For more info check out [0]. OP’s project probably takes too long on nontrivial merges in its current state.
[0] https://thume.ca/2017/06/17/tree-diffing/
That link does go over an optimized tree-diffing algorithm for ASTs, but the question I'm wondering is how to extend incremental parsing into incremental diffing, as opposed to full parse into full diff.
The short answer is that it's unsolved in general and just done ad-hoc in practice. I'm working on incremental & demand-driven analysis techniques for my PhD research and currently building essentially what you describe. The state-of-the-art is probably this technique from this year's PLDI [1] -- last I heard they were using a batch parser but planning on integrating with tree-sitter at some point. [1] https://dl.acm.org/doi/10.1145/3453483.3454052
Yes, it's not currently very performant for larger files (have a branch in the works to fix this). Checked out the link, it's an interesting read