Comment by comex

5 years ago

Indeed, the example seems a little underwhelming.

It does demonstrate that diffsitter can tell that `fn main() {` was not semantically changed at all, while still being able to cope with syntactically invalid code like `fn add_one {`.

But look at the following, where I've attempted to manually convert diffsitter's generated diff into a unified diff format, making it easier to understand what's going on:

    1   fn main() {
    2 -     let x = 1;
    3 - fn add_one {
    4 + }
    5 + fn addition() {
    6   }
    7 + fn add_two() {
    8   }

Look at the closing brace on line 6, which doesn't have a - or + in front of it, meaning that the diff tool has found a closing brace in the old code and one in the new code and decided that they correspond to each other. But this correspondence is not syntactically meaningful. In the old code, that closing brace belongs to `fn main`. In the new code, however, it belongs to `fn addition`, while `fn main`'s closing brace is now represented by the "inserted" closing brace on line 4!

Mixing up braces this way is a typical weakness of traditional diff tools. I'd hope that a syntax-aware diff tool could prevent such mix-ups, and instead generate a diff like:

    1   fn main() {
    2 -     let x = 1;
    3   }
    4 - fn add_one {
    5 - }
    6 + fn addition() {
    7 + }
    8 + fn add_two() {
    9 + }

Even though this diff is one line longer (while diff tools typically aim to produce the shortest possible diff), it's more semantically correct, and would be much easier to deal with if a merge conflict came into play.

Unfortunately, diffsitter does not do this, at least not in that demo. And I don't see how it would be [edit: caused by] a disconnect between how a parser parses the code and how a human parses it. The parser has the same idea of closing braces belonging to particular opening constructs as humans do.

4 comments

comex

feanaro 5 years ago

> And I don't see how it would be because of a disconnect between how a parser parses the code and how a human parses it. The parser has the same idea of closing braces belonging to particular opening constructs as humans do.

I don't understand the argument you made here. Why couldn't it do this?

Compared to the first snippet, in the second snippet:

1. There is still a function called `main`, but it has no lines, compared to a single line before. The conclusion is that this line was removed.

2. There is no longer an `add_one` function.

3. There are two new functions, `addition` and `add_two`.

This is exactly what your wanted diff is showing and all of those things can be determined by a parser.

comex 5 years ago
I agree. I think you misinterpreted my comment. By "I don't see how it would be because of a disconnect", I meant "I don't see how the issue could be caused by a disconnect". (Maybe you read it as "I don't see how diffsitter would do that, because there's a disconnect"?) The suggestion of a disconnect came from the parent comment, and I was attempting to refute it.
- feanaro 5 years ago
  
  Oops, yep. I mentally dropped a few words there from the quoted part, even though I re-read multiple times. Thanks for bearing with me.

marcodiego 5 years ago

Definitely underwhelming. Actually I couldn't see how it is better than "diff -u".