Comment by WorldMaker

3 days ago

I don't think invalid ASTs are a "lesser" problem; it is a pretty big one. We want to be able to source control work in progress and partially complete things. There are a lot of reasons you might not want to, or be able to, finish a bit of code and yet still want to record what you've done and where you are (to pick it back up later, to get other developers' eyes on a sketch or an outline, to save it to backup systems, etc.). Those are often important steps in development, though it is easy to forget how common they are when you think about software only as finished/buildable artifacts.

I know a lot of people think source control should only have buildable code, but that's what CI processes are for, and people use source control (and diffs) for a lot of things that don't need to pass CI 100% of the time.

I don't understand why it can't just use an AST if it parses, and fall back to plain text diffs if it doesn't.

  • Churn in the diffs is a big reason, if the point of wanting a semantic diff is to have a smarter diff for smarter patches/merges. The smartness of your merge is generally a lowest common denominator operation: if most of your intermediate diffs are dumb plain text diffs, your final merge is, to a large extent, still going to be a dumb plain text merge.

    That may be fine if you are happy with the plain text status quo, but if your goal is to avoid or minimize merge conflicts (as most people want when talking about semantic diff), you don't really solve that as well as you'd like.

    (Additionally, though it is a lot less of a concern for git's on-disk storage, patch size matters for some git-based email flows and for other VCSes, and a consistent style of diffs between patches can be a useful storage or transfer optimization. Plain text diffs are likely to produce much bigger patches, which eats into the optimization wins you might get from a semantic diff; merges that mix semantic and plain text diffs are often a worst-of-both-worlds case for overall patch sizes, as the two styles churn against each other.)
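
To make the churn point concrete, here is a rough sketch of the "AST if it parses, plain text otherwise" fallback, using Python's standard `ast` and `difflib` modules as a stand-in for a real semantic diff tool. The function names (`diff_mode`, `hybrid_diff`) and the sample `history` are hypothetical, made up for illustration, and comparing `ast.dump` output per top-level statement is only a crude approximation of a tree diff.

```python
import ast
import difflib


def diff_mode(old_src: str, new_src: str) -> str:
    """Return "ast" only when BOTH revisions parse; otherwise "text"."""
    try:
        ast.parse(old_src)
        ast.parse(new_src)
    except SyntaxError:
        return "text"
    return "ast"


def hybrid_diff(old_src: str, new_src: str):
    """Diff two revisions, falling back to plain text if either fails to parse.

    Hypothetical helper: the "semantic" side is just a line diff over one
    ast.dump() string per top-level statement, not a real tree diff.
    """
    mode = diff_mode(old_src, new_src)
    if mode == "ast":
        old_units = [ast.dump(node) for node in ast.parse(old_src).body]
        new_units = [ast.dump(node) for node in ast.parse(new_src).body]
    else:
        old_units = old_src.splitlines()
        new_units = new_src.splitlines()
    return mode, list(difflib.unified_diff(old_units, new_units, lineterm=""))


# Made-up history: finished code, a work-in-progress save, finished code again.
history = [
    "def area(w, h):\n    return w * h\n",
    "def area(w, h, scale=1):\n    return (w * h\n",  # WIP: unclosed paren
    "def area(w, h, scale=1):\n    return (w * h) * scale\n",
]

# Mode used for each consecutive pair of revisions.
modes = [diff_mode(a, b) for a, b in zip(history, history[1:])]
```

In this sketch a single unparseable save pushes both of its neighbouring diffs into text mode (`modes` comes out as `["text", "text"]`), even though the first and last revisions would diff fine as ASTs on their own. That is the lowest common denominator effect: the more work-in-progress commits sit in the history, the less of it a semantic merge actually gets to work with.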