
Comment by akersten

8 hours ago

I guess to me, I'm looking at it from the perspective of diffing the repo between the squashed commit on main and the tip of the incoming PR. If there are merge conflicts during the rebase in files that don't appear in that diff, I consider that a hallucination, because those changes must already be in the target branch and no matter what happened to those files along the way to get there, it will always be a waste of my time to see them during an interactive rebase.

I don't think we need to store any additional metadata to make the rebase just slightly smarter and able to skip over the "obvious" commits in this way, but I'm also just a code monkey, so I'm sure there are Reasons.

You’re looking at it from the perspective of human reasoning. But a computer is a simple machine (in terms of what it can do, not how it does it). What seems obvious to you can require a complicated algorithm.

Git stores all its information as a directed acyclic graph (DAG) of commits, where each commit points to its parent(s). Branches are just names that point at the tip commits of that graph. Each commit also points to a tree (an actual tree data structure) whose nodes are blobs (files) and subtrees. That tree is a full snapshot of the project, not only the files that have changed since the last commit; unchanged files and subtrees are simply reused by pointing at the same objects. Git does not store diffs. Diffs are computed as needed.
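To make "diffs are computed as needed" a bit more concrete, here is a toy sketch in Python. The blob hashing follows Git's SHA-1 object format (`blob <size>\0<content>`), but `snapshot` and `diff` are made-up helpers, and the flat path-to-id dict is a simplification of Git's nested tree objects.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Hash file content the way Git hashes a blob (SHA-1 object format)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def snapshot(files: dict) -> dict:
    """Toy 'tree': maps every path to its blob id (flat, unlike real Git trees)."""
    return {path: blob_id(data) for path, data in files.items()}

def diff(old: dict, new: dict) -> dict:
    """Nothing is stored as a diff; changes are derived by comparing snapshots."""
    changes = {}
    for path in sorted(old.keys() | new.keys()):
        if path not in old:
            changes[path] = "added"
        elif path not in new:
            changes[path] = "deleted"
        elif old[path] != new[path]:
            changes[path] = "modified"
    return changes

# Two hypothetical commits: only main.py differs between them.
commit1 = snapshot({"README": b"hello\n", "main.py": b"print(1)\n"})
commit2 = snapshot({"README": b"hello\n", "main.py": b"print(2)\n"})

print(diff(commit1, commit2))
```

Both snapshots list README, and since its content is identical it gets the same id in both, so nothing extra has to be stored for it.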

This is why the common ancestor commit is important. Starting from it, Git works out what each side (main-with-squashed-A and PR B) has changed. Files touched by only one side are fine, but anything touched by both has to be reconciled, and you get conflicts when both sides modified the same lines.
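A rough sketch of how the common ancestor falls out of the graph, assuming a made-up history where PR B is stacked on PR A's two commits and `SA` is the squashed A on main. The commit names and the `parents` dict are hypothetical; real Git does this with `git merge-base` and a much smarter traversal.

```python
# Hypothetical history: each commit id maps to its list of parent ids.
parents = {
    "O": [],        # starting point on main
    "a1": ["O"],    # PR A, first commit
    "a2": ["a1"],   # PR A, second commit
    "b1": ["a2"],   # PR B, stacked on top of PR A
    "SA": ["O"],    # squashed A on main: its only parent is O
}

def ancestors(commit):
    """The commit itself plus everything reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_bases(x, y):
    """Common ancestors that are not themselves ancestors of another common ancestor."""
    common = ancestors(x) & ancestors(y)
    return {c for c in common
            if not any(c != d and c in ancestors(d) for d in common)}

print(merge_bases("a2", "b1"))  # before the squash: PR B sits directly on A's tip
print(merge_bases("SA", "b1"))  # after the squash: only O is shared
```

In this toy graph nothing links `SA` back to `a1` or `a2`, so from the graph's point of view main and PR B only share `O`, even though `SA` contains the same changes as `a2`.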

Squashed A is a brand new commit with a new tree that PR B does not know about. You need to recompute PR B on top of Squashed A (which will create new commits for PR B).
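Here is a toy replay of that situation, as a sketch only. It merges whole files instead of lines, and `merge_file`, `cherry_pick`, and the snapshots are all made up for illustration; real Git works on lines and is more forgiving in some cases.

```python
def merge_file(base, ours, theirs):
    """Whole-file three-way merge; returns (content, conflicted)."""
    if ours == theirs:
        return ours, False
    if ours == base:
        return theirs, False
    if theirs == base:
        return ours, False
    return ours, True  # both sides changed the file, in different ways

def cherry_pick(head, parent, commit):
    """Replay one commit onto head, using the commit's own parent as the base."""
    result, conflicts = {}, []
    for path in sorted(head.keys() | parent.keys() | commit.keys()):
        content, bad = merge_file(parent.get(path), head.get(path), commit.get(path))
        if bad:
            conflicts.append(path)
        if content is not None:
            result[path] = content
    return result, conflicts

# Hypothetical snapshots (path -> content).
O = {"f.txt": "v0", "g.txt": "g0"}    # common ancestor
a1 = {"f.txt": "v1", "g.txt": "g0"}   # PR A, first commit
a2 = {"f.txt": "v2", "g.txt": "g0"}   # PR A, second commit
b1 = {"f.txt": "v2", "g.txt": "g1"}   # PR B, stacked on a2
squashed_a = dict(a2)                 # same content as a2, new commit on main

# Replaying a1 onto squashed A: f.txt is v0 in the base, v2 on main, v1 in a1.
_, conflicts_a1 = cherry_pick(squashed_a, O, a1)
print(conflicts_a1)

# Replaying only b1 (whose parent is a2) goes through cleanly.
rebased, conflicts_b1 = cherry_pick(squashed_a, a2, b1)
print(conflicts_b1, rebased == b1)
```

Note that `f.txt` is identical in `squashed_a` and `b1`, so it never shows up in the final diff, yet replaying `a1` still conflicts on it. Replaying only PR B's own commits avoids that, which is roughly what `git rebase --onto main <tip-of-A> <branch-B>` asks for, but you have to tell Git where A ends.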