
Comment by legohead

7 years ago

So I just ran across this one: `git diff --staged`

I added a file and wanted to diff it, and this command helped! However, I then made some changes to the file, and when I ran the command a second time, the changes didn't show up :\ Only the file as it was originally added shows up in the diff. Now what? It's not the end of the world, of course, as I can just look at the file in my editor, but I usually use diffs as a personal code review before I commit.

Yes, this is another one that confuses developers. When you `git add` a file, it's added to the staging area as it was at that moment. Any modifications you make after staging it are unstaged changes, and plain `git diff` (not `git diff --staged`) is what shows them. So even after you've `git add`ed a new file, you have to `git add` it again if you change it afterwards.
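A minimal sketch that reproduces the situation in a throwaway repository. It assumes a POSIX shell and a reasonably recent git; the file name `new.txt` and its contents are made up for illustration. No commit is needed: on a brand-new repository, `git diff --staged` simply shows everything that is staged.

```shell
#!/bin/sh
# Reproduce the question: edits made after `git add` don't show up in `git diff --staged`.
set -e
export GIT_PAGER=cat            # print diffs directly instead of opening a pager
repo=$(mktemp -d)               # throwaway repository, removed on exit
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

echo "first line" > new.txt
git add new.txt                 # stages new.txt exactly as it is right now

echo "second line" >> new.txt   # this edit happens *after* staging

git diff --staged               # staged vs. last commit: only "+first line"
git diff                        # working directory vs. staged: only "+second line"

git add new.txt                 # stage the file again to pick up the edit
git diff --staged               # now shows both lines
```

After the second `git add`, plain `git diff` prints nothing, because the staged copy and the working copy are identical again.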

There are three "versions" of the code at play, if you will:

* the most recent commit

* the staged files

* the working directory
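To make the three versions concrete, here's a minimal sketch that puts a different version of one file in each place and prints all three. It assumes a POSIX shell and git; the repository is a throwaway temp directory, and the name/email are placeholders needed only so `git commit` works. `git show HEAD:foo.txt` reads the file from the last commit, and `git show :foo.txt` (a leading colon with no commit in front) reads the staged copy.

```shell
#!/bin/sh
# Put a different version of foo.txt in the last commit, the staging area, and the working directory.
set -e
export GIT_PAGER=cat
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity, local to this repo
git config user.email "example@example.com"

echo "v1" > foo.txt
git add foo.txt
git commit -q -m "add foo.txt"               # most recent commit holds v1

echo "v2" > foo.txt
git add foo.txt                              # staging area holds v2

echo "v3" > foo.txt                          # working directory holds v3

git show HEAD:foo.txt    # the most recent commit  -> v1
git show :foo.txt        # the staged copy          -> v2
cat foo.txt              # the working directory    -> v3
```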

`git commit`, as you might already know, takes "the staged files" and turns them into a commit, making that the latest commit. `git add` copies a snapshot of a file in your working directory into the staged files. The important bit here is that the copy of the file in the staging area is separate from the file in your working directory. So if you make more changes after `git add`ing a file, you need to `git add` those changes too if you want them in the commit. `git status` will tell you this:

  » git status
  On branch master
  Changes to be committed:
    (use "git reset HEAD <file>..." to unstage)
  
  	modified:   foo.txt
  
  Changes not staged for commit:
    (use "git add <file>..." to update what will be committed)
    (use "git checkout -- <file>..." to discard changes in working directory)
  
  	modified:   foo.txt

"Changes to be committed" is the staged files. "Changes not staged" is stuff that has been modified, but not `git add`'d. You can see here that I've changed foo.txt after git adding it; if I want those changes, I need to git add it again.

I can look at the diffs, too:

  # diff between the last commit and the staged files
  # (i.e., what will be committed; --staged is a synonym for --cached)
  git diff --staged
  # diff between the staged files and the working directory
  # (unstaged changes)
  git diff
  # diff between the last commit and the working directory
  # (staged and unstaged changes together)
  git diff HEAD

That covers every pairing of the three versions.
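A small sketch showing which pair each command compares: one new line is staged, another is left unstaged, and each `git diff` variant picks up a different subset. Throwaway repo, placeholder identity, and made-up contents; a POSIX shell is assumed.

```shell
#!/bin/sh
# Stage one change, leave another unstaged, and compare the three diff commands.
set -e
export GIT_PAGER=cat
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"

printf 'a\n' > foo.txt
git add foo.txt && git commit -q -m "a"

printf 'a\nb\n' > foo.txt && git add foo.txt   # staged: adds "b"
printf 'a\nb\nc\n' > foo.txt                   # unstaged: adds "c"

echo "--- git diff --staged (last commit vs. staged)"
git diff --staged                              # shows +b only
echo "--- git diff (staged vs. working directory)"
git diff                                       # shows +c only
echo "--- git diff HEAD (last commit vs. working directory)"
git diff HEAD                                  # shows +b and +c
```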

In my experience, a lot of newcomers find the staging area weird, and usually ask some variant of "why would I not want to commit all of the files I've changed?" Used effectively, the staging area can really help you break your work into logical chunks, each of which can be adequately described by a commit message. This helps others later: if your change is a bug fix and someone wants to cherry-pick it to production, they might not want your new feature or your lint fixes; they want a minimally risky fix. To that end, the separation between the stage and the working directory acts as a sieve of sorts, letting the stuff that's "ready to go" be filtered out into a commit.
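A minimal sketch of the "sieve" idea, using two hypothetical files: `fix.txt` stands in for the bug fix and `feature.txt` for unrelated work in progress. Only the staged file ends up in the commit; the other change stays in the working directory, untouched. Throwaway repo and placeholder identity; a POSIX shell is assumed.

```shell
#!/bin/sh
# Commit only the change that's ready, leaving unrelated work in the working directory.
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"

echo "return 1" > fix.txt                    # hypothetical buggy code
echo "feature stub" > feature.txt
git add fix.txt feature.txt && git commit -q -m "initial"

echo "return 0" > fix.txt                    # the bug fix
echo "half-done feature" > feature.txt       # work in progress, not ready

git add fix.txt                              # stage only the fix
git commit -q -m "Fix return value"

git diff-tree --no-commit-id --name-only -r HEAD   # files in the new commit: fix.txt
git status --porcelain                             # " M feature.txt": still modified, unstaged
```

The resulting commit touches only `fix.txt`, so it is the kind of minimal commit someone could cherry-pick on its own.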

I also want to mention the extremely useful `git add -p`: it walks through your changes hunk by hunk and asks whether you want to stage each one. It will even let you edit a hunk before staging it. So, for example, if I run across a spelling error or a minor bug, I can very quickly stage that fix (and only that fix) with `git add -p` and commit it, even if there are other modifications, even in the same file.

  • > There's three "versions" of the code at play, if you will:

    > * the most recent commit

    > * the staged files

    > * the working directory

    This is weird. The staging area is like a commit but not a commit: it holds changes that git is aware of and has a record of, but not quite a permanent record.

    Why not just make it a commit? You can always keep editing commits, or throw them out, or whatever. That's what I do with Mercurial. I write a commit and I keep adding stuff to it if I think it needs more stuff (or less stuff).

    Gregory Szorc has a more extensive analysis of the situation in the first subsection here: https://gregoryszorc.com/blog/2017/12/11/high-level-problems...

    • My best guess is that the commit metadata (particularly the message) is missing. You could always have it be "(uncommitted, staged changes)", though, and that's probably descriptive enough. (I agree with you on the whole: having the staged data be a commit makes things conceptually much simpler.)

      My other guess is that the "index" (the other name for the staging area) is also used for conflicts during merges and rebases, and that somehow plays into the problem of making it an actual commit. (But again, this comes across more as an excuse than a reason: I don't see any viable reason why the staged changes can't still be an actual commit, with the merge conflict data just stored as a separate structure.)

      That, or the person who added it just didn't think of it, or couldn't do it due to backwards compatibility.
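The extra job the index does during merges can be seen directly. When a merge conflicts, git replaces the single staged copy of the file with up to three numbered "stages" (1 = common ancestor, 2 = the current branch, 3 = the branch being merged in), which `git ls-files --stage` lists and `git show :N:path` prints. A minimal sketch, assuming a POSIX shell and a reasonably recent git; the branch name `other`, the file contents, and the identity are made up for illustration.

```shell
#!/bin/sh
# Create a conflicting merge and inspect the three stages the index holds for foo.txt.
set -e
export GIT_PAGER=cat
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"

echo base > foo.txt
git add foo.txt && git commit -q -m base
main=$(git symbolic-ref --short HEAD)        # "master" or "main", depending on config

git checkout -q -b other                     # hypothetical second branch
echo theirs > foo.txt && git commit -q -am theirs

git checkout -q "$main"
echo ours > foo.txt && git commit -q -am ours

git merge other || true                      # conflicts, so the merge stops here

git ls-files --stage foo.txt                 # three entries, stages 1, 2 and 3
git show :1:foo.txt                          # base
git show :2:foo.txt                          # ours
git show :3:foo.txt                          # theirs
```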