Comment by nikanj

7 years ago

We pushed large binaries into our Git repo in the past. This was fine-ish as long as Git was hosted in-house, but now that it's SaaSed out, they are a huge pain in the rear.

I've browsed through a few git guides, but can't seem to find anything that would let me:

1) Do something like "du -s *|sort -n" for the entire Git history

2) Do something like "rm -rf --from-history-too" that would cause the remote repo to actually shrink in size.
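For what it's worth, item 1 can be approximated with plain Git plumbing. A minimal sketch, assuming it is run inside the repository; the function name `git_history_du` is made up for illustration, and the sizes reported are uncompressed blob sizes rather than what the packfile actually takes on disk:

```shell
# Print "<size-in-bytes> <path>" for every blob reachable from any ref,
# smallest first, much like `du -s * | sort -n` over the whole history.
git_history_du() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |   # keep file contents, drop commits and trees
    cut -d' ' -f2- |       # strip the leading object type
    sort -n
}
```

Something like `git_history_du | tail -n 20` would then show the twenty largest blobs ever committed.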

I think you should be able to do something like this with

  git filter-branch

This won't be a terribly fun exercise, and it could be very painful if your history contains a lot of merges. (It should be easier with the cactus/rebase development model.)
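A rough sketch of what that could look like, assuming you work on a throwaway clone and that the path contains no single quotes. The function names and the `path_to_purge` variable are hypothetical; the `git` subcommands and flags are standard:

```shell
# Rewrite every branch and tag so that the given path never existed.
# Destructive: run it on a fresh clone or after taking a backup.
purge_path_from_history() {
  path_to_purge=$1   # hypothetical argument: path relative to the repo root
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
    --index-filter "git rm -r --cached --ignore-unmatch -- '$path_to_purge'" \
    --prune-empty --tag-name-filter cat -- --all
}

# filter-branch keeps backups under refs/original/; the old objects only
# disappear locally once those refs and the reflog entries are gone.
drop_old_objects() {
  git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do git update-ref -d "$ref"; done
  git reflog expire --expire=now --all
  git gc --prune=now --quiet
}
```

The remote only shrinks once the rewritten refs are force-pushed and the host garbage-collects the old objects, and when that happens depends on the hosting provider.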

And of course everyone will have to hard-reset to the new branch.
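For each collaborator, that hard reset might look roughly like this. It is only a sketch: it assumes the remote is called `origin` unless told otherwise and that the local branch has the same name as the rewritten one. The function name is hypothetical.

```shell
# Make the current local branch match the rewritten branch on the remote.
# This discards local commits and uncommitted changes on that branch.
sync_to_rewritten_branch() {
  remote=${1:-origin}   # assumption: the remote is named "origin"
  branch=$(git rev-parse --abbrev-ref HEAD)
  git fetch "$remote"
  git reset --hard "$remote/$branch"
}
```

Anyone with unpushed work would want to stash it or save it as patches first, since the reset throws it away.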

I should mention I'm far from an expert on this. I've only ever used git filter-branch on a handful of commits, and only based on examples provided by kind internet people. I certainly haven't done anything nearly as far-reaching as you're about to embark on.

  • Yeah, filter-branch is the way to go for this. It's also handy for, say, extracting a folder into a new repository. I've used it in a few cases.
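The folder extraction could be sketched as below, assuming it runs in a fresh clone that will become the new repository. The function name and its argument are made up for illustration; `--subdirectory-filter` is the relevant filter-branch option.

```shell
# Rewrite history so that only the given folder remains, with its
# contents moved to the repository root. Run this in a fresh clone.
extract_folder_history() {
  folder=$1   # hypothetical argument: directory to keep, relative to the root
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --prune-empty \
    --subdirectory-filter "$folder" -- --all
}
```

After `extract_folder_history lib`, for example, commits that never touched `lib/` are dropped and the files that lived in `lib/` sit at the top level.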

Early in the history of a repository, I committed some files with sensitive information. The only way to fix this (and similar problems) is to reconstruct the repo starting from the commit just before you committed the unwanted file(s).

I'm a bit of a Git naif, so there are doubtless better ways to do this. This was mine:

  0. Back up my repo.
  1. Save the entire commit history as patch files.
  2. Use BFG (amazing tool) to scrub all references to the unwanted files [0]
  3. Create new repo from the commit just before unwanted files.
  4. Apply the patches.
  5. Use a custom Perl script to apply the dates in the patch files to the new history. [1]
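The patch export and replay (steps 1, 3 and 4) can be sketched with `git format-patch` and `git am`. This is a simplification, not the exact procedure above: it rebuilds onto a new branch in the same repository instead of creating a separate repo, it assumes a linear history (format-patch does not carry merge commits), and the function names and the branch name `rebuilt` are hypothetical.

```shell
# Step 1: save every commit after <base> as a numbered patch file.
export_patches() {
  base=$1     # the commit just before the unwanted files
  outdir=$2   # directory that receives 0001-*.patch, 0002-*.patch, ...
  git format-patch -q -o "$outdir" "$base"..HEAD
}

# Steps 3 and 4: start a fresh branch at <base> and replay the patches.
replay_patches() {
  base=$1
  patchdir=$2
  git checkout -q -b rebuilt "$base"
  git am -q "$patchdir"/*.patch
}
```

`git am` keeps the author date recorded in each patch, while the committer date becomes the time of the replay.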

Technically, your repo will be fully reconstructed at step 4. Also, be advised the patch files themselves may have to be massaged to remove references to the file(s) in question. If the filenames themselves aren't a problem, you can add them to .gitignore for good measure.

Step 5 merely preserves the dates of the original commits. Keep in mind that for this last step, your script will have to work in reverse chronological order, because rewriting a commit changes the hash of every commit after it.
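The date fix-up for a single commit might look something like the sketch below. It is not the Perl script mentioned above, just an illustration of the same idea using filter-branch's `--env-filter`; the function name and its arguments are hypothetical. Each call rewrites the target commit and everything after it, so a script looping over commits would go from newest to oldest to keep the older hashes valid.

```shell
# Set both the author and committer date of one commit.
set_commit_date() {
  target=$1     # full 40-character hash of the commit to change
  new_date=$2   # any date format Git accepts, e.g. "<unix-time> <tz-offset>"
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --env-filter "
    if [ \"\$GIT_COMMIT\" = \"$target\" ]; then
      export GIT_AUTHOR_DATE=\"$new_date\"
      export GIT_COMMITTER_DATE=\"$new_date\"
    fi
  " -- --all
}
```

The hash passed in has to be the commit's current one, which is why the order of the calls matters.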

  [0] https://rtyley.github.io/bfg-repo-cleaner/
  [1] http://eddmann.com/posts/changing-the-timestamp-of-a-previous-git-commit/

EDIT: Swap steps 1 and 2. Add advisement that patch files may require manual alteration. Add hint regarding .gitignore. Title case "Perl".