Removing big files from git history

By accident I checked 60-odd full-size photographs into the git history of my website. I shrank them in the next commit, but the originals remained in the history, leading to a bloated .git directory. That made backups take a long time. This article documents how I cleaned up the mess.

My first step was to research the issue. Most hits were for completely removing a file from history, which is not exactly what I wanted.

I found three possible approaches:

BFG Repo-Cleaner
git filter-branch
git rebase -i

I didn’t want to install a special one-off tool, especially one that requires Java, which I don’t otherwise use.

And to be completely honest, I don’t really grok what filter-branch does.

Last but not least, a rebase could be done with the tools at hand, and it looked fairly simple after reading this.

Just to be sure, I copied the whole repo to a test repo, as insurance in case I screwed something up. This is one of the things I like about git: being able to test things on a throwaway copy of a repo and learn from them is invaluable.

Next I did an interactive rebase from the original silly commit.

> cp -Rp WWW test-WWW
> cd test-WWW
> git rebase -i 7b0d1232e301fe4b451d2d4ad596cb862d3df6a2

In the editor that this command opened, I chose to squash the next commit (which corrected the issue) into the original commit. This worked fine. Note that the commits to my website have a linear structure; I don’t use branches on it.
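
The squash can be tried out on a scratch repository first. What follows is a minimal sketch under assumptions, not the exact commands used on the website repo: the repository, file names and commit messages are made up, and random data stands in for a photograph. Instead of editing the todo list by hand, it sets GIT_SEQUENCE_EDITOR to a sed command that changes the second "pick" into "squash"; GIT_EDITOR=true simply accepts the default combined commit message.

```shell
#!/bin/sh
# Sketch on a scratch repo; every name below is made up for illustration.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.invalid"

# An unrelated first commit, so the bad commit has a parent to rebase onto.
echo "hello" > index.html
git add index.html
git commit -q -m "Base"

# The accident: a big file. Random data stands in for a full-size photo.
head -c 2000000 /dev/urandom > photo.jpg
git add photo.jpg
git commit -q -m "Add photo"
big=$(git rev-parse HEAD:photo.jpg)   # object id of the big blob

# The correction: the same file, much smaller.
head -c 20000 /dev/urandom > photo.jpg
git commit -q -am "Shrink photo"

# Rebase the last two commits. The todo list then has two "pick" lines;
# sed changes the second one to "squash", as one would do by hand in the editor.
GIT_SEQUENCE_EDITOR="sed -i.bak '2s/^pick/squash/'" GIT_EDITOR=true \
    git rebase -q -i HEAD~2

commits=$(git rev-list --count HEAD)
echo "commits on the branch: $commits"

# Check whether the big blob is still reachable from the rewritten history.
if git rev-list --objects HEAD | grep -q "$big"; then
    reachable=yes
else
    reachable=no
fi
echo "big blob reachable from HEAD: $reachable"
```

In this sketch the big blob is no longer part of the rewritten history, but it still sits in the object store, because the reflog keeps pointing at the old commits.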

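After the rebase, the old commits are still referenced by the reflog, so the large objects cannot be garbage-collected yet. I therefore expired the reflog and then ran an aggressive garbage collection: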

> git reflog expire --expire=now --all
> git gc --aggressive --prune=now

This had a huge effect.

> cd ..
> du -csm  WWW/ test-WWW/
373 WWW/
79  test-WWW/

It shrank the repo by close to 300 MiB. Mission accomplished!
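
Besides du, git itself can report how much space its object store takes. A small sketch, assuming a helper function called repo_stats (a made-up name) that wraps git count-objects: the -v flag prints the detailed counters and -H prints the sizes in human-readable units.

```shell
#!/bin/sh
# Hypothetical helper: show object-store statistics for the repo in directory $1.
repo_stats() {
    git -C "$1" count-objects -v -H
}

# Print the statistics for every repository directory given on the command line,
# e.g.: sh stats.sh WWW test-WWW   (stats.sh is an assumed file name)
for repo in "$@"; do
    echo "== $repo"
    repo_stats "$repo"
done
```

The size-pack line is the one to compare between the original and the cleaned-up copy; it should tell the same story as du.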

Note that it is considered bad practice to use git rebase on commits that have already been pushed to a remote, because it rewrites history that others may have built on. Since I don’t keep this repo on a hosting site like GitHub, that is not an issue here.

For comments, please send me an e-mail.

Roland's homepage
