By accident I checked 60-odd full-size photographs into the git history of my website. I shrank them in the next commit, but the originals were still in the history, leading to a bloated .git directory. That made backups take a long time. This post documents how I cleaned up the mess.
My first step was to research the issue. Most hits were for completely removing a file from history, which is not exactly what I wanted.
I found a few possible approaches for cleaning up the history:
- BFG Repo-Cleaner
- git filter-branch
- git rebase -i
I didn’t want to install a special one-off tool, especially one that requires Java, which I don’t otherwise use.
And to be completely honest, I don’t really grok what filter-branch does.
Last but not least, a rebase could be done with the tools at hand, and it looked fairly simple after reading this.
Just to be sure, I copied the whole repo to a test repo as insurance in case I screwed something up. This is one of the things I like about git: being able to test things on a throwaway copy of a repo and learn from them is invaluable.
Next I did an interactive rebase from the original silly commit.
> cp -Rp WWW test-WWW
> cd test-WWW
> git rebase -i 7b0d1232e301fe4b451d2d4ad596cb862d3df6a2
In the editor that this command opened, I chose to squash the next commit (which corrected the issue) into the original commit. This worked fine. Note that the commits to my website have a linear structure; I don’t use branches on it.
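The squash step can be tried out safely in a scratch repository. The following is a minimal sketch, not the exact procedure used on my website: the file names, commit messages, and the use of `GIT_SEQUENCE_EDITOR` with `sed` to edit the todo list non-interactively are all assumptions made for the demo. It relies on the fact that `git rebase -i <commit>` only lists the commits that come after `<commit>`, so the rebase starts from the parent of the commit that added the large file.

```shell
#!/bin/sh
# Throwaway demo: squash a "fix" commit into the commit it corrects.
set -e
demo=$(mktemp -d)                 # scratch directory, safe to delete afterwards
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

echo "text" > index.html
git add index.html
git commit -q -m "Initial page"

# The "silly" commit: 1 MiB of random bytes stands in for a full-size photo.
head -c 1048576 /dev/urandom > photo.jpg
git add photo.jpg
git commit -q -m "Add photo"

# The correction: the same file, but much smaller.
head -c 1024 /dev/urandom > photo.jpg
git commit -q -am "Shrink photo"

# HEAD~2 is the parent of "Add photo", so the todo list contains
# "pick ... Add photo" followed by "pick ... Shrink photo".
# The sed expression turns the second "pick" into "squash", which is the
# edit one would otherwise make by hand in the editor.
# GIT_EDITOR=true accepts the combined commit message unchanged.
GIT_SEQUENCE_EDITOR="sed -i.bak '2s/^pick/squash/'" GIT_EDITOR=true \
    git rebase -i HEAD~2

git log --oneline
```

After the rebase the history holds two commits instead of three, and the large version of `photo.jpg` is no longer reachable from the branch. It still sits in the object store until the cleanup step is run.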
After the rebase I ran the following commands:
> git reflog expire --expire=now --all
> git gc --aggressive --prune=now
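Why both commands are needed: after a rebase the old commits are still referenced from the reflog, so `git gc` would keep them. Expiring the reflog first makes them unreachable, and `--prune=now` then removes unreachable objects immediately instead of waiting for the default grace period. A small sketch that wraps the two commands and prints the object store size before and after; the function name `compact_repo` is hypothetical.

```shell
# Hypothetical helper: drop reflog entries and prune unreachable objects
# in the repository at "$1", printing object statistics before and after.
compact_repo() {
    repo=$1
    echo "before:"
    git -C "$repo" count-objects -vH
    # Forget where branches used to point, so old commits become unreachable.
    git -C "$repo" reflog expire --expire=now --all
    # Repack everything and delete unreachable objects right away.
    git -C "$repo" gc --aggressive --prune=now
    echo "after:"
    git -C "$repo" count-objects -vH
}
```

Usage would be `compact_repo test-WWW`. Since this permanently deletes objects, it is best run on a copy first.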
This had a huge effect.
> cd ..
> du -csm WWW/ test-WWW/
373 WWW/
79 test-WWW/
It shrank the repo by close to 300 MiB. Mission accomplished!
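Besides comparing directory sizes, one could check which blobs in the history are the largest, to confirm that the full-size photographs are really gone. This is a sketch I did not use at the time; the function name `largest_blobs` is hypothetical, and it only looks at objects reachable from some ref.

```shell
# Hypothetical helper: list the N largest blobs reachable from any ref
# in the repository at "$1" as "<size in bytes> <path>", biggest first.
largest_blobs() {
    repo=$1
    n=${2:-10}
    # rev-list prints "<hash> <path>" for every reachable object;
    # cat-file adds type and size, passing the path through as %(rest).
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob" { sub(/^blob /, ""); print }' |
        sort -rn |
        head -n "$n"
}
```

Running `largest_blobs WWW 5` and `largest_blobs test-WWW 5` side by side should show the big photographs in the first listing only.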
Note that it is considered bad practice to use git rebase on commits that have already been pushed to a remote. Since I don’t keep this repo on a hosting site like GitHub, that is not an issue here.