Removing big files from git history
By accident I checked 60-odd full-size photographs into the git history of my website. I shrank them in the next commit, but the originals were still in the history, leading to a bloated .git directory. This made backups take a long time. This article documents how I cleaned up the mess.
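To see which files cause this kind of bloat, git can list every blob in the history together with its size. The following is a minimal sketch on a throwaway repository; the file name and sizes are made up for illustration and are not taken from my website.

```shell
set -e
# Throwaway repository; names and sizes below are illustrative assumptions.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

head -c 2000000 /dev/urandom > photo.jpg   # stand-in for a full-size photograph
git add photo.jpg
git commit -q -m "Add photo"
head -c 20000 /dev/urandom > photo.jpg     # shrunken version of the same file
git commit -q -am "Shrink photo"

# List every blob reachable from any ref as "size path", largest first.
# Note: the awk step assumes paths without spaces.
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob" { print $2, $3 }' |
  sort -rn | head -n 5)
echo "$largest"
```

Both versions of the photo show up in the list, because the large one is still reachable through the first commit even though the working tree only contains the small one.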
My first step was to research the issue. Most hits were for completely removing a file from history, which is not exactly what I wanted.
I found a couple of possible approaches for that:
- Using BFG Repo-Cleaner
- git filter-branch
- git rebase -i
I didn’t want to install a special one-off tool that requires Java, which I don’t otherwise use.
And to be completely honest, I don’t really grok what filter-branch does.
Last but not least, a rebase could be done with the tools at hand, and it looked fairly simple after reading this.
Just to be sure, I copied the whole repo to a test repo, as insurance in case I screwed something up. This is one of the reasons why I like git: being able to test things on a throwaway copy of a repo and learn from them is invaluable.
Next I did an interactive rebase from the original silly commit.
> cp -Rp WWW test-WWW
> cd test-WWW
> git rebase -i 7b0d1232e301fe4b451d2d4ad596cb862d3df6a2
In the editor that sprang up during this command, I chose to squash the next commit (which corrected the issue) with the original commit. This worked fine. Note that the commits to my website have a linear structure; I don’t use branches on it.
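The squash step can also be reproduced without an editor on a throwaway repository. The sketch below is not what I ran on my website. It assumes GNU sed, uses made-up file names and sizes, and relies on the GIT_SEQUENCE_EDITOR environment variable to edit the rebase todo list. Keep in mind that the commit given to git rebase -i is itself not in the todo list; the list starts with the first commit after it.

```shell
set -e
# Throwaway repository; names and sizes are illustrative assumptions.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

echo "hello" > index.html
git add index.html
git commit -q -m "Base commit"
base=$(git rev-parse HEAD)                 # parent of the commit to be fixed

head -c 2000000 /dev/urandom > photo.jpg   # the accidental full-size photo
git add photo.jpg
git commit -q -m "Add photo"
head -c 20000 /dev/urandom > photo.jpg     # the corrected, smaller photo
git commit -q -am "Shrink photo"

# The todo list holds the two commits after $base; line 2 is the fix.
# Change that line from "pick" to "squash". GIT_EDITOR=true accepts the
# combined commit message without opening an editor.
GIT_SEQUENCE_EDITOR="sed -i -e '2s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -q -i "$base"

commits=$(git rev-list --count HEAD)
size=$(git cat-file -s HEAD:photo.jpg)
echo "commits: $commits, photo size in HEAD: $size"
```

After the rebase the history consists of the base commit plus one combined commit, and that commit only ever contained the small photo.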
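After the rebase I ran the following commands, which expire the reflog entries that still point at the old commits and then garbage-collect the objects that were kept alive by them: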
> git reflog expire --expire=now --all
> git gc --aggressive --prune=now
This had a huge effect.
> cd ..
> du -csm WWW/ test-WWW/
373 WWW/
79 test-WWW/
It shrank the repo by close to 300 MiB. Mission accomplished!
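Both commands are needed because rewritten commits stay referenced from the reflog, so garbage collection alone does not delete them. The sketch below illustrates this on a throwaway repository. To keep it short it uses git commit --amend as a stand-in for the squash, and the file name and sizes are made up.

```shell
set -e
# Throwaway repository; names and sizes are illustrative assumptions.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

head -c 2000000 /dev/urandom > photo.jpg   # stand-in for a full-size photo
git add photo.jpg
git commit -q -m "Add photo"
big=$(git rev-parse HEAD:photo.jpg)        # object id of the large blob

head -c 20000 /dev/urandom > photo.jpg     # smaller replacement
git commit -q -a --amend -m "Add photo"    # rewrites history, like the squash

# Report whether the large blob is still stored in the repository.
blob_state() {
  if git cat-file -e "$big" 2>/dev/null; then echo present; else echo gone; fi
}

git gc -q --prune=now
after_gc_only=$(blob_state)                # reflog still references the old commit

git reflog expire --expire=now --all
git gc -q --aggressive --prune=now
after_expire=$(blob_state)

echo "gc only: $after_gc_only, reflog expire + gc: $after_expire"
```

The large blob should survive the first gc and only disappear once the reflog has been expired.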
Note that it is considered bad practice to use git rebase on commits that have already been pushed to a remote. Since I don’t keep this repo on a hosting site like GitHub, that is not an issue here.
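For a repo that does have a remote, one way to check whether a commit has already been pushed is to ask which remote-tracking branches contain it. This is a sketch under assumptions: the remote here is a local bare repository created just for the demo, and the names origin and main are illustrative.

```shell
set -e
# Throwaway local repo plus a bare repo acting as its remote (assumed setup).
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git config user.name "Demo"
git config user.email "demo@example.com"
git remote add origin "$work/remote.git"

echo one > file.txt
git add file.txt
git commit -q -m "First"
git push -q origin HEAD:refs/heads/main    # this commit is now on the remote
pushed=$(git rev-parse HEAD)

echo two >> file.txt
git commit -q -am "Second"                 # this commit exists only locally
unpushed=$(git rev-parse HEAD)

# Remote-tracking branches that contain each commit; empty means not pushed.
on_remote_pushed=$(git branch -r --contains "$pushed")
on_remote_unpushed=$(git branch -r --contains "$unpushed")
echo "first:  [$on_remote_pushed]"
echo "second: [$on_remote_unpushed]"
```

A commit for which the list is empty can be rebased without affecting anyone else. The check only reflects the last fetch or push, so it is worth running git fetch first in a shared repo.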
For comments, please send me an e-mail.