
The BFG Repo Cleaner

Randy McDermott edited this page Aug 25, 2016 · 26 revisions

There are generally two reasons why you might need to "clean" your repo:

  • The repo has grown too large

  • Someone committed sensitive information

Git's own tool for both problems is git-filter-branch. Unless you are a real Git expert, however, git-filter-branch is difficult to use. Thankfully, there is an excellent alternative called The BFG Repo-Cleaner by Roberto Tyley.

To use BFG you need command-line Java installed. Download and install a JDK for your platform. To test your installation, open a terminal and type

$ java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)

Once Java is installed, move on to installing (really just downloading) The BFG Repo-Cleaner. In the link provided, the download button is at the upper-right. Move the bfg-1.12.13.jar file (or whatever the current version is) to wherever you keep your program files (e.g., /Applications on OSX). Next, it is convenient to create an alias for the run command in your .bash_profile or .bashrc. For example, add the following line for OSX:

alias bfg="java -jar /Applications/bfg-1.12.13.jar"

Of course, if you do not want to create an alias you can just substitute "java -jar <bfg.jar>" everywhere I have "bfg" below, where <bfg.jar> is the full path to the version of BFG you downloaded.
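If you also want to call bfg from scripts, note that bash does not expand aliases in non-interactive shells by default. A shell function is one alternative. The sketch below is only an illustration: the BFG_JAR variable name and the default path are assumptions, so point them at wherever you saved the jar.

```shell
# Hypothetical default location; override by exporting BFG_JAR yourself.
BFG_JAR="${BFG_JAR:-/Applications/bfg-1.12.13.jar}"

# Wrapper with the same effect as the alias, but usable from scripts too.
bfg() {
    if [ ! -f "$BFG_JAR" ]; then
        echo "BFG jar not found at $BFG_JAR" >&2
        return 1
    fi
    # Pass every argument straight through to the jar.
    java -jar "$BFG_JAR" "$@"
}
```

With this in your .bash_profile or .bashrc, the bfg commands below work unchanged.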

##Usage

The BFG is more of a hatchet than a scalpel. It is not possible, for example, to go in and clean a specific commit from the history of a repo. Below we show how to: (1) remove files, (2) remove folders, (3) remove blobs (files) larger than a certain size.

###IMPORTANT

First, back up your repo! Mistakes cannot be undone.
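Two simple ways to make that backup are sketched below. This is a minimal illustration, not a prescribed procedure: it builds a throwaway repo in a temporary directory so it can be run safely, and the names myrepo, myrepo-backup, and myrepo-backup.git are placeholders for your own paths.

```shell
# Throwaway stand-in for your real repo (all names here are placeholders).
work=$(mktemp -d)
git init -q "$work/myrepo"
echo "hello" > "$work/myrepo/readme.txt"
git -C "$work/myrepo" add readme.txt
git -C "$work/myrepo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

# Option 1: copy the whole directory, working tree and .git included.
cp -a "$work/myrepo" "$work/myrepo-backup"

# Option 2: a bare mirror clone, which keeps every ref but no working tree.
git clone -q --mirror "$work/myrepo" "$work/myrepo-backup.git"
```

Keep the backup somewhere outside the repo you are about to clean, and delete it only once you are satisfied with the result.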

###Removing Specific Files

Suppose someone commits a password file called password.txt. Getting rid of this file takes two steps. First, either revert the commit or git rm the file and commit this change to the repo. This step is needed because, by default, BFG leaves the current commit intact and only cleans the file out of the history. There is a way around this behavior, but if you do not want the file in the working tree, it is recommended that you explicitly remove it before cleaning the repo.

Step 1:

$ git rm password.txt
$ git commit -m "remove password file"
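Step 1 on its own does not remove the file from the history, which is why Step 2 is needed. The sketch below shows this on a throwaway repo; the file contents and commit messages are made up for illustration.

```shell
# Throwaway repo in a temporary directory (identity values are placeholders).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "demo"
git config user.email "demo@example.com"

# Someone commits the password file...
echo "not-a-real-password" > password.txt
git add password.txt
git commit -q -m "add password file"

# ...and Step 1 removes it from the current commit.
git rm -q password.txt
git commit -q -m "remove password file"

# The file is gone from the working tree, but the commits that touched it
# are still listed, so its contents are still stored in the repo.
git log --all --oneline -- password.txt
```

The final command lists both commits. You can rerun it after Step 2 to check whether the file still appears anywhere in the history.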

Step 2:

Now we run BFG on the repo to remove the file from the history. At the top level of the repo, do

$ bfg --delete-files password.txt
...

BFG will do a bunch of stuff and show output in your terminal. When it finishes, it will tell you to run the following:

$ git reflog expire --expire=now --all && git gc --prune=now --aggressive

Your repo is now clean of the password file.

Note that you can remove files by type as well, e.g., *.png, *.pdf, etc. But you cannot give paths to the files. For example, /dir1/dir2/*.png will not pick out only the png files in that subdirectory. This is why I said BFG is not a scalpel. You can, however, get rid of certain directory names, which we will do next.
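Because matching is by file name only, it is worth previewing which paths in the history share a name pattern before deleting anything. The sketch below uses plain git on a throwaway repo with made-up file names; it is a read-only check, not part of BFG itself.

```shell
# Throwaway repo with png files in two unrelated directories.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "demo"
git config user.email "demo@example.com"
mkdir -p dir1/dir2 other
echo a > dir1/dir2/a.png
echo b > other/b.png
echo c > notes.txt
git add .
git commit -q -m "add files"

# List every path that ever appeared in the history, then filter by name.
git log --all --pretty=format: --name-only | sort -u | grep '\.png$'

# When running BFG for real, quote the pattern so the shell does not expand it:
#   bfg --delete-files '*.png'
```

Both png files are listed, regardless of which directory they live in, so a name pattern like this would affect all of them.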

###Removing Folders

Suppose a repo has two subdirectories, dir1 and dir2, and you want to split this repo into two new smaller repos. First, copy the repo so that you have two identical repos. Of course, now you have used up twice the disk space. Next, go into each repo separately and clean out the subdirectory you no longer want. In the first copy (here called repo1), do the following, then repeat in the second copy with dir1:

$ cd repo1
$ bfg --delete-folders dir2
...
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
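If the goal is a smaller repo, it can help to record the size before cleaning and again after the git gc step. A minimal sketch using git count-objects follows; it runs on a throwaway repo with placeholder contents, so substitute your own repo in practice.

```shell
# Throwaway repo so the command has something to measure.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "demo"
git config user.email "demo@example.com"
echo "some content" > file.txt
git add file.txt
git commit -q -m "add a file"

# Pack loose objects first so the packed size is meaningful.
git gc -q

# Report object counts and sizes in human-readable units.
git count-objects -vH
```

The size-pack line is the one to compare between the two runs.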