Imagine this situation: You're a software developer who's just been assigned a ticket. It's a bug, and the symptom of the problem is that your app is rendering triangles where it should be rendering quadrilaterals.
Seems straightforward enough, right? But wait - you are actually given more information. This is a regression, which means that the app used to correctly render quadrilaterals, and now it's rendering them as triangles. Whenever a software engineer sees a regression, the first thought should be "When was the first time this error exhibited itself?" If you ask the person that filed the bug, you'll probably get an answer like this:
Why is this question important, anyway? Well, if we can isolate the change where the regression first appeared, we know that the error was in the code that was added or removed when that change was applied.
git bisect is a tool you can use to run a binary search  on a set of commits to (hopefully) determine which one first exhibited the problem. It's a powerful tool, but there's a catch: you have to have a linear history of independently buildable, small, and working changes in order for it to work in the best manner possible .
git (which you probably already know or you wouldn't be reading this post) is a tool for controlling revisions of files. You edit files, then commit them to the repository after validating your changes.
git keeps track of the history of these commit objects, effectively building a graph.
In the ideal case, your commit history looks something like this:
Unfortunately, through bad commit/merge practices, you can sometimes get into a situation where this history is nonlinear:
One other important prerequisite is that the regression is reproducible. If you don't have a regression that's consistently reproducible, you can't test it at each stage.
git bisect relies on the fact that the regression can be tested for at each step in the process.
The bisect process
git bisect works like this: given an input range of commits, it splits the range in which the regression occurred into two sections using a commit near the center of the range. It checks out this pivot revision, and based on your input, it determines which of the two ranges on either side of the pivot should be checked next.
In the diagram above, you can see that we have a known "good" commit, a known "bad" commit (usually this is
HEAD, as the regression presumably exists at the current head of the
master branch, otherwise you probably wouldn't need to fix it ), and some unknown point where the regression was introduced.
There are really only four commands you need to know to use
git bisect good git bisect bad git bisect skip git bisect reset
Starting a Bisect
You can start a bisect session with
git bisect start :
scottj|jwir3/master:~/Source/SinkingMoon/website$ git bisect start scottj|jwir3/master|BISECTING:~/Source/SinkingMoon/website$
git bisect start wasn't on the list of commands I said you needed to know. That's because if you run one of the other commands,
git will offer to start the process for you:
scottj|master:~/Source/SinkingMoon/website$ git bisect bad You need to start by "git bisect start" Do you want me to do it for you [Y/n]? y scottj|master|BISECTING:~/Source/SinkingMoon/website$
Marking a Commit
git keeps track of the state of each commit. Each commit could be "good" (i.e. the regression is not present in this commit), "bad" (i.e. the regression is present in this commit), or "unknown" (i.e. there isn't enough information to determine whether the regression is present in this commit) To mark the current commit as "good", use
git bisect good. Alternatively, to mark it as "bad", use
git bisect bad. You can also mark a commit that isn't the current commit as "good" or "bad" by adding the commit hash to the end of the command, as in
git bisect bad <hash>.
Once you have marked both a "good" and "bad" commit,
git bisect will check out a commit near the middle of the range and prompt you to test this commit for the presence of the regression:
scottj|master:~/Source/SinkingMoon/website$ git bisect bad You need to start by "git bisect start" Do you want me to do it for you [Y/n]? y scottj|master|BISECTING:~/Source/SinkingMoon/website$ git bisect good c3a736 Bisecting: 12 revisions left to test after this (roughly 4 steps) [f2fd95cc92fa63a8a1a38a2ff2a8ccd406286849] Add handling for shapes that were previously considered non-primitive. scottj|(f2fd95c...)|BISECTING:~/Source/SinkingMoon/website$
As you can see,
git bisect tells you how many commits are in the range, and about how many manual tests you're going to have to perform to find the regression.
What you do now is check to see if the regression is in the currently checked-out commit, using whatever process you have available. If the regression is present in the currently checked-out commit, mark it as "bad" with
git bisect bad. If it's not, then mark it with
git bisect good.
There's a special situation that happens when you have a commit that you can't test - you have to skip it using
git bisect skip. You should refrain from overusing
git bisect skip, because if the actual regression happens to fall next to a commit that was skipped, you end up with a range of commits that could have caused the regression, and
git bisect won't be able to help you any further than that.
Going back to the output, we see that the next commit to be tested is
f2fd95, but this wasn't on our chart. How can this be the case? The reason this happens is that
git bisect will automatically look through the commits in merged branches:
As you keep repeating this process, eventually you will reach a point where all of the commits are either marked or skipped:
git bisect will report the first commit that contains the regression in a manner like so:
commit a73eed9a145bea5c05833f33eb26eec6 Author: Scott Johnson <firstname.lastname@example.org> Date: Sat May 20 18:44:29 2017 -0500 Subdivide all polygons into triangles using ear-clipping algorithm. Fixes #817.
Now, you can use
git diff a73eed..a73eed^ to see all of the changes in the commit and (hopefully) determine the source of the regression! 
Resetting a Bad Bisect
From time to time, you get into a situation where the bisect didn't work properly, or you screwed up along the way without realizing it. If this happens, use
git bisect reset. This will terminate the bisect process, and check out the last revision you had checked out before you started
If you accidentally mark a commit as "bad" when it should have been "good" (or vice versa), it's kind of wonky to get back to the state where you want to be. You can't just check out that commit and then set it to whatever you want (
git will report
Commit XXXXXXX is both good and bad). What you need to do is output the log to a file, change the one you need to change, and then replay the log back using
Suppose we accidentally marked commit
6138263837a802d9b32c9a5a5daf668378384108 as bad when it should have been good:
git bisect log > /tmp/bisect.log
The log will look something like this:
git bisect start # good: [f2fd95cc92fa63a8a1a38a2ff2a8ccd406286849] Add handling for shapes we previously considered non-primitive. git bisect good f2fd95cc92fa63a8a1a38a2ff2a8ccd406286849 # good: [3f4b51f80ac276e471809ed6127b0d893df01875] Re-enable texture filling of arbitrary polygons. git bisect good 3f4b51f80ac276e471809ed6127b0d893df01875 # bad: [0b72a4ac45df7b483b61a1799557ac952b74b398] Add the ability to rotate the camera 35 degrees off of the normal. git bisect bad 0b72a4ac45df7b483b61a1799557ac952b74b398 # bad: [7aca2e12f0e76b73a408d4d13aad52d7af1f4a84] Merge branch 'jwir3/#961-ear-clipping' git bisect bad 7aca2e12f0e76b73a408d4d13aad52d7af1f4a84 # bad: [a73eed9a145bea5c05833f33eb26eec6] Subdivide all polygons into triangles using ear-clipping algorithm. git bisect bad a73eed9a145bea5c05833f33eb26eec6 # bad: [6138263837a802d9b32c9a5a5daf668378384108] Add package for more general matrix math computations. git bisect bad 6138263837a802d9b32c9a5a5daf668378384108
Remove these last two lines (or, alternatively, change the last line to
git bisect good 6138263837a802d9b32c9a5a5daf668378384108, save the file, and then run
git bisect replay:
scottj|(6138263...) *|BISECTING:~/Source/SinkingMoon/website$ git bisect reset Previous HEAD position was 6138263... Add package for more general matrix math computations. Switched to branch 'master' scottj|master:~/Source/SinkingMoon/website$ git bisect replay /tmp/bisect.log We are not bisecting. Bisecting: a merge base must be tested [6138263837a802d9b32c9a5a5daf668378384108] Subdivide all polygons into triangles using ear-clipping algorithm. scottj|(6138263...) *|BISECTING:~/Source/SinkingMoon/website$
Automating Git Bisect
There is one other feature of
git bisect I find useful from time to time: the ability to automate the testing process, namely
git bisect run :
git bisect run /tmp/script.sh [optional arguments to the script]
Of course, you need to write this script, but it is useful if you can automate the testing process. In other words, if the regression is that the source code won't compile, the unit tests don't pass, or some other thing that's easily determinable by a script, you can have this script do the thing in question, and then return with an exit code of
0 if the test passed (i.e. no regression was present in this revision) or any number in the range
125, if the test failed (i.e. the regression was present in this revision). The special value
125 is only returned if the test can't be run (i.e. the equivalent of
git bisect skip).
git bisect is a tool that you can use to track down regressions in your code. Remember:
git bisectwill delve into commits made on separate branches and subsequently merged, provided the merge commits aren’t squashed into a single commit.
git bisectonly works at the commit level, so it’s more effective if you have smaller, bite-sized commits.
git bisectcan be used to find anything that was introduced in a specific commit - it doesn’t have to be a regression. You can use it to find, for example, the first time a feature was introduced.
- Even though
git bisectcan be used for many things (see aforementioned comment), it’s better to use some other method of tracking/searching (e.g. recording the features in every release using
git tag), if possible, as it’s likely faster than a manual binary search.
- While you can use
git bisect skipto skip a commit that can't be tested, it's better to avoid this as it can add ambiguity to your results. (As a corollary, try to only commit things that don’t have general, glaring errors and that compile and run at the most basic level).
You can learn more about
git bisect (it has many other cool things, like the ability to change the terms for a bisect session from "good" and "bad" to anything you want - like present and missing, for feature determination) from the git bisect documentation page. It's actually very readable documentation, and I highly recommend giving it at least a cursory glance.
My biggest recommendation in learning how to use
git bisect effectively, though, is to do it. When you've identified a problem that you think can be solved with
git bisect, try to solve it using that tool! This will teach you much about the tool. If you have any questions, I'm always open to answering them in the comments below, on slack/IRC as
jwir3, or via email at
References and Notes
Geelnard, Marcus. (2015). A tidy, linear git history. Retrieved 30 January 2018 from http://www.bitsnbites.eu/a-tidy-linear-git-history/. ↩︎
If you want to look to see if your git history is linear, try using the command
git log --graph --oneline --abbrev=6. This will show you a graph of your git commits, and you can look to see if it's a linear graph (looks like the first image) or a non-linear graph (looks like the second image). ↩︎
You should, of course, try to minimize the number of steps in the bisect process, so if you know that a commit earlier than
HEADis bad, use that one instead, as it will save you time in the bisect process. That said, it's a trade-off between this and manually attempting to find a commit closer to the actual regression, because the closer you get to the regression, the more time
git bisectwill spend on the other side of the binary search. Thus, about 85%-90% of the time, I just use
HEADas my "known bad" commit. ↩︎
I'm using git-prompt with zsh on Mac OSX for all of the following examples, which is why you'll see the steps remaining when I run
git rebase. Your mileage may vary if you're using a different shell or OS. ↩︎
Often, this happens because of a compile-time error that was added as a non-final commit in a branch prior to merging. This is annoying, because, unless you can fix it trivially and without biasing the binary search (i.e. accidentally fixing the regression or making it worse), you won't be able to do anything with this commit other than skip it. This is part of the reason why each commit should be independently compileable and runnable for each commit, not just for the last commit on a branch (unless you're using squash merges). See also note 2, above. ↩︎
Depending on the length of the diff, you might be able to see the problem immediately. This is more likely if you have small, independent, and straightforward commits. I tend to abide by the phrase "Commit early and often". This is also the reason I don't use squash merges, since they combine a bunch of small commits into a single, monolithic commit, and you lose all of this incremental work (and the ability to find the regression in this incremental work). ↩︎
Subscribe to The Nothingness of Scott
Get the latest posts delivered right to your inbox