Why you should keep a clean git history

Saturday, November 14, 2020

Many software projects still don’t care about keeping a clean git history. I think that is a pity. Here are a few reasons why keeping a clean git history is worth it.

What is a clean git history

First, let me define what I mean by a clean git history:

Each commit compiles and passes all tests.
Each commit is a self-contained logical unit. It changes a single thing.
Each commit includes a rationale that explains why the change was made, unless the reason is obvious.

The second point involves a bit of judgment. It doesn’t mean “a single commit should only fix a single typo,” but rather, it is about concerns. A commit should be dedicated to typo fixes, dedicated to copy editing, or dedicated to a refactoring of one component. It shouldn’t mix concerns. It shouldn’t fix a bug and add a feature. It shouldn’t fix a bug, and bump the version, and overhaul the README.

It serves as your long-term memory

No matter how good your memory is, you will forget why you made a change. If you work on a project long enough, you’ll even forget that you made a change or wrote a piece of code at all.

Writing good descriptions in your commit messages is a great way to preserve the context you had in mind while making a change. To get the most value out of the effort, explain why you are making a particular change, not what you’re changing. What you’re changing is already visible in the code diff.

There are two additional benefits to this, both increasingly important in remote or asynchronous work environments:

Reviewers will have an easier time understanding your changes, which avoids unnecessary back-and-forth clarifications.
In the future, other people will be able to look up the descriptions, so they don’t have to bother you with questions about why you made a specific change. The information will also remain long after you’ve left a project or the company.
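
To make this concrete, here is a minimal sketch of a commit whose message carries the rationale. The repository, the file name, and the scenario described in the message are all made up for illustration; the only thing it relies on is that each -m passed to git commit becomes its own paragraph.

```shell
#!/bin/sh
set -eu

# Throwaway repository; names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "timeout = 5" > settings.conf
git add settings.conf

# The first -m is the subject line, the second -m becomes the body.
# The body explains why the change is made; the diff already shows what changed.
git commit -q \
  -m "Lower connection timeout to 5 seconds" \
  -m "Hung upstream connections kept workers blocked for 30 seconds, so one slow dependency could exhaust the pool. Failing fast lets the retry logic take over."

# Show the full message as a reviewer or future maintainer would read it.
git log -1 --format=%B
```

For longer rationales you would normally let git commit open your editor instead of passing -m twice; the structure of the message stays the same.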

Makes reverting changes easier

People make mistakes, and even with a code review process in place, your team may merge a change, only to discover later that the change broke something. Using git revert is a lot easier than having to revert a change manually. This is one of the reasons why it is essential to create separate commits for individual logical changes. It gives you a lot more flexibility when reverting changes.
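
As a rough sketch of what that flexibility looks like, the script below builds a throwaway repository with small, focused commits and then reverts just one of them. All file names and commit messages are invented for the example.

```shell
#!/bin/sh
set -eu

# Throwaway repository; names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "timeout = 30" > settings.conf
echo "retries = 3" > retry.conf
git add settings.conf retry.conf
git commit -q -m "Add default timeout and retry settings"

# A focused commit that later turns out to be a mistake.
echo "timeout = 5" > settings.conf
git commit -q -am "Lower default timeout"
bad_commit=$(git rev-parse HEAD)

# An unrelated change made afterwards.
echo "retries = 5" > retry.conf
git commit -q -am "Raise retry count"

# Undo only the timeout change; the later retry change stays intact.
git revert --no-edit "$bad_commit" >/dev/null

cat settings.conf   # prints: timeout = 30
cat retry.conf      # prints: retries = 5
```

Had the timeout and retry changes been squashed into one commit, the revert would have taken the retry change with it, and you would be back to undoing things by hand.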

Makes backporting changes easier

Another reason to create separate commits for individual logical changes is that they’re a lot easier to backport if you’re keeping branches for older stable versions of your software. You may want to backport a fix, but not a more significant refactoring. Having separate commits makes this a lot easier as you can use the git cherry-pick functionality instead of manually extracting a fix from a huge diff.
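
Here is a small sketch of such a backport. The branch name stable-1.0, the file names, and the commits are assumptions made up for the example; the -x flag simply records the original commit hash in the backported commit's message.

```shell
#!/bin/sh
set -eu

# Throwaway repository; names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "greeting = helo" > app.conf
git add app.conf
git commit -q -m "Release 1.0"
git branch stable-1.0   # maintenance branch for the old release

# Development continues with a larger restructuring...
echo "# reorganized settings" > layout.conf
git add layout.conf
git commit -q -m "Restructure configuration layout"

# ...and, as a separate commit, a small fix.
echo "greeting = hello" > app.conf
git commit -q -am "Fix typo in greeting"
fix_commit=$(git rev-parse HEAD)

# Backport only the fix to the stable branch.
git checkout -q stable-1.0
git cherry-pick -x "$fix_commit" >/dev/null

cat app.conf   # prints: greeting = hello
ls             # layout.conf is not here; the restructuring was not backported
```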

Makes it possible to use bisect to find regressions

Every once in a while you may find a regression. A feature worked in an earlier version but stopped working at some point. You don’t know which change broke the feature.

Git offers the git bisect functionality. If you have a test case that triggers the regression, git bisect can automatically determine the commit that first broke the functionality. Even if you don’t have an automated test case, it greatly simplifies the search for the culprit.

For this to work well it is important that each commit compiles and passes tests. And once you’ve found the bad commit, you probably prefer it to be small rather than big.

For bugs, it may often be easy enough to fix the issue without identifying the commit that introduced the regression. But another use case of git bisect is finding a commit that introduced a performance regression. With performance regressions, it is often much harder to identify the cause just by reasoning about the code.
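
The following sketch shows an automated bisect session on a throwaway repository. Everything in it is made up for illustration: eight tiny commits, where the fifth one flips a file from "pass" to "fail", and a grep call standing in for a real test suite. git bisect run treats exit code 0 as good and a non-zero exit code such as 1 as bad.

```shell
#!/bin/sh
set -eu

# Throwaway repository; names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# Eight small commits; the fifth one introduces the regression.
i=1
while [ "$i" -le 8 ]; do
  if [ "$i" -ge 5 ]; then echo "fail" > status.txt; else echo "pass" > status.txt; fi
  echo "$i" > counter.txt
  git add status.txt counter.txt
  git commit -q -m "Change number $i"
  if [ "$i" -eq 1 ]; then git tag known-good; fi
  if [ "$i" -eq 5 ]; then first_bad=$(git rev-parse HEAD); fi
  i=$((i + 1))
done

# HEAD is known to be bad, the tagged commit is known to be good.
git bisect start HEAD known-good >/dev/null

# Let git check out candidates and run the "test" on each of them.
git bisect run grep -q pass status.txt >/dev/null

# refs/bisect/bad now points at the first bad commit.
found=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

git log -1 --format=%s "$found"   # prints: Change number 5
```

The session only gives a trustworthy answer because every intermediate commit can be checked out and tested on its own. A commit that doesn't build would have to be skipped, which widens the range of suspects.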

Wrap up

To wrap this up, let me make one more point: There isn’t a single good reason not to keep a clean git history.

If you think it requires more effort, you’re wrong and being lazy. Once you make a habit of splitting your commits into logical units and writing commit messages, it’ll be faster than changing several things at once and then mashing them into one giant commit.