February 23, 2021 • Reading time: 12 minutes
At this point, most developers use Git as a tool for collaboration. We have our rote-learned commands to pull, commit, and push. And of course, there's that one coworker who knows a bit more about Git than everyone else, who helps get us back on track whenever our local repos end up in a strange state.
But what if I told you that Git can be a valuable tool without ever setting up a remote repository? I'm not just talking about having a working version of your code base to roll back to if you mess something up, although there's that too. Used correctly, Git can help to structure your work, identifying gaps in your test coverage and minimizing dead code.
There are two subjects I'm going to avoid for the purposes of this blog post: other developers, who are the most compelling but least interesting argument for keeping your commit history clean, and git bisect, which does factor heavily into my workflow but deserves its own blog post.
As with any ubiquitous developer tool, the Git user base has a lot of strong and conflicting opinions about the one "correct" way to use it. My goal is simply to introduce a workflow that I've been using and refining for much of my career; take from it what you will. And, importantly, it's a workflow that has become a vital part not just of my collaboration process, but of the way I write code.
Ultimately, these principles serve two purposes: they focus my work onto a particular bugfix, feature, or goal, and they ensure that my Git history isn't set in stone. With proper hygiene, commits can be dropped, rearranged, and split off into other branches painlessly and without merge conflicts.
When I'm managing my own projects, I have a lot of ideas that I want to see happen. If I'm just throwing one commit after another into main, I'll get halfway through implementing one feature and then jump off to hacking on another. If any of the features get completed, it will be at the expense of a wasteland of half-completed features that are now taking up space in my code base.
In a brand-new project, sure, I'll throw a bunch of garbage commits into main. My rule of thumb for when to stop this is when I can write my first effective integration test. If there is something useful to test, there is now enough substance to my project that I can have distinct tasks on the go. Trying to break into branches too early just results in me throwing my garbage commits into a branch instead of main.
In the early stages of a project, articulating the purpose of a branch can be as simple as giving it a descriptive name. If a commit isn't moving the code base in that direction, it can always get cherry-picked into a different branch.
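As a rough sketch of what moving a stray commit can look like, the throwaway script below builds a scratch repository, makes an unrelated commit on a feature branch, and then cherry-picks it into a branch of its own. The file names, branch names, commit messages, and identity are all made up for illustration.

```shell
#!/bin/sh
set -e

# Scratch repository; everything in here is a hypothetical example.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main        # call the first branch "main" whatever the Git default is
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"

echo "hello" > app.txt
git add app.txt
git commit -q -m "initial commit"

# A descriptively named branch states what this work is for.
git checkout -q -b add-search-box
echo "search" > search.txt
git add search.txt
git commit -q -m "add search box"

# Oops: this commit has nothing to do with the search box.
echo "typo fixed" > readme.txt
git add readme.txt
git commit -q -m "fix readme typo"
stray=$(git rev-parse HEAD)

# Give the stray commit a branch of its own, based on main...
git checkout -q -b fix-readme-typo main
git cherry-pick "$stray" >/dev/null

# ...and remove it from the feature branch. It is the newest commit there,
# so stepping the branch back by one is enough.
git checkout -q add-search-box
git reset -q --hard HEAD~1

git log --oneline --graph --all
```

The cherry-pick applies cleanly here only because the stray commit touches nothing that the search box commit introduced, which is the kind of independence the rest of this post argues for.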
As the project matures, I'll start using some sort of issue or bug tracking software to flesh out what I'm trying to accomplish in more detail and coordinate the branches for multiple related useful things.
I find that descriptive branch names also help to refocus my attention on what I'm trying to accomplish. For instance, my command prompt currently looks like this:
10:02:19 max ~/Projects/mikkel.ca blog-post-git-as-a-solo-developer| R%
So much for branches; let's zoom in to the commit level. I've articulated what concrete thing I want my branch to add. Now how do I add it? Usually, there's some poking around my code base involved in figuring that out. Sometimes I take a wrong turn, sometimes I just get distracted. That's okay, it's part of the process.
However, that doesn't mean that every commit I make right now is going to end up getting merged in this branch. By keeping my commits independent from one another, I ensure that I can rearrange or cherry-pick them into new branches if I discover that they really don't have anything to do with what I'm working on right now.
If my commits are not independent, I am essentially stuck with the exact history as it was written. Trying to tease out a commit into a different branch or move it to the beginning of my branch history will become fraught with merge conflicts as later commits that modified code introduced in this commit fall like dominoes.
Obviously, I'm still allowed to call code written in one commit from a later commit. That's the reason I'm doing this particular work in this particular branch, after all. But I never touch the same code multiple times. If I have to go back and fix something, maybe add a validation check or field that I hadn't thought of, I'll go back to the commit where it was created rather than amending it in a later commit.
Obviously, this could go on forever, which is why the "one useful thing" principle exists. Once I've settled on what I want the code to look like for the purposes of this branch, I merge, and any further changes to the same code start as a new commit in the next branch.
Here's where keeping commits small starts to pay dividends. If the code in each commit is small enough for me to reason about, it's small enough for me to visually ensure that its test coverage is good.
And of course, if I do end up rearranging this commit or splitting it off to a different branch, I want its tests to come along with.
The exception to this is integration and functional/behavioural tests, which can and should have their own commits. In that case, the tests are really tied to the branch level rather than the commit level, since Principle 1 implies that there should be exactly one new test to add as a result of this branch.
Again, breaking something in a commit (even if I really definitely intend to fix it in a later commit) locks me into the git history as written. And introducing a breaking change with the intention of fixing things later always carries the risk that I'll get distracted and end up merging the breaking change.
If there's some prerequisite to get this change to pass tests - say, a preexisting bug that snuck through a hole in my test coverage - that gets its own commit.
Speaking of holes in test coverage, there's another (temporary) exception here. I don't normally practice strict test-driven development, but when I fix a long-standing bug, I temporarily put its test in a separate commit. I'll then rebase so that the test appears before the fix, ensure that the test fails without the fix, then complete the rebase and validate that the test now passes. Once the due diligence to validate my test is done, I can go ahead and squash the bugfix with its test.
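Here is one way that due diligence might look when scripted end to end, as a sketch rather than a prescription. The double.sh script, its bug, and the branch name are invented for the example. To keep it non-interactive, GIT_SEQUENCE_EDITOR stands in for the editor during the rebase, and the test is run from a detached checkout instead of pausing mid-rebase. The sed invocation assumes GNU sed.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"

# Hypothetical long-standing bug: "double" adds 2 instead of doubling.
printf '#!/bin/sh\necho $(( $1 + 2 ))\n' > double.sh
git add double.sh
git commit -q -m "add double script"

git checkout -q -b fix-double

# Commit 1: the fix.
printf '#!/bin/sh\necho $(( $1 * 2 ))\n' > double.sh
git commit -q -am "fix double for inputs other than 2"

# Commit 2: the regression test, kept separate for now.
printf '#!/bin/sh\n[ "$(sh double.sh 5)" = "10" ]\n' > test.sh
git add test.sh
git commit -q -m "test double"

# Reorder so the test comes before the fix. This sed program swaps the
# first two lines of the rebase todo list (GNU sed; BSD sed needs -i '').
GIT_SEQUENCE_EDITOR="sed -i -e '1{h;d}' -e '2G'" git rebase -q -i main

# The test must fail without the fix...
git checkout -q --detach HEAD~1
if sh test.sh; then before=pass; else before=fail; fi

# ...and pass with it.
git checkout -q fix-double
if sh test.sh; then after=pass; else after=fail; fi
echo "before fix: $before, after fix: $after"

# Due diligence done: squash the test into the fix. A soft reset is used
# here as a non-interactive stand-in for squashing during a rebase.
git reset -q --soft main
git commit -q -m "fix double for inputs other than 2"
```

The reorder is conflict-free because the test and the fix live in different files. If the test were written into a file that the fix also modifies, the swap could need manual resolution.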
If I know that I'll be coming back to a change later, I'm much more comfortable setting it down and moving on to roughing in the next part of the process, rather than finishing, polishing, and unit testing code that might need to change before my branch gets merged.
In fact, I find that I waste much less time on writing tests for things that I'll later change when I'm following this workflow to the letter than I do when I get "lazy" and start dumping everything into big catch-all commits.
Some people favour TODO comments in their code, occasionally supported by automated checks that prevent code containing "TODO" from merging. I prefer to annotate my commit messages and leave my code clean. Normally, this looks something like "add controller class - TODO test me". (I always put my TODOs on the first line of the commit message, so that they show up even in short log views.)
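One side benefit of keeping TODOs in the subject line is that they are easy to list before merging. The sketch below uses a scratch repository with made-up commits; the only real machinery is git log with --grep, which searches the whole commit message rather than just the first line.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"

echo "project" > readme.txt
git add readme.txt
git commit -q -m "initial commit"

# Hypothetical feature branch with one finished and one unfinished commit.
git checkout -q -b add-controller
echo "controller" > controller.txt
git add controller.txt
git commit -q -m "add controller class - TODO test me"
echo "route" > routes.txt
git add routes.txt
git commit -q -m "add route for controller"

# List the commits on this branch that still carry a TODO.
todos=$(git log --oneline --grep='TODO' main..HEAD)
echo "$todos"
```

An empty result would suggest the branch is ready to merge, at least as far as the commit messages are concerned.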
Often I start a task by tidying up the surrounding code, in the same way I might organize my desk before starting work. (I don't, but I might.) Sometimes that cleanup turns out to be a valuable part of the groundwork for this change, but sometimes it's just dead weight. Keeping my commits independent makes it easy to discard or cherry-pick out code that turned out to be unnecessary, along with any unit tests that went along with.
(I do still consider the tidying to be a valuable part of the process. It clears my mind and refreshes my knowledge of the problem space with some simple rote tasks before I dive into something more complex. And occasionally it results in cleaner code.)
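For a tidy-up commit that sits at the start of a branch, one possible way to take it back out is git rebase --onto, sketched below on a scratch repository. The branch and file names are hypothetical, and keeping the tidying on a side branch first is just a precaution I've added for the example.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"

echo "project" > readme.txt
git add readme.txt
git commit -q -m "initial commit"

git checkout -q -b add-export

# First commit on the branch: tidying that later turns out to be unnecessary.
echo "tidied" > helpers.txt
git add helpers.txt
git commit -q -m "tidy helpers"
tidy=$(git rev-parse HEAD)

# Second commit: the actual work, independent of the tidying.
echo "export" > export.txt
git add export.txt
git commit -q -m "add export"

# Keep the tidying around on its own branch in case it is useful later.
git branch tidy-helpers "$tidy"

# Replay everything after the tidy commit directly onto main,
# which leaves the tidy commit out of this branch.
git rebase -q --onto main "$tidy" add-export

git log --oneline main..add-export
```

Because "add export" never touched helpers.txt, the rebase has nothing to conflict with. Had the export code depended on the tidied helpers, this is where the dominoes would start to fall.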
I'm not perfect.[citation needed] Obviously, it's not practical to maintain this level of commit hygiene by making each change sequentially. Instead, I jump around constantly. Doing so requires me to be comfortable in navigating my commit history. (Conversely, it's also a good way to become comfortable with navigating history.)
In that vein, here are some tools beyond your standard checkout/branch/pull/commit/push workflow that come in handy.
git commit --amend – A quick and easy way to update the most recent commit.
git commit --fixup [hash] – When changing history, I used to find myself making a lot of commits with messages like "merge me with xyz" whenever I needed to revisit a commit before the most recent one. It turns out that git commit has flags to help with this: commits created with --fixup or --squash are automatically lined up as a fixup or squash of their target commit during an interactive rebase, provided the --autosquash flag is passed to git rebase. (To enable this behaviour by default, run git config --global rebase.autosquash true. It won't behave any differently if there are no commit messages in the history being edited that start with "squash!" or "fixup!".) A surprise bonus: since the fixup commit's message is generated from the subject of the commit it targets, you won't be prompted to enter a new one.
git rebase --interactive main – I can also use git rebase --interactive HEAD~5 to edit the last 5 commits, but I find rebasing directly on main (or master, or whatever my upstream branch is) kills two birds with one stone. It will show me all commits since I branched off from main, and will simultaneously bring my branch up to date with my latest local copy of main.
git stash – Sometimes I have unrelated changes on the go that I don't want to commit right now. git stash is an easy way to make them go away, and git stash pop brings them back again. Just use it sparingly, because finding your changes in the stash later is a pain. If I'm not planning to pop it back off again in the near future, I make a progress commit instead.
git blame – Okay, this is more valuable in collaboration. When it's my code base, I already know whose fault it is. Still, despite the name, I use git blame not to find out who to blame, but to find out why something was done. That applies equally whether it was done by another developer or me six months ago. Most commonly, I'll use it when I see something that looks like a bug, and I want to find out: a) what purpose the thing was supposed to serve, b) if it was successful in serving that purpose, and c) whether or not any related code is still in the code base. It's the task of finding related code that really puts your project's commit hygiene to the test.
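To see several of these commands working together, here is a small non-interactive walk-through on a scratch repository. Treat it as a sketch: the files, branch name, and commit messages are invented, GIT_SEQUENCE_EDITOR=true simply accepts the todo list that Git proposes instead of opening an editor, and labelling the stash with -m is my own habit rather than anything required.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"

echo "project" > readme.txt
git add readme.txt
git commit -q -m "initial commit"

# Two independent commits on a hypothetical feature branch.
git checkout -q -b add-controller
echo "controller" > controller.txt
git add controller.txt
git commit -q -m "add controller class - TODO test me"
target=$(git rev-parse HEAD)
echo "view" > view.txt
git add view.txt
git commit -q -m "add view"

# Meanwhile, main moves on.
git checkout -q main
echo "license" > license.txt
git add license.txt
git commit -q -m "add license"
git checkout -q add-controller

# git stash: park an unrelated edit so it stays out of the next commit.
echo "unrelated note" >> readme.txt
git stash push -q -m "readme note"

# git commit --fixup: record a change that belongs in an earlier commit.
echo "validation check" >> controller.txt
git commit -q -a --fixup "$target"

# git rebase --interactive main: fold the fixup into its target and
# bring the branch up to date with main in the same pass.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

# git stash pop: bring the parked edit back.
git stash pop -q

# git blame: find the commit that introduced line 2, then read its message.
commit=$(git blame --porcelain -L 2,2 controller.txt | head -n 1 | cut -d ' ' -f 1)
git log -1 --format='%h %s' "$commit"
git log --oneline main..HEAD
```

After the rebase the branch holds two commits on top of the latest main, and git blame attributes the validation line to the controller commit rather than to a separate "merge me" commit.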
The usual Git/Vim disclaimer applies to my list: if you get five power users in a room and ask them to do a complicated task, they'll get it done quickly, efficiently, and in seven different ways. The commands I use are not the only ways to accomplish the same results, and are probably not the best way.
While I think my workflow stands on its own as a way of structuring your thoughts and ensuring that your test coverage is good, this is of course also a workflow that will get you a lot of love from coworkers or collaborators. Well-crafted pull requests are a joy (well, less of a misery) to review, and referring back to a well-written commit turned up in git blame makes it much easier to understand what you or another developer was trying to accomplish with a change (and evaluate whether or not it was successful).
And of course, there's git bisect. The short version, for those who haven't used it, is that git bisect allows you to find when something changed, across all of history, either manually or using automated tests, all in O(log n) time. Ensuring that your commits always pass tests makes them friendly to git bisect, and ensuring that they are as small as possible means that when bisect tells you which commit introduced a bug, there is very little code in which that bug could appear.
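For a taste of the automated flavour, the sketch below builds a scratch history of eight tiny commits, one of which breaks a made-up value.txt file, and lets git bisect run find it. The check command (grep here) is a stand-in for a real test suite: an exit status of 0 marks a commit as good, and a status of 1 marks it as bad.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"

# Commit 1 is known to be good.
echo good > value.txt
git add value.txt
git commit -q -m "commit 1"
good=$(git rev-parse HEAD)

# Commits 2 to 8 each make a small change; commit 5 sneaks in the bug.
for i in 2 3 4 5 6 7 8; do
    echo "change $i" >> log.txt
    if [ "$i" -eq 5 ]; then echo broken > value.txt; fi
    git add -A
    git commit -q -m "commit $i"
done

# Bisect between the bad tip and the known-good commit, using grep as the test.
git bisect start HEAD "$good" >/dev/null
git bisect run grep -q good value.txt >/dev/null

# When the run finishes, refs/bisect/bad points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad commit: $culprit"   # prints "first bad commit: commit 5"
```

This only works because every commit in the range can be checked on its own, which is exactly what the always-passing, small-commit habit buys you.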