Beautifying commits with git

When you look at our Subversion log, you’ll often see revisions containing multiple topics, which is something I don’t particularly like. The main problem is merging patches: the moment you cram multiple things into a single commit, you are doomed should you ever decide to merge just one of them into another branch.

Because it’s such an easy thing to do, I began committing really, really often in git, but whenever I wrote the changes back to Subversion, I used merge --squash so as not to clutter the main revision history with abandoned attempts at fixing a problem or implementing a feature.
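For anyone who hasn’t used it, here is a small sketch of what merge --squash does to a string of small commits. It assumes a POSIX shell with git installed and builds a throwaway repository in a temporary directory. The branch name feature, the file feature.txt and the commit messages are made up for illustration. The three commits on the branch end up as one new commit on the trunk, which is how the individual steps got lost on the way to SVN.

```shell
#!/bin/sh
# Sketch: what "git merge --squash" does to a series of small commits.
# Throwaway repository; branch, file and commit names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
trunk=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on git config

echo "base" > feature.txt
git add feature.txt
git commit -q -m "Initial commit"

# Frequent small commits on a topic branch
git checkout -q -b feature
for step in 1 2 3; do
    echo "attempt $step" >> feature.txt
    git commit -q -am "Attempt $step"
done

# Back on the trunk: fold the whole branch into one new commit
git checkout -q "$trunk"
git merge -q --squash feature
git commit -q -m "Implement feature (squashed)"

# The trunk now has two commits; the branch still has all four
git log --oneline "$trunk"
```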

So in essence, using git meant working against my usual goals: the actual commits to SVN were larger than before, which is the exact opposite of how I want the repository to look.

I lived with that until I learned about the interactive mode of git add.

Beginners with git (at least those coming from Subversion and friends) always struggle to really grasp the concept of the index and usually just run git commit -a when committing changes.

This does exactly what svn commit does: it takes all the changes you made to your working copy and commits them to the repository. This also means that the smallest unit of change you can track is the state of the whole working copy.

To make finer-grained commits, you can git add individual files and commit just those, which in Subversion terms is the equivalent of svn status followed by some grep and awk magic.
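Here is a minimal sketch of per-file committing. It assumes a POSIX shell and uses a throwaway repository. The files parser.c and ui.c are hypothetical stand-ins for two unrelated changes. Only the explicitly added file goes into the commit, and the other change stays in the working copy.

```shell
#!/bin/sh
# Sketch: committing one file while leaving another change uncommitted.
# Throwaway repository; file names and contents are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'a\n' > parser.c
printf 'b\n' > ui.c
git add parser.c ui.c
git commit -q -m "Initial commit"

# Two unrelated changes in two different files
echo "bugfix" >> parser.c
echo "new feature" >> ui.c

# Stage and commit only the bugfix
git add parser.c
git commit -q -m "Fix bug in parser"

# ui.c is still modified but unstaged: " M ui.c"
git status --short
```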

But even a file is too large a unit for a commit, if you ask me. When you implement feature X, it’s possible, if not very probable, that along the way you fix bugs a and b and extend interface I to make feature Y work, a feature on which X depends.

Bugfixes, interface changes, subfeatures: a git commit -a will mash them all together. A git add per file will mash some of them together. That is, unless you are really, really careful to cleanly do only one thing at a time, but in the end that’s not how reality works.

It may very well be that you discover bug b after having written a good amount of code for feature Y, and that both Y and the fix for b live in the same file. Now you either have to back out b, commit Y and reapply b, or just commit Y and b in one go, which makes it very hard to later merge only b into a maintenance branch, because you’d also get Y, which you don’t want.

But backing out already written code just to make a commit? That is not a productive workflow. I could never make myself do something like that, let alone get my coworkers to do it. Aside from that, it’s yet another opportunity to introduce errors.

This is where the git index shines. Git tracks content. The index is a staging area where you store content you wish to commit to the repository later. Content isn’t bound to a file; it’s just content. With the help of the index, you can incrementally collect single changes from different files, assemble them into a complete package and commit that to the repository.

As the index is tracking content and not files, you can add parts of files to it. This solves the problems outlined above.

So once I have completed feature X (assuming I could do it in one quick go), I run git add with the -i argument. This shows me a list of the changed files in my working copy. Using the patch command, I can decide, hunk by hunk, whether each change should be included in the index or not. Once I’m done, I exit the tool with option 7 (quit). Then I run git commit (see note 1) to commit all the changes I’ve put into the index.

Remember: this is not done per file, but per hunk within the file, and hunks can be split or edited down to individual lines. This way I can separate all the changes in my working copy (bugs a and b, features Y and X) into individual commits.

With a clean history like that, I can merge the feature branch without --squash, which keeps the individual commits when dcommitting to Subversion and finally produces something that can easily be merged around and tracked.

This is yet another feature of git that, once you get used to it, makes this VCS shine even brighter than everything else I’ve seen so far.

Git is fun indeed.

1) And not git commit -a, which would destroy all the fine-grained plucking of lines you just did. Trust me: I know. Now.

Impressed by git

The company I’m working with is a Subversion shop. It has been for a long time: since the fall of 2004, actually, when I finally decided that the time for CVS was over and that I was going to move to Subversion. As I was the only developer back then and the whole infrastructure mainly consisted of CVS and ViewVC (cvsweb back then), this move was an easy one.

Now we are a team of three developers, heavy trac users and truly dependent on Subversion, which, mainly due to the amount of infrastructure we built around it, is not going away anytime soon.

But nonetheless, we (mainly I) were feeling the shortcomings of Subversion:

  • Branching is not something you do easily. I had tried working with branches before, but merging them really hurt, which made it somewhat prohibitive to branch often.
  • Sometimes, half-finished stuff ends up in the repository. This is unavoidable when the alternative is keeping a bucket load of uncommitted changes in the working copy.
  • Code review is difficult because actually trying out patches is a real pain: sending, applying and reverting them is all manual work.
  • A pet peeve of mine, though, is untested, experimental features developed out of sheer interest. Stuff like that lies around in the working copy, waiting to be reviewed or even just to have its real-life use discussed. Sooner or later, a needed change must go in, and you have three options: sneak the experimental code in along with it (bad), manually diff out the needed change (sometimes hard to do), or just forget about the experiment and svn revert it (a real shame).

Ever since the Linux kernel first began using BitKeeper to track its development, I knew that there was no technical reason for these problems. I knew that a solution for all this existed and that I just wasn’t ready to try it yet.

Last weekend, I finally had a look at the different distributed revision control systems out there. Because of the insane amount of infrastructure built around Subversion, and so as not to scare off my team members, I wanted something that integrated with Subversion, using that repository as the place where official code ends up, while still giving us the freedom to fix all the problems listed above.

I looked at both Mercurial and git, though in the end, git’s nicely working SVN integration was what made me focus on it.

Contrary to what everyone is saying, I have no problem with the interface of the tool: once you learn the terminology, it’s quite easy to get used to the system. So far, I’ve done a lot of testing with both live and test repositories, and everything has worked out very nicely. I’ve already seen git’s impressive branch merging abilities (to think that in Subversion you actually have to a) find out at which revision a branch was created and b) remember every patch you cherry-picked... crazy), and I’m getting into the details more and more.
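The branch-point bookkeeping mentioned above is something git handles by itself, because a merge is recorded as a commit with two parents. This sketch uses a throwaway repository, and the branch, file and commit names are made up. It merges a branch twice. Before the second merge, git merge-base already reports the previously merged branch tip as the common ancestor, so only the new work gets merged, with no revision numbers to remember.

```shell
#!/bin/sh
# Sketch: merging the same branch twice without tracking revisions by hand.
# Throwaway repository; branch, file and commit names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
trunk=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on git config

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# Branch off and do some work
git checkout -q -b feature
echo "feature part 1" >> notes.txt
git commit -q -am "Feature, part 1"

# Meanwhile, unrelated work happens on the trunk
git checkout -q "$trunk"
echo "trunk change" > trunk.txt
git add trunk.txt
git commit -q -m "Unrelated trunk change"

# First merge: git works out the branch point on its own
git merge-base "$trunk" feature
git merge -q --no-edit feature
first_tip=$(git rev-parse feature)

# More work on the branch, then merge again
git checkout -q feature
echo "feature part 2" >> notes.txt
git commit -q -am "Feature, part 2"
git checkout -q "$trunk"
second_base=$(git merge-base "$trunk" feature)   # the tip that was merged last time
git merge -q --no-edit feature
```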

On our trac installation, I’ve written a tutorial on how we could use git in conjunction with the central Subversion server which allowed me to learn quite a lot about how git works and what it can do for us.

So for me it’s git-all-the-way now and I’m already looking forward to being able to create many little branches containing many little experimental features.

If you have the time and you are interested in gaining many unexpected freedoms in source code management, you too should have a look at git. Also consider that no change at all is needed on the Subversion server side, meaning that even if you are forced to use Subversion, you can privately use git to help you manage your work. Nobody would ever have to know.

Very, very nice.