Distributed Version Control

Posted February 14, 2008

A few months ago, I started playing around with the ruby-wmii codebase, trying to get it to work with the newest version of wmii. I eventually succeeded, but that’s not what this post is about.

Mauricio Fernandez, the awesome guy who originally created ruby-wmii, uses darcs for most of his projects, ruby-wmii included. Since I wanted to muck with ruby-wmii and send patches back, I had to give it a try as well.

Until that point, I’d never really used any revision control system other than Subversion. I’d checked out some stuff from CVS, but never made a commit to anything that used it. Not that it’s all that different from Subversion anyway.

I never know what I can assume people reading this are familiar with, so I’ll give a brief introduction to the difference between systems like darcs and those like Subversion and CVS. If you’re already familiar with this stuff, feel free to skip ahead three paragraphs.

What’s so Great About Darcs?

Darcs is what’s known as a “distributed” source-control system. Subversion and CVS are the opposite: “client-server” systems. In a very general sense, a distributed system is one in which each checked-out copy of the codebase has its own repository. By contrast, client-server systems have a single, central repository on a server. When a client checks out the repository, they get the code and some metadata, but not a full repository of their own.

A distributed model has several important consequences. It allows you to commit changes to your local repository, rather than a central one. This in turn makes commits faster, tends to make them more granular, and lets developers make commits without internet access.

Distributed models also have interesting political implications. There can’t be a central figure who decides what patches are and are not allowed in the repository. Any “official” repository is so because of consensus, not technical limitations. Developers can send patches around to each other without them ever touching a central repo.

The Competition

Coming from Subversion, I thought darcs was the bee’s knees. I used it for a small modification I made to Luke Redpath’s Pastie script1. Even for something only I was working on, I loved being able to clean up my patches before sending them off to the server.

However, I’d heard about various other distributed source-control systems. As much as I liked darcs, it seemed a shame to only use it without trying out some alternatives. My first step in this direction was giving Git a try for my blog engine. Then, a couple weeks later, I switched another project2 to Mercurial.

As I accumulate more projects, I’ll probably check out other systems; Bazaar and SVK both look cool. But in the meantime, I think it would be interesting to lay out my feelings about the systems I have used.

Before I do so, though, I want to say that I haven’t used any of these systems for all that long. I certainly don’t know them very deeply. I’m sure some of the features in one that I miss in another actually exist, but I just haven’t found them yet. If that’s the case, please let me know. I’d love to hear how to make better use of these things.

Subversion

To be perfectly honest, all the distributed source-control systems I’ve used are better than Subversion.

Not that Subversion’s particularly bad. I’m pretty sure it’s about as good as you can get for the client-server model. When I was first learning it, a little over a year ago3, it seemed perfectly intuitive.

The thing is, the distributed systems provide more or less a superset of Subversion’s functionality. I’m not sure if this is strictly accurate, but everything I do with Subversion I can do just as easily with darcs, Mercurial, or Git.

On top of that, I get all sorts of goodies. Local commits, easy branching, metadata-preserving patch sharing, and so forth.

Darcs

Darcs is the most user-friendly of the three systems. It’s supposed to have some pretty painful performance issues with large repositories, and it doesn’t seem to have the sheer breadth of capabilities that Mercurial and Git have. But it’s fun, and has some very useful stuff that the other systems lack.

Sub-File Changes

Darcs is interesting in that it doesn’t view changes on a file-wide basis. In Mercurial or Git, if I commit changes to some file, then I commit all the changes in the file. In Darcs, though, by default it goes through each separate change within the file and asks me if I want to commit that particular change.

This is really cool behavior, and something that I miss in other systems. If I make some minor documentation change unrelated to something else I’m doing in the same file, darcs allows and encourages me to commit them as separate patches.

Git makes this possible, if a little annoying. git stash will stash away all un-committed changes and give you a clean working tree. You can then make the change, commit it, and run git stash apply to get your changes back.
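As a rough sketch of that stash dance, assuming a throwaway repository (the file names, contents, and commit messages here are all made up for illustration):

```shell
#!/bin/sh
set -eu

# Throwaway repository; every file and message below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'def greet\nend\n' > lib.rb
printf 'Usage: TODO\n' > README
git add lib.rb README
git commit -q -m "Initial commit"

# Half-finished feature work that is not ready to commit.
printf 'def greet\n  puts "hi"\nend\n' > lib.rb

# Stash it away to get a clean working tree...
git stash -q

# ...make and commit the unrelated documentation fix...
printf 'Usage: ruby lib.rb\n' > README
git commit -q -a -m "Document usage"

# ...then bring the unfinished work back.
git stash apply -q

# Only the feature work should still show up as uncommitted.
git status --short
```

The documentation fix ends up in its own commit, and the feature work is back in the working tree, still uncommitted.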

As far as I can tell, the best way to do this in Mercurial is to create a new branch, fix the documentation there and commit to the branch, then merge that change back into the trunk. This is a lot more work.

amend-record

Another major winning feature for Darcs is the ability to edit old patches. You can use darcs amend-record to commit changes to any previous patch, rather than as a new patch. This is great if your old patch introduced a bug or had some formatting issues. Combined with sub-file changes, it’s incredibly useful.

Git gives similar power with its git rebase --interactive command. This is a little more unwieldy to use, but it’s not terrible. It allows you to edit, merge, and shuffle around commits that you’ve made.
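Here is one way that might look when folding a later fix into the commit it belongs to. This is only a sketch: normally the command opens the todo list in your editor, and setting GIT_SEQUENCE_EDITOR to a sed one-liner is just my way of making the example run unattended. The repository, file, and commit messages are invented.

```shell
#!/bin/sh
set -eu

# Throwaway repository with hypothetical contents.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > notes.txt
git add notes.txt
git commit -q -m "Base"

echo "feature" >> notes.txt
git commit -q -a -m "Add feature"

echo "fix for feature" >> notes.txt
git commit -q -a -m "Fix typo in feature"

# The todo list has one "pick" line per commit, oldest first.
# Rewriting line 2 from "pick" to "fixup" folds the fix into the
# commit before it and keeps that earlier commit's message.
GIT_SEQUENCE_EDITOR='sed -i.bak "2s/^pick/fixup/"' \
  git rebase -q --interactive HEAD~2

# History is now "Add feature" on top of "Base".
git log --format=%s
```

Done by hand, you would run git rebase --interactive HEAD~2 and change the second line's "pick" to "fixup" yourself.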

Mercurial again doesn’t seem to have anything that provides a similar ability. The best it has is hg rollback which just flat-out undoes your last commit. You can then edit the working directory and re-commit, but you need to re-type the commit message and so forth.

Git

Git’s first priority seems to be power, rather than usability. This puts it in the interesting position of being harder to use than either of the other systems, but also offering many more capabilities than either darcs or Mercurial once you get the hang of it.

Git is designed with a Unixy toolbox philosophy. It provides a set of basic utilities designed to make it easy to build more advanced utilities. This contributes to the vast number of things you can do with it.

Branching

Git has built-in handling of branches within a repository, which is very useful. You just say git branch name to create a new branch based on your current branch. Another parameter will allow you to base the branch on any other branch, commit, or even a branch fetched from a remote repository.

Branches are all managed within the same repository. git checkout switches between them, actually updating the working directory with the contents of each branch.
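A small sketch of those two commands in a scratch repository. The branch names experiment and bugfix, and the file app.txt, are placeholders I picked for the example.

```shell
#!/bin/sh
set -eu

# Throwaway repository with hypothetical contents.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "stable" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Remember the default branch name (it varies between Git setups).
trunk=$(git symbolic-ref --short HEAD)

# Create a branch based on the current one, then switch to it.
git branch experiment
git checkout -q experiment
echo "experimental" >> app.txt
git commit -q -a -m "Try something risky"

# A second argument picks the starting point: here, the tip of the
# original branch rather than the branch we are currently on.
git branch bugfix "$trunk"

# Switching back rewrites the working directory to match that branch.
git checkout -q "$trunk"
cat app.txt
```

After the last checkout, app.txt no longer contains the experimental line; it is still safe on the experiment branch.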

In Darcs and Mercurial, as far as I can tell you need to manually clone the repository into a new directory if you want to create a branch. The old repository has no knowledge of this new one, which makes it a little tougher to share commits between them. All in all, it’s just annoying.

cherry-pick

git cherry-pick allows you to choose a single commit from another branch or repository and copy it into your current repository (merging if need be). It’s very useful if you have two branches that are similar, but not similar enough that you want to pull all the changes from one into the other.
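For instance, assuming a scratch repository where a side branch has one commit I want and one I don't (all names here are invented for the example):

```shell
#!/bin/sh
set -eu

# Throwaway repository with hypothetical contents.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "one" > a.txt
git add a.txt
git commit -q -m "Initial commit"
trunk=$(git symbolic-ref --short HEAD)

# Two commits on a side branch; only the first is wanted elsewhere.
git checkout -q -b experiment
echo "wanted" > wanted.txt
git add wanted.txt
git commit -q -m "Add wanted change"
wanted=$(git rev-parse HEAD)

echo "unwanted" > unwanted.txt
git add unwanted.txt
git commit -q -m "Add unwanted change"

# Back on the original branch, copy over just that one commit.
git checkout -q "$trunk"
git cherry-pick "$wanted" > /dev/null

# wanted.txt is here; unwanted.txt stayed behind on the side branch.
ls
```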

Darcs offers similar functionality in that when you pull from a repository, you get to choose whether or not you want to pull each patch. This is annoying if there’s only one patch you want and lots to dig through, though. You can tell darcs only to pull patches matching a pattern, but that’s much more annoying than just specifying the commit.

In Mercurial, I think you need to manually export the patch from the source repo and import it into the destination repo.

format-patch and am

One thing you do a lot if you’re working on a reasonably popular open-source project (like, say, Haml) is receive and apply patches from other people. Git makes this easy and painless.

git format-patch allows you to specify a remote repository, and then creates a nicely-named plain-text patch for each commit in your repo that’s not in the remote one. These patches have all the proper metadata: author name, commit message, etc.

You can then send these patches however you want (email, pastie, whatever) to the repository maintainer, who just pipes them into git am. This applies the patches and commits them, again with the proper metadata.
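A sketch of the whole round trip, with two local repositories standing in for the maintainer and the contributor. The directory names, identities, and file are hypothetical, and instead of emailing anything the patches are simply written to a shared directory.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# "upstream" stands in for the maintainer's repository.
git init -q upstream
cd upstream
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Initial commit"
cd ..

# The contributor clones it and commits locally.
git clone -q upstream contributor
cd contributor
git config user.name "Contributor"
git config user.email "contributor@example.com"
echo "hello, world" > greeting.txt
git commit -q -a -m "Greet the whole world"

# Write one patch file per commit that origin's branch lacks.
branch=$(git symbolic-ref --short HEAD)
git format-patch -q -o ../patches "origin/$branch"
cd ..

# The maintainer applies the patches; author and message survive.
cd upstream
git am -q ../patches/*.patch
git log -1 --format='%an: %s'
```

The final log line shows the contributor, not the maintainer, as the author of the applied commit.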

Mercurial supports something isomorphic to these commands with hg export and hg import.

Darcs does pretty well at this, too. darcs send will create similar metadata-rich patches and email them to people4. This is more similar to git send-email than git format-patch, but as long as you have sendmail set up properly it’s fine. If you don’t, it’s a bit of a hassle.

Mercurial

I’ve been kind of disappointed with Mercurial. It’s not as user-friendly as darcs, and it lacks some of the capabilities of Git. What’s more, I have yet to find anything I can do in Mercurial that I can’t do as easily in Git (once I know how).

If there are any Mercurial fans reading this, I’d love to know what you consider to be its killer features.

1 Available at http://darcs.nex-3.com/pastie.rb if you’re interested.

2 I’m not ready to release this one publicly yet. It’s really cool, though – at least, I think so.

3 That seems so short… it feels like I’ve been doing this stuff forever.

Jean-Francois Couture said February 14, 2008:

I think using git-citool (it’s a gui tool for making commit) you can commit only part of the changes to a file, if I understand correctly what you mean by subfile changes.

Jon Leighton said February 14, 2008:

I have been playing around with bzr, and whilst it’s quite nice, I’ve been disappointed with its support for doing things similar to what Piston does for svn. I intend to check out the other distributed options, but in your experience have you found that type of thing to work very well with any of the competitors?

Nathan said February 14, 2008:

Jean-Francois: That does sound like what I mean, but I can’t figure out how to get git-citool to do that. Also, I tend to avoid Git’s GUI facilities because I find they tend to disrupt my workflow. citool does look pretty cool, though… I like the “amend last commit” option.

Jon: I haven’t played around with that in particular, but I think Giston is pretty much designed to solve that problem.

Jean-Francois Couture said February 14, 2008:

You can also use the git add -i (or --interactive) command if you prefer. The patch command will ask for each change if you want to add it to the commit. But that tool is far from intuitive. In citool, you select a file and then you can right click the part of the file to add that change.

I think giston just changed its name to braid. I haven’t used it yet. Git also has something called submodule that should be the equivalent of svn:externals. I still need to try it, though, but it looks a little rough around the edges right now.

Nathan said February 14, 2008:

git add -i looks very promising. Thanks for that.

The thing about unintuitive tools is that intuition doesn’t matter once you become comfortable with them.

Jon Leighton said February 15, 2008:

Giston/Braid looks interesting, thanks guys. Ideally this sort of thing will one day be integrated into the SCMs themselves. Bzr seems to have some sort of plan for that, but it’s been around since 2005 so obviously not top priority…

Phil said May 02, 2008:

I use gitsum to do partial commits with Git from within Emacs. Apparently it’s inspired by darcsum. I’ve been very impressed with it; I rarely commit without it.

Nathan said May 02, 2008:

I’ve looked at gitsum, but it doesn’t seem to work out-of-the-box for me for some reason. I suspect it may be an incompatibility in Git versions. I plan to look into it at some point, but for now I just use git add -i.
