Using Git: review and analysis

A version control system (VCS) performs two major functions:

  1. It saves snapshots of your project for comparison and debugging purposes.
  2. It publishes your project for use by others.

Early VCSs like RCS performed only (1). As sharing code over a network became more common, systems like CVS and Subversion were developed, which performed both (1) and (2).

Tragically, CVS and Subversion use the same command ('commit') to perform both operations. And that means a user who is unable to perform (2) for whatever reason (say, he's on an airplane, or has no commit privileges) loses out on all the advantages of (1) as well. And a user can't do (1) unless he is also willing to do (2).

This is where distributed version control systems (DVCS) like Git come in.

In Git, (1) and (2) are decoupled. While you're working, you can snapshot your project as often as you want. But you do this without publishing your work. If and when you do decide to publish, the complete change history is transplanted to a public repository. Others can see the individual changes you've made and understand your development process. If you don't have permission to commit to the original repository, someone who does can commit on your behalf after reviewing your work. But if your work didn't pan out, you can blow it away and no one else is the wiser.
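As a rough sketch of that decoupling, the commands below take two local snapshots and only then publish them. The scratch directory, the file name notes.txt, the author identity, and the bare repository standing in for a "public" one are all made up for illustration; a real project would push to whatever remote it actually uses.

```shell
work=$(mktemp -d)                       # hypothetical scratch directory
git init -q --bare "$work/public.git"   # stands in for a public repository
git init -q "$work/project"
cd "$work/project"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

# (1) Snapshot locally as often as you like; nothing is published yet.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "First draft"
echo "second draft" >> notes.txt
git commit -q -am "Second draft"

# (2) Publish only when ready; both snapshots travel together.
git remote add origin "$work/public.git"
git push -q origin HEAD
```

Until the final push, the public repository is empty. Afterwards it holds both commits, so anyone cloning it sees the individual steps rather than a single lump.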

For a project, the chief advantage of using a DVCS is that it allows many contributors to work asynchronously, so that everyone who wants to can get all the usual version control tools, without the blessing of the managers and without any centralized coordination needed. Use of a DVCS dramatically lowers the barrier for contributors.

Now, it's not like I spend most of my time working on the Linux kernel. But what I've realized is that a DVCS changes the game even for single-user projects. Git's fast branching operations encourage users to proceed down experimental avenues. Whenever you work on two things at once, branches can help you to keep them separated. Rule of thumb: anytime you implement anything even moderately complex, do it on a new branch. This has the following advantages:

  • You can delete the branch if you can't get it to work.
  • You can pause work and continue working on the original branch if something urgent or unrelated comes up.

If you use branches for features in development, and only merge them back into your master (mainline) branch when you're finished, then you know that master is never in a half-working state. If work continues on master, you can transplant ('merge') those changes into your experimental branch. Your experimental branch can have the latest updates from master, but the master branch itself is never tainted by experimental code. Branching liberally removes a lot of the uncertainty associated with changing things.
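Here is one way that workflow might look in practice. It's a minimal sketch, not a prescription: the branch name experiment, the file names, and the author identity are invented, and the mainline branch name is read from the repository because it may be master or main depending on your Git configuration.

```shell
work=$(mktemp -d)                       # hypothetical scratch directory
git init -q "$work/project"
cd "$work/project"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
echo "stable" > app.txt
git add app.txt
git commit -q -m "Initial stable version"
mainline=$(git symbolic-ref --short HEAD)    # "master" or "main"

# Start the experiment on its own branch.
git checkout -q -b experiment
echo "half-finished idea" > feature.txt
git add feature.txt
git commit -q -m "Begin experimental feature"

# Something urgent comes up: pause, switch back, fix it on the mainline.
git checkout -q "$mainline"
echo "urgent fix" >> app.txt
git commit -q -am "Urgent fix"

# Bring the fix into the experiment; the mainline is not touched by this.
git checkout -q experiment
git merge -q -m "Merge mainline into experiment" "$mainline"

# The feature works: merge it back, then delete the branch.
git checkout -q "$mainline"
git merge -q experiment
git branch -q -d experiment
```

Had the experiment not panned out, the last three commands could be replaced by switching back to the mainline and running git branch -D experiment, which discards the branch without merging it.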

Because Git provides a superset of the features of CVS, you can use Git in a CVS-like way, if you want to. But because it's so lightweight (easy to configure; no need to set up a server), low-overhead, and fast (especially in handling branches), I've found myself using Git to manage content that I would never have bothered to configure CVS for. Say goodbye to files named thesis-backup, thesis-backup2, etc.

Nowadays, I use Git even for personal projects that don't contain source code: anywhere I want to keep content synchronized between computers. Here I'm merely using Git as a file synchronization and backup tool. Git does conflict resolution when it's necessary (I don't need it frequently, but it happens). Every clone is itself a bona fide repository from which I can make another clone. And if I clone B from A, and then in turn clone C from B, then C and A have enough information to synchronize with each other directly. In these respects a DVCS is much more robust than ordinary file synchronization software. To top it all off, every working copy knows the full history of the project, and acquiring updates is as simple as git pull. There is overhead associated with storing all past history, but modern DVCSs are good at keeping it small, and hard disk space being as cheap as it is, it's a small price to pay for easy and reliable backups.
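The A, B, C scenario can be tried out locally. In this sketch the three repositories are just sibling directories in a scratch location, and the file name and author identity are placeholders; on real machines the path given to git pull would be a URL or an SSH address instead.

```shell
work=$(mktemp -d)                       # hypothetical scratch directory
cd "$work"
git init -q A
git -C A config user.name "Example Author"       # placeholder identity
git -C A config user.email "author@example.com"
echo "chapter one" > A/thesis.txt
git -C A add thesis.txt
git -C A commit -q -m "Chapter one"

git clone -q A B      # B is cloned from A
git clone -q B C      # C is cloned from B and has never talked to A

# New work appears in A only.
echo "chapter two" >> A/thesis.txt
git -C A commit -q -am "Chapter two"

# C fetches the update straight from A, bypassing B entirely.
branch=$(git -C A symbolic-ref --short HEAD)     # "master" or "main"
git -C C pull -q --ff-only "$work/A" "$branch"
```

This works because C already carries the shared history, so Git only has to transfer the one commit that C lacks. B is left a commit behind until it pulls as well.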

Git is billed not as a VCS per se but as a "content tracker". Depending on how you use it, it's a local VCS, a VCS for sharing, a file synchronizer, or a time-travel backup system. Not only is it convenient that Git can fill all those needs, it's reassuring to know that as my projects change and grow, it is very unlikely that they will outgrow Git.

Further reading: Git homepage, Git tutorial

5 comments:

  1. "git tutorial" at the bottom of your post link is broken

  2. Thanks, I've fixed that link.

  3. Thanks for the good and short introduction

  4. The GIT tutorial link is broken

  5. Thanks, I've fixed that link... again.
