2007

for ... else in Python

Python has an interesting for statement (reference) which lets you specify an else suite.

In a construct like this one:

for i in foo:
  if bar(i):
    break
else:
  baz()

the else suite is executed after the for, but only if the for terminates normally (not by a break).

Here's some code written without for...else:

def contains_even_number(l):
  "Prints whether or not the list l contains an even number."
  has_even_number = False
  for elt in l:
    if elt % 2 == 0:
      has_even_number = True
      break
  if has_even_number:
    print("list contains an even number")
  else:
    print("list does not contain an even number")

The equivalent code snippet below illustrates how the use of for...else lets you remove an extraneous flag variable from that loop:

def contains_even_number(l):
  "Prints whether or not the list l contains an even number."
  for elt in l:
    if elt % 2 == 0:
      print("list contains an even number")
      break
  else:
    print("list does not contain an even number")

Use your good judgment when deciding whether to use the for...else construct. It's not unequivocally better, but when there's an asymmetry between the two possibilities, you can make your code more readable by using for...else to keep the "happy path" logic at the top and the exceptional/error case at the bottom.
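
Here is one more minimal sketch of the same pattern, this time for a search that returns a value. It uses Python 3 syntax, and the function name first_factor is made up for illustration:

```python
def first_factor(n):
    """Return the smallest factor of n in the range [2, n), or None."""
    for candidate in range(2, n):
        if n % candidate == 0:
            result = candidate
            break  # leaving the loop via break skips the else suite
    else:
        # Runs only when the loop finished without hitting break,
        # including the case where the range was empty.
        result = None
    return result
```

Note that the else suite also runs when the loop body never executes at all, which is usually what you want for a "nothing found" case.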

Version control with Git: git-rebase --interactive

Git 1.5.3 introduces git-rebase --interactive, which lets you alter the commit history in various ways, including splitting, squashing (combining), inserting, and removing patches. In each case git-rebase rewrites the subsequent commit history so no one else is the wiser.

Start by doing:

git-rebase -i f00bab

where f00bab is the commit before the first commit you want to change.

Git opens an editor describing the commits since that commit, in chronological order, in the following format:

pick 1e4dfd7 Foo bar commit.
pick 6b78037 Baz quuz commit.
:

You can edit this list to tell Git to do certain things:

Remove a line to delete the corresponding commit.
Move lines around to reorder commits.
Change pick to squash on a line to combine that commit with the previous commit.
Change pick to edit to modify or split that commit (see below).
Git will attempt to reapply all other commits (lines which are still labeled pick).

When you squash a commit, Git prompts you for a new message for the combined commit.

When you choose edit, Git applies that commit but pauses the rebasing process so you can edit your tree. There are a couple of ways to proceed:

To edit the commit message only, just do git commit --amend.
To edit the commit itself, make some changes, git add them, and do git commit --amend.
To split the commit into multiple commits, do git reset HEAD^ to rewind the branch without touching the working copy. Stage and commit (git add ...; git commit) your first change, and repeat until the branch catches up to your working copy. git-add --interactive can be useful if you want to pick and choose parts of files to stage. Notice that if you only use git add and git add --interactive, your working tree never changes. If you really want to work with the intermediate states (for example, to run unit tests), use git-stash to put away your working copy temporarily.
You can also make additional commits here, which will appear right after the commit being edited.

In any case, when you're finished at that point in history, use git-rebase --continue to move on.

Whenever you change a commit, Git applies the patches from subsequent commits (at least to the best of its ability; if a change you make causes a subsequent patch to not apply cleanly, then Git will stop to ask you to resolve the conflict). However, these new commits will have different identifiers.
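
To see why every later commit gets a new identifier, here is a toy model in Python. It is only a sketch under a simplifying assumption: each identifier is a SHA-1 hash over the parent's identifier, the message, and the patch. Real Git hashes a commit object with more fields (tree, parents, author, committer, message), and the names commit_id and build_history are hypothetical. The point is just that an identifier depends on its parent's identifier.

```python
import hashlib


def commit_id(parent_id, message, patch):
    """Toy identifier: a hash over the parent id, the message, and the patch."""
    data = "\n".join([parent_id, message, patch]).encode("utf-8")
    return hashlib.sha1(data).hexdigest()


def build_history(commits, root_id="root"):
    """Return the list of ids for a sequence of (message, patch) pairs."""
    ids = []
    parent = root_id
    for message, patch in commits:
        parent = commit_id(parent, message, patch)
        ids.append(parent)
    return ids


original = [
    ("Foo bar commit.", "patch A"),
    ("Baz quuz commit.", "patch B"),
    ("Third commit.", "patch C"),
]

# Reword only the second commit, leaving every patch untouched.
amended = list(original)
amended[1] = ("Baz quuz commit, reworded.", "patch B")

old_ids = build_history(original)
new_ids = build_history(amended)
```

In this model the first identifier is unchanged, while the second and third both differ, even though the third commit's message and patch are identical in both histories.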

Further reading: git-rebase documentation

Version control with Git: git-add --interactive

Git 1.5.0 introduces git-add --interactive, which lets you stage changes at a finer granularity than the file level. This is useful if you've made a number of changes in a file before you realize that they logically should go in as separate commits. It works by allowing you to pick and choose hunks of the diff to stage.

To start:

$ git add --interactive
           staged     unstaged path
  1:    unchanged       +26/-6 path/to/file1
  2:    unchanged        +1/-0 some/other/file

*** Commands ***
  1: status   2: update   3: revert   4: add untracked
  5: patch    6: diff     7: quit     8: help
What now>

Git shows you all the files which have changes from HEAD. To pick and choose diffs, select 5 (patch) at the menu. Git prompts you to select one of the listed files by number.

Git then shows you hunks of the diff between HEAD and your working copy. For each hunk, you can choose whether or not you want to stage the hunk. For large hunks, Git will also offer to split the hunk into smaller hunks which you can stage independently. (There are other options, such as options to stage or unstage all remaining hunks.)
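
If the notion of a hunk is unfamiliar, the following Python sketch may help. It groups the differences between two versions of a file into hunks using the standard difflib module. This is not Git's diff implementation, and split_into_hunks is a hypothetical helper; it only illustrates that changes far apart in a file end up in separate hunks that can be handled independently.

```python
import difflib


def split_into_hunks(old_lines, new_lines, context=3):
    """Group the differences between two lists of lines into hunks.

    Each hunk is a list of unified-diff style lines: a header, then lines
    prefixed with ' ' (context), '-' (removed), or '+' (added).
    """
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    hunks = []
    for group in matcher.get_grouped_opcodes(context):
        first, last = group[0], group[-1]
        header = "@@ -%d,%d +%d,%d @@" % (
            first[1] + 1, last[2] - first[1],
            first[3] + 1, last[4] - first[3])
        body = []
        for tag, i1, i2, j1, j2 in group:
            if tag == "equal":
                body.extend(" " + line for line in old_lines[i1:i2])
            else:
                body.extend("-" + line for line in old_lines[i1:i2])
                body.extend("+" + line for line in new_lines[j1:j2])
        hunks.append([header] + body)
    return hunks


# Two unrelated edits, far apart in a 20-line file.
old = ["line %d" % i for i in range(1, 21)]
new = list(old)
new[1] = "line 2 (edited)"
new[17] = "line 18 (edited)"

hunks = split_into_hunks(old, new)
```

Here hunks contains two entries, one per edit, which is the granularity at which the patch command lets you say yes or no.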

Once you're done, you can select 7 to quit, do git diff --cached to verify the staged changes, and do git commit as usual to commit.

Update: you can also use git-add --patch FILENAME, which skips the menu and jumps directly to the hunk selection part. I usually use this instead of git-add --interactive now and I've aliased it to gap in my shell. If you use Emacs, magit is a great extension that also supports staging and unstaging individual hunks.

Version control with Git: branches

In Git, branches are a lightweight way to manage multiple lines of development. You might use a separate branch for work on an experimental feature, or to maintain parallel lines of code with just a few differences. In either case, it's often essential to copy changes ("merge") from one branch to another to keep them synchronized, and Git handles this very well.

To create a new branch named mynewfeature and switch to it:

git branch mynewfeature
git checkout mynewfeature

Now, commits you make will appear on the mynewfeature branch, but will not affect the default branch (which is named master). Running gitk --all conveniently shows you commits and branches, and lets you see graphically when your branches diverged or were synchronized.

To resume working on the master branch:

git checkout master

Git will put away all the file versions associated with your branch and restore the file versions associated with the master.

Typically, after you branch, new changes will be committed both on the branch and on the master (main line). Merging changes on the master into your branch regularly will ensure that your branch contains the latest updates from the main line (but does not apply to the master any changes you've made on the branch). Assuming you have checked out a branch, you can merge changes from the master with:

git merge master

Git applies all the changes made on the master since your branch diverged from it (or since the last time you merged). It then (usually) creates a new commit representing the merged state. If there is a merge conflict, Git will not make the commit: you will be asked to fix up the merge and then git commit -a the result.

Once you are done with your new feature, it's time to merge it back into the master! Check out the master and then merge your branch like so:

git merge mynewfeature

You can then delete the branch:

git branch -d mynewfeature

Keyboard navigation in Google search results

Google Labs has an experimental version of Google search that lets you use Gmail-like keyboard shortcuts to navigate through Google search results!

As in Gmail, an arrow is displayed next to the search results to indicate the currently selected one. j and k go to the next and previous results, respectively, o opens a result, and / puts you back in the search box.

To activate keyboard shortcuts for your Google account, go to Google Experimental Search and select the Keyboard shortcuts experiment.

Version control with Git: remote repositories

Previously, I wrote about Git usage for single-user single-location projects. However, where Git really shines is in managing a project when changes are made on multiple machines (whether by one person or by multiple people).

Unlike centralized version control systems and file synchronization software, Git and other distributed version control systems actually have good support for disconnected operation:

You can perform your commits locally and without talking to a central server. You can push all your changes to another location whenever you get a chance. (CVS and SVN don't support making commits in isolation, so people who work offline end up submitting huge patches.)
When you (or you and other people) perform independent changes in parallel on different machines, Git knows how to gracefully merge those changes the next time you synchronize.

However, I've found Git to be a lot easier to get started with than other VCS's with the same features. All you need is a git init to start managing a simple project in Git: like RCS (but unlike CVS and SVN), there's no need to create a separate repository and make a checkout of it. And yet, when your project matures, Git will happily (and with just a couple of additional commands) move that project to multiple machines or share it over the internet so you can take advantage of its distributed features.

One of my projects is used solely to manage my dotfiles. (I'm constantly tweaking my .emacs, .bashrc, etc.) I use Git to keep my dotfiles synchronized on all the computers I use. I'll discuss the basics of distributed operation for a generic project before talking about some of the wrinkles associated with using Git to manage your dotfiles.

Suppose my Git repository is in ~/testproj on the machine bigphil. To start working on that project on another machine (the equivalent of "svn checkout"), do:

git clone ssh://bigphil/~/testproj

You can also clone a repository from elsewhere on a local disk, or over HTTP:

git clone ~/testproj
git clone http://example.com/git/testproj

You now have the complete version history of the project, and Git can work completely independently of the original repository, if you'd like. You can add files, make your own commits, etc. on your new repository locally.

However, typically you will want to continue incorporating commits that are made to the source repository. You can do this with:

git pull

(When you cloned the repository, Git recorded the original location; git pull retrieves updates from that location.) This is the equivalent of "svn update".

Merging

When you pull from a remote repository and there have been changes on both your local copy and the remote copy since you last synchronized, Git needs to merge those changes. Usually, it will do this automatically and make a new commit which incorporates the changes from both the local and the remote repositories.

After a merge has occurred, if you look at the project history with gitk --all, you'll see a place where two history lines diverged (representing development in the local and remote repositories) and then were merged back together. (However, the output of git log flattens these nonlinearities.)

If the local and remote changes made modifications to the same pieces of code, Git may have trouble performing a merge. In this case, it will do its best, but it will leave conflict markers in the code and not commit the final result. You should fix up the conflicts and then commit the merge with git commit -a.

Pushing, and bare repositories

To take your local modifications and push them back to the original repository from which you made your clone, do this:

git push

To push to a different repository:

git push path/to/other/repo

However, if you push your local modifications to a regular repository, a person who is using that repository to do work may get confused because the state of the repository is changing right under him. So typically it's better to push to a bare repository, which is a repository without a working copy (essentially, what's in the .git subdirectory of a regular repository).

To initialize a bare repository:

git --bare init

Then use git push PATH to push to it. Bare repositories look different at the file level, but cloning from and pushing to them is otherwise the same.

Managing dotfiles with Git

To manage my dotfiles, I've made my home directory the root of a Git repo. I only add the files I'm interested in managing (.emacs, .bashrc, etc.), and Git ignores the rest of them. I push changes from that repo into a bare repository on one of my machines, and pull from that repository to get the latest versions.

The only complication is when I wish to bring my dotfiles to a new computer. Git does not allow you to clone a repo into an existing directory (as I would wish to do to clone my dotfiles into my home directory). However, things will work if you clone to a new directory, and then copy the contents of that directory (the .git subdirectory, and all the files of interest) back to your home directory:

git clone ssh://bigphil/~/projects/dotfiles
mv dotfiles/.[a-zA-Z]* ~

(Note that dotfiles/* doesn't work because * doesn't usually select dotfiles.)
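
Python's glob module follows the same convention as the shell here, so the difference between the two patterns can be checked with a short script. This is only a sketch: the file names are examples and matching_names is a hypothetical helper.

```python
import glob
import os
import tempfile


def matching_names(directory, pattern):
    """Return the sorted base names in directory that match pattern."""
    paths = glob.glob(os.path.join(directory, pattern))
    return sorted(os.path.basename(path) for path in paths)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-ins for a freshly cloned dotfiles directory.
    for name in (".emacs", ".bashrc", ".gitconfig", "README"):
        open(os.path.join(tmp, name), "w").close()

    star_matches = matching_names(tmp, "*")           # skips dotfiles
    dot_matches = matching_names(tmp, ".[a-zA-Z]*")   # selects only dotfiles
```

After this runs, star_matches holds only README, while dot_matches holds the three dotfiles.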

Version control with Git

I've recently started using Git for version control of all my personal projects. It works so smoothly that I don't have any reservations about using version control. That means I commit small changes very often; as a result, I'm never afraid of leaving my projects in a wedged state, even if I'm making big changes.

This post only discusses basic Git usage. I'll write about its distributed features in future posts.

To start managing a directory with Git:

Do the following to initialize the directory:

git init
If you have existing files you want to start managing, do

git add .

to add all the files in the directory, or

git add FILE ...

to add only particular files.
Then do

git commit

Git prompts you for a log message and records your first commit. It will print a message of the following form:

Created commit 1e169f6: Add new file.
 1 files changed, 15 insertions(+), 0 deletions(-)

Congratulations!

Like SVN, Git stores per-tree versions rather than per-file versions. However, instead of assigning version numbers like SVN does, Git assigns a unique hexadecimal identifier to each commit. Although these identifiers are long (40 characters), when you wish to refer to a particular commit, you only need to type as many characters as are needed to make a unique prefix.
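
The unique-prefix rule can be sketched in a few lines of Python. This is an illustration of the idea, not Git's own lookup code. The identifiers below are made up (padded out to 40 characters), and resolve_prefix is a hypothetical name.

```python
def resolve_prefix(prefix, commit_ids):
    """Return the single id starting with prefix, or raise ValueError."""
    matches = [cid for cid in commit_ids if cid.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError("no commit matches %r" % prefix)
    raise ValueError("prefix %r is ambiguous (%d matches)"
                     % (prefix, len(matches)))


# Made-up 40-character identifiers for illustration only.
ids = [
    "1e169f6" + "a" * 33,
    "1e4dfd7" + "b" * 33,
    "6b78037" + "c" * 33,
]

found = resolve_prefix("1e1", ids)      # unique: the first id

try:
    resolve_prefix("1e", ids)           # two ids start with "1e"
    ambiguous = False
except ValueError:
    ambiguous = True
```

With these three identifiers, "6" alone is enough to name the third commit, but the first two need at least three characters each.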

My typical workflow looks like this:

Do some editing:

emacs FILE ...
Add files to Git's staging area:

git add FILE ...

Do this whether the files are new files you've added, or existing files you've modified.
Commit the files in the staging area:

git commit

Again, Git prompts you for a log message and records a commit.

As a shortcut, git commit -a is equivalent to running git add with any modified files before running git commit. (It does not, however, pick up newly created files.)

The following commands are used to explore the project's history and current state:

git log shows recent commits.
git status shows which files are in the staging area, which files have been modified, and which newly created files are not managed by git.
git diff shows changes in your working copy.
git diff --cached shows changes in the staging area, that is, what will be committed when you do git commit.
git diff cd12..ab34 diffs the revisions cd12 and ab34.
git reset --hard restores your working copy state to that of the last commit.

The CVS version of GNU Emacs supports Git in VC, so it knows when your file is being managed by Git. When it is, you can use the following VC commands:

C-x v v commits the current file.
C-x v = shows a diff between your working copy and the last committed version (or between any two committed versions).
C-x v g displays an annotated version of the file showing, for each line, when that line was last modified, and a heat-map displaying older and newer code in different colors.

To install git on Ubuntu: type sudo apt-get install git-core.

Further reading: Everyday GIT with 20 commands or so, A tour of git.

Tweaking and recompiling .deb packages

Fixing a bug or experimenting with a program often requires going back to the source. Compiling the program yourself from a tarball usually means you lose the benefits of your package system, such as dependency tracking. Fortunately, Debian and Ubuntu include mechanisms to let you compile packages yourself without straying too far from the world of your package manager. It's actually really easy, because the repositories contain information about where to get source files and how to compile them. (I've tried these instructions on Ubuntu.)

Install compilers and other dev tools:

sudo apt-get install build-essential devscripts fakeroot

Install the build-time dependencies for the package:
```
sudo apt-get build-dep FOO
```
Download the source for your package (to the current directory):
```
apt-get source FOO
```
At this point, you can make whatever modifications you wish to the source.

Compile the package:

cd FOO-6; debuild -us -uc

To compile with debugging information, use

export DEB_BUILD_OPTIONS="debug nostrip noopt"

You now have a shiny new .deb file. Install it with
```
sudo dpkg --install FOO_6-1_i386.deb
```
Congratulations!

Emacs 22

Emacs 22 has been released. From the Emacs web site:

Emacs version 22 includes GTK+ toolkit support, enhanced mouse support, a new keyboard macro system, improved Unicode support, and drag-and-drop operation on X, plus many new modes and packages including a graphical user interface to GDB, Python mode, the mathematical tool Calc, the remote file editing system Tramp, and more.

Source downloads are available from GNU. Hopefully, most GNU/Linux distributions will package Emacs 22 soon.

Windows users can download Emacs 22 precompiled (installation instructions).

Simple version control with RCS

I had to write quite a bit of code for one of my projects this term. To keep the project manageable, I wanted a simple way to take snapshots of my files at particular milestones, so I could back up and look at previous snapshots if I ever screwed up my code. I ended up doing this with RCS.

RCS is best for situations where a single user is modifying files locally. For remote repositories or when you need to coordinate actions between multiple users, you should use another version control system.

To set up RCS in a directory:

Create a subdirectory named RCS.
To manage FILE using RCS, run $ ci -u FILE. Type a description of the file when prompted.

Emacs can deal with RCS-managed files out of the box. Here's a typical workflow:

Open an RCS-managed file in Emacs. Emacs automatically recognizes that the file is under RCS and displays the version number in the mode line.
Lock the file for editing with C-x v v.
Make your changes.
When you're at a milestone, commit your changes with C-x v v. Emacs prompts you for a commit message. Enter it in the provided buffer and press C-c C-c.
To continue editing the file, lock it again with C-x v v and proceed as above.

Now you get to use the other cool features provided by the VC facility in Emacs:

C-x v = displays a diff between your current version and the previous committed version. C-u C-x v = lets you diff any two previous versions of the file. I like to use unified diffs whenever possible, so I have (setf vc-diff-switches "-u").
C-x v ~ prompts you for a version number and displays that version of the file in a new buffer.
C-x v g displays an annotated version of the file showing, for each line, when that line was last modified, and a heat-map displaying older and newer code in different colors.

Bridging the gap between Emacs and Gnome

I use Emacs whenever I can (which is, nowadays, for almost all the work I do) but I still need to switch away occasionally for some apps. Usually this involves starting up Nautilus, navigating to a directory, finding some files (e.g. PDFs or web pages), and double-clicking them. I wrote some Emacs functions to do the equivalent from Dired, avoiding the need to use Nautilus at all. This makes Dired a lot more useful as a file manager/browser for all kinds of files, not just text files.

gnome-open opens a file using the same application which would have been used to open it had you double-clicked it in Nautilus (Evince for PDFs, OpenOffice for OO.o docs, etc.). The following function takes a file and calls gnome-open on it:

(defun gnome-open-file (filename)
  "gnome-opens the specified file."
  (interactive "fFile to open: ")
  (let ((process-connection-type nil))
    (start-process "" nil "/usr/bin/gnome-open" filename)))

In a Dired buffer, the following function gnome-opens the file on which your cursor is sitting:

(defun dired-gnome-open-file ()
  "Opens the current file in a Dired buffer."
  (interactive)
  (gnome-open-file (dired-get-file-for-visit)))

I bound it to 'E':

(add-hook 'dired-mode-hook (lambda () (local-set-key "E" 'dired-gnome-open-file)))

dabbrev-expand

I discovered this real key-saver a few weeks ago. It makes Emacs appear to be positively psychic.

M-/ or dabbrev-expand completes the word you are in the middle of typing by looking at other words in the buffer. If no match is found it looks in other buffers. So dabbrev can complete the word or name you are typing on the basis of anything else you are reading in Emacs: the text you've previously typed, code or text in other files you have open, even documentation or web pages you have open.

If dabbrev completes your text incorrectly, you can press M-/ repeatedly to get other possible completions.
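
The search order can be modeled roughly in Python. This is a sketch of the idea, not how Emacs implements it, and it assumes the order described in the Emacs documentation: words before point (nearest first), then words after point, then other buffers. The function name and the sample text are made up.

```python
import re

WORD = re.compile(r"\w+")


def dabbrev_candidates(prefix, before_point, after_point="", other_buffers=()):
    """Return possible completions for prefix, nearest first, no duplicates."""
    # Words before point are searched backward, so reverse them.
    ordered = list(reversed(WORD.findall(before_point)))
    ordered += WORD.findall(after_point)
    for text in other_buffers:
        ordered += WORD.findall(text)

    candidates = []
    for word in ordered:
        if word.startswith(prefix) and word != prefix and word not in candidates:
            candidates.append(word)
    return candidates


before = "the completion completes words; a compiler compiles code. "
choices = dabbrev_candidates("comp", before)
fallback = dabbrev_candidates("zeb", "nothing here", other_buffers=["a zebra"])
```

In this model the first M-/ would offer compiles (the nearest match), and each repeated M-/ would move on to the next entry in choices.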

"The Guided Tour of Emacs"

An illustrated guide based on material from my class Being Productive With Emacs now appears on gnu.org, linked from the Emacs home page.

Read The Guided Tour of Emacs.

Imagemagick

Imagemagick is a set of command-line utilities for manipulating and viewing images, the most useful of which is probably convert. convert will convert images between different formats and will apply any of a number of image transformations upon request.

Why would you use convert instead of the GIMP or Photoshop? If you're doing something as simple as changing the format of an image, you can avoid the trouble of firing up an image editor. If you're applying the same transformation to many different images (e.g. generating thumbnails), you can script Imagemagick instead of opening up all the files in an image editor. And, of course, if you need to do image manipulation on-the-fly (e.g. for an interactive web site), you'll need something scriptable.

Here's how to convert an image to a different format, say, from a JPEG to a PNG:

convert boston.jpg boston.png

To do image transformations, specify some number of transformations between the input and the output file names. Here are some of my favorites:

convert boston.jpg -resize 640x480 boston-small.jpg: Resizes the image to fit in a 640x480 box while preserving the aspect ratio.
convert boston.jpg -quality 80 boston-lowres.jpg: Changes the quality of an image (for JPEG, you can specify any number from 0 [lowest image quality, least space] to 100 [highest image quality, most space]).
convert logo.png -rotate 10 logo2.png: Rotates an image by 10 degrees.
convert logo.png -negate logo2.png: Negates the colors in an image.
convert boston.jpg -fx 'luminosity' boston-grayscale.jpg: Desaturates an image (converts it to grayscale).

You can compound operations by stringing them together:

convert in.jpg -emboss 5 -fx 'luminosity' -rotate 10 out.png

convert has more options than you can shake a stick at: everything from color transformations to special effects and commands to let you draw arbitrary shapes or text on an image. And Imagemagick also provides bindings so you can use it as a library from any of a number of languages.
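
As a small sketch of the scripting use case, the Python below builds the command line for the resize example shown earlier, once per input file. It assumes the convert executable is on your PATH; the helper names and the "-small" suffix are just illustrative choices.

```python
import os


def thumbnail_command(source, box="640x480", suffix="-small"):
    """Build the argument list for: convert SOURCE -resize BOX OUTPUT."""
    root, ext = os.path.splitext(source)
    return ["convert", source, "-resize", box, root + suffix + ext]


def thumbnail_commands(sources, box="640x480"):
    """Build one convert command per source image."""
    return [thumbnail_command(source, box) for source in sources]


commands = thumbnail_commands(["boston.jpg", "paris.jpg"])
```

To actually run them, pass each list to subprocess.run(command, check=True). Building the argument list separately from running it makes it easy to print the commands first and check them.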

How to rescue a dying hard drive

If you suspect your hard drive is dying (unusually large amounts of disk thrashing, long access times, frequent file corruption, refusal to boot) and you have a spare disk, it's not hard to clone the disk. It will be easier to recover data from the clone, since continued use of the bad disk can make things worse.

If the disk is your boot disk, move it to another computer. Otherwise, be sure to unmount it first.
Determine what device and partition the disk appears on, perhaps using mount or fdisk.
Use ddrescue to make an image of the disk:
```
$ ddrescue /dev/sdb1 IMAGEFILE LOGFILE
```
where sdb1 is the node for your source disk. ddrescue is like dd (which writes a copy of all the disk's raw data to a file), but works better for damaged disks: it fills in zeros for parts of the disk it can't read. You can run ddrescue as many times as you want, and if you provide the same LOGFILE it will attempt to fill in the gaps in the image that it didn't get before.
Write the image to a new disk:
```
$ dd if=IMAGEFILE of=/dev/sdc1
```
where sdc1 is the node for your target disk.
Mount the filesystem and run fsck or an equivalent. For NTFS volumes, moving the drive to a Windows computer and running chkdsk /F works wonders.
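
The zero-fill-and-retry idea behind ddrescue can be illustrated with a toy simulation in Python. To be clear, this is not ddrescue's actual algorithm or log file format; FlakyDisk and rescue_pass are made-up names, and the "disk" is just a list of four-byte blocks, some of which fail on the first read.

```python
class FlakyDisk:
    """Simulated disk: the listed blocks fail on their first read only."""

    def __init__(self, blocks, flaky):
        self.blocks = blocks
        self.pending_failures = set(flaky)

    def read(self, index):
        if index in self.pending_failures:
            self.pending_failures.discard(index)
            raise OSError("cannot read block %d" % index)
        return self.blocks[index]


def rescue_pass(read_block, num_blocks, previous=None, block_size=4):
    """Copy readable blocks into an image; zero-fill and record the rest.

    previous is the (image, bad_blocks) pair from an earlier pass, playing
    the role of the log: when given, only the recorded gaps are retried.
    """
    if previous is None:
        image = [bytes(block_size) for _ in range(num_blocks)]  # all zeros
        to_try = list(range(num_blocks))
    else:
        image = list(previous[0])
        to_try = sorted(previous[1])

    still_bad = set()
    for index in to_try:
        try:
            image[index] = read_block(index)
        except OSError:
            still_bad.add(index)  # leave zeros in place, remember the gap
    return image, still_bad


disk = FlakyDisk([b"aaaa", b"bbbb", b"cccc", b"dddd"], flaky=[1, 3])
first = rescue_pass(disk.read, 4)
second = rescue_pass(disk.read, 4, previous=first)
```

After the first pass the image has zeros where blocks 1 and 3 should be; the second pass retries only those two blocks and fills them in.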

Makefiles

You may know make as a tool for building software, but it's actually a general utility for performing tasks based on a dependency graph, even if those tasks have nothing to do with compiling software. In many situations, files are processed in some way to create other files or perform certain tasks. For example:

A .c file is compiled to make a .o file, which is then linked to other object files to make an executable.
A .tex file is processed by LaTeX to make a PDF file.
A file containing data is processed by a script to generate graphs.
A web page needs to be uploaded to a remote server after it is modified.
A piece of software needs to be installed after it is compiled.

In each case, the modification of a source file demands that some task be performed to ensure that the corresponding resource is up-to-date (i.e., it reflects the changes made to the source file).

make automates the process of keeping resources up-to-date: you tell it about what resources depend on what source files, and what needs to be done to those source files to make the resources. It then runs only those tasks that actually need to be run. No more obsessive-compulsive internal dialogues that go like this: "Did I remember to compile that program again after I was done changing it?" "Well, I don't remember, so I might as well do it again." Running a properly-configured make will compile your program if it's necessary, and do nothing if your program is up-to-date. This can save you seconds, minutes, or hours, depending on what you're doing.

make determines what to do by comparing the modification times of the resources and the source files. For example, if thesis.pdf (a resource) is created from thesis.tex (a source file), then pdflatex needs to be run if thesis.pdf doesn't exist or if it's older than thesis.tex, but nothing needs to be done if thesis.pdf is newer than thesis.tex.
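
That timestamp rule is simple enough to write down directly. The Python below is a minimal sketch of the comparison, not make's actual implementation; needs_rebuild is a hypothetical name, and the demonstration uses empty temporary files with hand-set modification times.

```python
import os
import tempfile


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(source) > target_time for source in sources)


with tempfile.TemporaryDirectory() as tmp:
    tex = os.path.join(tmp, "thesis.tex")
    pdf = os.path.join(tmp, "thesis.pdf")

    open(tex, "w").close()
    missing = needs_rebuild(pdf, [tex])   # the PDF does not exist yet

    open(pdf, "w").close()
    os.utime(tex, (1000, 1000))           # source written first
    os.utime(pdf, (2000, 2000))           # target written later
    fresh = needs_rebuild(pdf, [tex])

    os.utime(tex, (3000, 3000))           # source edited after the build
    stale = needs_rebuild(pdf, [tex])
```

Here missing and stale come out True and fresh comes out False, matching the three cases described above.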

To use make to manage some files in a directory, create a file called Makefile in that directory. The syntax for the Makefile is refreshingly simple. For each resource to generate (these are called targets), type:

TARGET-FILE: DEPENDENCY-FILE-1 DEPENDENCY-FILE-2 ...
	COMMAND-TO-RUN
	ANOTHER-COMMAND

Each command to be run should be indented with a single tab. Here's an example Makefile, for a system where two files are to be compiled, then linked to make an executable:

helloworld: hello1.o hello2.o
	gcc -o helloworld hello1.o hello2.o
hello1.o: hello1.c
	gcc -c hello1.c
hello2.o: hello2.c
	gcc -c hello2.c

Running $ make helloworld tells make to resolve the dependencies necessary to generate the file helloworld: make first compiles the two source files, then links them. If I were to subsequently change hello1.c and run $ make helloworld again, make would recompile that file and relink the program, but it would not recompile hello2.c.
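
The behavior just described can be reproduced with a toy model of make in Python. This is a sketch under simplifying assumptions, not how make is implemented: modification times are plain integers in a dictionary, commands are recorded rather than executed, and toy_make is a made-up name.

```python
def toy_make(target, rules, mtimes, now):
    """Bring target up to date; return the commands that would run, in order.

    rules maps a target to (dependencies, command).
    mtimes maps each existing file to its modification time and is updated
    in place when a target is rebuilt.
    """
    ran = []

    def update(name):
        if name not in rules:
            if name not in mtimes:
                raise ValueError("no rule to make target %r" % name)
            return  # a plain source file: nothing to do
        dependencies, command = rules[name]
        for dependency in dependencies:
            update(dependency)  # resolve dependencies first
        out_of_date = name not in mtimes or any(
            mtimes[dependency] > mtimes[name] for dependency in dependencies)
        if out_of_date:
            ran.append(command)
            mtimes[name] = now

    update(target)
    return ran


rules = {
    "helloworld": (["hello1.o", "hello2.o"],
                   "gcc -o helloworld hello1.o hello2.o"),
    "hello1.o": (["hello1.c"], "gcc -c hello1.c"),
    "hello2.o": (["hello2.c"], "gcc -c hello2.c"),
}
mtimes = {"hello1.c": 1, "hello2.c": 1}

first_run = toy_make("helloworld", rules, mtimes, now=2)   # builds everything
second_run = toy_make("helloworld", rules, mtimes, now=3)  # nothing to do
mtimes["hello1.c"] = 4                                     # edit hello1.c
third_run = toy_make("helloworld", rules, mtimes, now=5)   # partial rebuild
```

The third run contains only the recompile of hello1.c and the relink; hello2.c is left alone.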

By default, $ make (without a target name) does whatever is necessary to make the first target that appears in your Makefile (in this case, helloworld).

In fact, your make targets do not have to refer to files. You can use make to define groups of commands to be run together under an (arbitrary) easy-to-remember target name. For example, I could define a publish target which would upload my web pages:

.PHONY: publish
publish: index.html stuff.html
	scp index.html stuff.html phil@remoteserver:~/www/

(The .PHONY directive tells make to not worry that a file named publish is not being generated.) Now publishing my web page is as easy as running $ make publish.

Here are some common applications of phony targets:

After you compile a piece of software, you can frequently install it with $ make install.
$ make clean usually has the job of removing compiled files from a directory (so you are left with the "clean" source files). It can often be defined similarly to this:
```
.PHONY: clean
clean:
	rm -f helloworld *.o
```
If a source directory contains multiple independent targets, an all target is usually created to force building the individual parts, and placed first so that $ make invokes $ make all:
```
.PHONY: all
all: part1 part2 part3
```

As you can see, you can use make to automate any well-defined process you're going to want to do repeatedly.

Gnuplot plots in LaTeX

I was pleasantly surprised to learn that gnuplot and LaTeX play very well with each other. Gnuplot makes it quite easy to insert professional-looking (LaTeX-worthy) graphs into your documents. To do this, set your gnuplot terminal to epslatex:

set term epslatex
set output "graph1.eps"
plot ...

When gnuplot creates your graph, it will generate not only a graph (in EPS), but also a snippet of TeX (named graph1.tex in this case) that you can include in your document to insert the graph. Then add this to your LaTeX file:

\input{graph1.tex}

The text (axes, labels, title) on the graph is all rendered in TeX, not in the EPS, so it looks very sharp. Typically you will want to wrap the above code in a LaTeX figure as well:

\begin{figure}[tbp]
  \begin{center}
    \input{graph1.tex}
    \caption{Graph caption}
    \label{graph:graph1}
  \end{center}
\end{figure}

You can control the size of the graph with:

set size 1.0, 1.0

in the Gnuplot code.

Update: you can also generate this style plot in GNU Octave. Octave is preferable in some cases because it has a few more plot options/styles than Gnuplot does, and, of course, is a full-fledged programming language.

"Being Productive With Emacs"

I taught a class during IAP 2007 titled Being Productive With Emacs. The course web site contains the slides I used, which you may use under the terms of the GFDL.

I realize that presentation slides are not the greatest resource for learning. Eventually I'll get around to revising my material from the actual class and I'll post it somewhere in an easier-to-read format.