2008

A holiday plea

Shortly before the holidays I made donations to three organizations: the Free Software Foundation, Wikipedia, and Creative Commons. If you'll kindly indulge me for a minute, I'll explain why I think the work of these organizations is so important to an open internet and a free and prosperous society. Consider giving whatever you can spare (whenever you can spare it) to one of these groups.

1. So much of our everyday lives—both work and play—depends on the operation of software that we cannot really claim to be free people unless we are using free software. I'm donating to the Free Software Foundation both as thanks for the GNU operating system and in support of their campaigns. The GNU OS—from ls to emacs, and everything in between, and beyond—eclipses, in terms of power and productivity, pretty much any other OS you can buy. But peace of mind is even more valuable than technical power. As someone whose livelihood directly depends on software, it would be foolish in the extreme for me to compromise my autonomy and financial security by using proprietary software and "giving the keys to someone else."

The FSF also promotes awareness of a number of threats to freedom and innovation, like DRM (vile, vile stuff) and proprietary document formats (which are antithetical to the democratic idea of the free interchange of information).

2. I'd say that being able to learn about anything, anytime, on Wikipedia has been a pretty life-changing experience. I don't need to explain what this is like to most of you. But Wikipedia isn't valuable just because it satisfies my idle curiosities. One of the hats I wear is that of "teacher," and I love sharing knowledge. The dissemination of knowledge is one of the surest ways to produce prosperity. I'm donating to Wikipedia on behalf of children and other curious people everywhere.

3. Creative Commons is creating a body of actually useful creative works, as well as encouraging people to rethink copyright law. I feel like I've discovered a gem every time I find an ebook I can copy for offline reading, or music I can share with friends, or a comic or photo that I can put on my blog. What is perhaps more valuable is that CC is planting the idea in people's heads that maybe we can be more prosperous as a society if authors allow their work to be used in more ways rather than fewer.

Magit

Magit is a spectacular Emacs add-on for interacting with git. Magit was designed with git in mind (unlike VC mode, which is a more generic utility), so git commands map quite straightforwardly onto Magit commands. M-x magit-status tells you about the current state of your repo and gives you one-key access to many common git commands. However, what really sold me on Magit was its patch editor, which completely obsoletes my use of git add, git add --interactive, and git add --patch. If Magit had this patch editor and nothing else, I would still use it. That's how great this is.

M-x magit-status (which I've bound to C-c i) tells you about your working tree and the index, kind of like a combination of git diff, git diff --cached, and git status. It shows some number of sections (e.g. Staged changes, Unstaged changes, etc.); within each section you can see what files have been modified; within each file you can view the individual hunks. Within any of these containers you can press TAB to expand or collapse the heading. Moving your cursor into a file header or a diff hunk header selects the changes in that file or hunk, respectively. You can then press s to stage those changes, as shown in these before-and-after pictures:

Once you're satisfied with your staged changes, you can press c to commit, which prompts you for a log message. After you've typed a message, C-c C-c performs the actual commit.

This is already much faster than using git add --interactive or git add --patch to stage parts of a file. You just find the hunk you want rather than having git ask you yes/no about every hunk.

However, Magit also allows staging changes at an even finer granularity. If you highlight some lines in a hunk and then press s, Magit only stages the selected lines, as shown in these before-and-after pictures:

When in doubt, it's a good idea to make small commits rather than large commits. It's easy to revert (cherry-pick, explain, etc.) more than one commit, but hard to revert half a commit. Kudos to Magit for making small commits easier to create.
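For comparison, here is roughly what staging a hand-picked set of changes looks like with plain git, as a minimal sketch. The stage_patch helper name is my own invention for illustration, and I am not claiming this is how Magit works internally; it only shows that staging part of a file amounts to applying a patch to the index.

```shell
# Hypothetical helper: stage exactly the changes contained in a patch file.
# Typical use: git diff -- FILE > change.patch, delete the hunks you do not
# want from change.patch, then run: stage_patch change.patch
stage_patch() {
    # --cached applies the patch to the index only; the working tree is untouched.
    # --check runs first, so a patch that does not apply cleanly changes nothing.
    git apply --cached --check "$1" && git apply --cached "$1"
}
```

Afterwards, git diff --cached shows what was staged and git diff shows what was left behind. Magit gets you the same result with a cursor movement and a single keypress.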

Finally, Magit comes with a fine manual, which you can read online.

Installing Magit

It doesn't get too much easier than this for external Emacs packages.

Check out Magit:

git clone git://gitorious.org/magit/mainline.git

Make sure that magit.el from that checkout, or a copy, is on your load path. For example:

(add-to-list 'load-path (expand-file-name "~/.emacs.d/lisp"))

Autoload Magit and bind magit-status:

(autoload 'magit-status "magit" nil t)
(global-set-key "\C-ci" 'magit-status)

Ubuntu Jaunty Jackalope

(I am catching up on all the things that I've been meaning to blog about in the past few months.)

There are already good reasons to upgrade to the Jaunty Jackalope (development release), either for good or just to install the following two packages:

(1) git 1.6.x (release announcement).

(2) Firefox 3.1 beta. I have to say, I wasn't sold on Firefox 3.0. But Firefox 3.1 has convinced me to switch back from Epiphany. First, it is blazing fast. Second, in continual usage for several weeks now, it seems to be pretty crash-proof. Third, it actually has a bookmarks system that I would use. When Google can get you what you want 0.5 seconds after you type it in, you really have to rethink the idea of poking around through menus to find your favorite sites. Anyway, my gratitude goes to everyone involved.

Epiphany had also been starting to get on my nerves lately. It seems to crash at least once every other day. The address bar is really laggy sometimes (if you've ever used SSH over a high-latency connection, you know how irritating this is). And it is not as fast as Firefox 3.1, at least yet.

Jaunty is interesting for another reason. In this release, Ubuntu is attempting to make bzr repositories available for the packaging+source of every single package. I am looking forward to seeing what people will do with this. If it could make it easier for casual developers to get the source for a package, poke around to fix a bug, isolate their patches and send them to Ubuntu (or upstream), it could be a huge force multiplier.

The Slashdot Top 40

For our machine learning project, we attempted to automatically guess ratings or labels for Slashdot comments based on their content. As a side effect, we generated some data on what words and phrases tend to appear disproportionately often in high-ranked (low-ranked, interesting, uninteresting, funny, unfunny, etc.) comments.

The set of the top 40 "Funny" phrases turns out to be a hodgepodge of cultural references. I am not sure I understand all of them.

1 xkcd.com $
2 xkcd.com
3 nukem forever
4 carrier $
5 slashdot editor
6 skynet
7 clod $
8 woman $
9 grue
10 newt
11 no carrier
12 asparagus
13 nigerian prince
14 porn with
15 grue $
16 an outrage
17 kentucky $
18 eight camera
19 reality distortion
20 god what
21 six video
22 electronic games
23 locally $
24 paperbacks
25 distortion field
26 its belly
27 my underwear
28 am intrigued
29 penny-arcade.com $
30 priceless $
31 lycra
32 emacs $
33 polar bear
34 cried out
35 burma shave
36 an african
37 porn for
38 your grip
39 expects the
40 not talk

("$" means end of comment; "^" means beginning of comment.)

The list of top "Interesting" phrases suggests that workplace stories are interesting:

employees were; what worked; department i; our clients; wap; reviews on; file servers; work etc; could connect; stance that; updates the; those available; hitting my; europe to; i'm seeing; happening with; snuff; time anyone; spam has; to snuff; the bases; thin and; my college; street to; extreme programming; be neutral; late 19th; management they; from game; tenacity; withstanding; own account; right beside; magpies; from intel's; my food; obscure stuff; language when; and trash; been dragging

Meanwhile, the phrases least likely to be found in "Interesting" comments are either insulting or profane:

^ no; insensitive; again $; you insensitive; ^ oh; clod; insensitive clod; ^ you; ^ just; ^ then; ^ well; the hell; ^ or; slashdot; ^ and; you $; post $; ^ yes; ^ why; ^ but; ^ yeah; um; you mean; ^ they; wikipedia.org $; then $; religious; ^ now; clod $; mod; is called; ^ not; right $; ^ he; ^ ah; first post; ^ is; ^ your; ^ it's; fuck

These lists were generated using a corpus of 55,561 comments posted between June and November 2008.
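To make the "^" and "$" notation concrete, here is a minimal sketch of how one comment could be turned into two-word phrases with boundary markers. This is an illustration only, not the code we used for the project: the comment_bigrams name, the whitespace tokenization, and the lowercasing are all assumptions, and it emits only two-word phrases (the lists above also contain single words).

```shell
# Hypothetical helper: read one comment on stdin and print its two-word
# phrases, one per line, lowercased. "^" stands for the beginning of the
# comment and "$" for the end.
comment_bigrams() {
    tr '[:upper:]' '[:lower:]' | tr -s '[:space:]' ' ' |
    awk '{
        n = split($0, w, " ")      # split on runs of whitespace
        if (n == 0) next           # nothing to emit for an empty comment
        prev = "^"
        for (i = 1; i <= n; i++) {
            print prev " " w[i]
            prev = w[i]
        }
        print prev " $"
    }'
}
```

For example, feeding it the comment "No carrier" yields the phrases "^ no", "no carrier", and "carrier $". Counting how often each phrase shows up in comments with a given label, relative to all comments, is then a job for sort and uniq -c.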

Reading the environment variables of another process

If you need to read the environment variables of an arbitrary process, the /proc filesystem makes this easy on Linux. The environment variables are shown in /proc/PID/environ:

$ cat /proc/19065/environ
DISPLAY=localhost:0.0SHELL=/bin/bashPWD=/home/phil...

In a shell, it will just look like the variables are smooshed together. They're actually separated by \0 (null character), which you can see if you're manipulating this data in some programming language or using a proper text editor:

C-u M-! cat /proc/19065/environ RET

DISPLAY=localhost:0.0^@SHELL=/bin/bash^@PWD=/home/phil...
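If you just want readable output in a shell, translating the null characters into newlines does the job. Here is a minimal sketch; the function names split_environ and show_environ are made up for this example.

```shell
# Convert NUL-separated entries on stdin into one entry per line.
split_environ() {
    tr '\0' '\n'
}

# Print the environment of the process with the given PID, one VAR=value
# per line. Linux only; you need permission to read /proc/PID/environ.
show_environ() {
    split_environ < "/proc/$1/environ"
}
```

Running show_environ 19065 would then print DISPLAY, SHELL, PWD and so on, each on its own line. One caveat: a value that itself contains a newline will be split across lines, so keep the \0 separators if you are feeding the data to another program.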

[Update: ps can also be made to show the environment in a manner which is more human-readable but slightly less machine-readable; see comments.]

(Found via a Stack Overflow post about changing another process's environment, which is much more difficult.)

Stupid screen tricks

I use GNU Screen extensively to manage my work. Here are a few Screen tips and shortcuts I've collected or developed.

1. Add a status bar to screen. If you use screen inside screen, or screen on more than one machine, having a status bar is essential so you always know where you are. I have the following line in my .screenrc:

caption always "%{= kw}%-w%{= BW}%n %t%{-}%+w %-= @%H - %LD %d %LM - %c"

Now whenever I'm within screen, the bottom line looks like this, showing the names of all windows (and highlighting the active one), the hostname, and the date and time:

0 emacs  1 bash  2 farm5        @ulysses - Sunday 26 October - 16:06

2. Change the screen command key. For Emacs users, the default screen command key, C-a, is unacceptable. The following line in .screenrc changes it to C-o, which is still bound in Emacs, but less objectionable:

escape ^Oo

3. Shortcut resuming screen. Whenever I SSH to my desktop machine the first thing I do, always, is resume my previous screen session. The following command runs ssh and resumes screen in one fell swoop:

ssh -t ulysses "screen -d -r"

I stick that in a script called homebase and bind a hotkey, [Windows]-h, to

gnome-terminal -e ~/scripts/homebase

in Openbox so I'm only ever one key away.

4. Use nested screens. My desktop machine has good connectivity, so from my screen there, I connect to a bunch of remote machines which I use for my work, and of course, I use screen on those hosts too.

To send a C-o to a screen instance which is itself inside a screen window, you need to escape it and type C-o o. So, for example, if C-o n is usually used to go to the next window, you'll need to use C-o o n to go to the next window in a nested instance. This setup is hard to beat for managing connections to multiple computers. It sounds complicated, but your muscle memory will take care of it soon.

5. Generate new windows automatically. Whenever I SSH into another host from inside screen, I usually want a dedicated window for that connection. The following snippet in my .bashrc makes it so that when I'm inside screen and type ssh SOME-HOST, it creates a new window and titles it SOME-HOST, then invokes ssh in the new window.

# Opens SSH on a new screen window with the appropriate name.
screen_ssh() {
    # The last argument is taken to be the host name and becomes the window title.
    local numargs=$#
    screen -t "${!numargs}" ssh "$@"
}
if [ "$TERM" = "screen" ] || [ "$TERM" = "screen.linux" ]; then
    alias ssh=screen_ssh
fi

(Caveat: this doesn't quite work when you specify a host and a command on the same line, e.g. ssh some-host "ls ~".)

6. Managing DISPLAYs. The disconnect-and-resume-anytime way of working can sometimes be a curse. Shells inside screen windows don't get environment variables like $DISPLAY updated whenever you resume a session. So if you carelessly launch an X program from inside one, it may end up on a display which is either long gone or not the one you intended to use. The following simple trick automagically sets DISPLAY to the display at the last place you resumed a screen session (i.e. probably where you are sitting right now).

First, write the value of $DISPLAY to a file whenever you resume screen. One way to do this is by using a shell alias like the following whenever you resume, instead of using screen -d -r directly:

alias sc='echo $DISPLAY>~/.last-display; screen -d -r'

Alternatively, the invocation from #3, above, might now look like this:

ssh -X -t ulysses "echo \$DISPLAY>~/.last-display; screen -d -r"

Now, this shell alias here sets the display appropriately, so that, for example, here xterm runs xterm on your "current" display:

alias here='DISPLAY=`cat ~/.last-display`'

Booting Linux in five seconds

This has been floating around for a couple of weeks now, but it is a good read.

At the Linux Plumbers Conference a few weeks back, two Linux developers employed by Intel demonstrated an Eee PC with GNU/Linux which had been modified to boot, to a full graphical environment, in five seconds:

They had to hold up the EEE PC for the audience, since the time required to finish booting was less than the time needed for the projector to sync.

The LWN writeup contains many details of their talk and includes quite a few interesting tidbits. (X runs a C preprocessor and compiler every time it boots? Seriously?) The two engineers conclude that the culprit for poor boot time is scores of components providing power and flexibility which only a few people use but everyone has to pay for, like the following:

[Ubuntu] spends 12 seconds running modprobe running a shell running modprobe, which ends up loading a single module. The tool for adding license-restricted drivers takes 2.5 seconds— on a system with no restricted drivers needed.

This is also a good example of the kind of innovation that simply cannot happen on proprietary systems. Information about the entire Linux/GNU/services/X stack is freely available and modifiable, and one consequence of this is that it is very easy to build on the progress of others. It then becomes strikingly clear that all of us is smarter than any one of us, and substantially more creative.

Moreover, the experience obtained here is actually being used to help improve future versions of our operating systems, rather than being confined to the backwater of hacks that appear on Slashdot and are never heard from again.

Maxwell now renders output nicely-er

Maxwell now prints the circuit element values next to each element. I suppose this makes it somewhat more useful for demonstration purposes.

This is done using the WHATWG Canvas text API, which is newly supported in Firefox 3.1, thanks to Eric Butler. (Firefox 3.0 users will get the old behavior, where the element values are printed in the status area upon mouseover.)

Ubuntu Intrepid (8.10) users can get Firefox 3.1 alpha packages from Fabien Tassin's PPA.

Scrolling with the Thinkpad's TrackPoint in Ubuntu

Update: for instructions for Ubuntu Lucid/10.04 see this post.

Update: note, these instructions work for me on Ubuntu 8.10 Intrepid as well as 9.04 Jaunty and 9.10 Karmic on a Thinkpad X61s. Alternatively, the Karmic repos have gpointing-device-settings, a GUI tool for enabling trackpoint scrolling (as well as other special trackpoint/touchpad features).

Ubuntu GNU/Linux 8.10 (Intrepid) switches to evdev for X server input, which has the unfortunate side effect of breaking old EmulateWheel configurations. So scrolling using the middle button + TrackPoint (which I absolutely love) was broken for a while, although it is now fixed. Instead of modifying your xorg.conf, create a new file called /etc/hal/fdi/policy/mouse-wheel.fdi with the following contents:

<match key="info.product" string="TPPS/2 IBM TrackPoint">
 <merge key="input.x11_options.EmulateWheel" type="string">true</merge>
 <merge key="input.x11_options.EmulateWheelButton" type="string">2</merge>
 <merge key="input.x11_options.XAxisMapping" type="string">6 7</merge>
 <merge key="input.x11_options.YAxisMapping" type="string">4 5</merge>
 <merge key="input.x11_options.ZAxisMapping" type="string">4 5</merge>
 <merge key="input.x11_options.Emulate3Buttons" type="string">true</merge>
</match>

(Based on code from Michael Vogt and adapted to support both vertical and horizontal scrolling.)

Update: you'll have to restart hal and gdm, and remove the cache file /var/cache/hald/fdi-cache, for the changes to take effect. Log in on a VT (e.g. with Ctrl+Alt+F1) and then do:

sudo rm /var/cache/hald/fdi-cache
sudo /etc/init.d/hal restart
sudo /etc/init.d/gdm restart

(Be sure to log in on a console/VT, because restarting GDM will kill all your X apps...)

Note for Ubuntu 8.10 users only: an update to Ubuntu Intrepid (subsequent to my original post) breaks TrackPoint scrolling either completely or possibly only after suspending and resuming. A comment on Ubuntu bug 282387 gives instructions for downloading and installing a fixed version from upstream:

sudo apt-get install build-essential git-core
sudo apt-get build-dep xserver-xorg-input-evdev
git clone git://git.freedesktop.org/git/xorg/driver/xf86-input-evdev
cd xf86-input-evdev
git reset --hard 5f2c8a2dcdf98b39997ee5e7c9a9ace3b640bfa3
./autogen.sh --prefix=/usr
make
sudo make install

Later releases already have a fixed version of xserver-xorg-input-evdev.

Feedback/testing: I've tested the policy file and workaround above on an X61s. On 8.10/Intrepid, people have indicated that it seems to work on most or all R and T series Thinkpads as well as the X31, X40, X61, and X200. The X300 and X301 Thinkpads seem to have different TrackPoint hardware. On those machines you may need to disable the touchpad in the BIOS to make the above workaround work.

Thanks to all the commenters below who left additional tips for getting this to work and providing feedback on what hardware is supported!

Fixing low call volume on the Freerunner

Here's a workaround for fixing the one sound issue I've had with the Freerunner, namely that the people I'm calling complain that I'm too quiet. This is caused by a mixer setting which makes the mic volume on the Freerunner too low. My gratitude to the folks on the OM mailing lists who managed to piece this (very easy) solution together.

This kind of problem can be corrected just by running alsamixer while a call is in progress. Play around with the volume on the various channels until you are satisfied. In particular, raising the "Sidetone" volume seems to do the trick. This will fix the problem, at least for that call.

ASU keeps different sets of mixer settings for all sorts of scenarios (e.g. headset, handset, speakerphone, etc.). However, every time one of these scenarios is activated, ASU just loads the mixer settings from a file. If you change the settings in alsamixer, those changes never get written back to the file, so they aren't applied in the future.

To save those changes, adjust the settings to your satisfaction and then, while the call is still in progress, run:

alsactl -f /home/root/newsettings.state store

Use that to overwrite the file from which the mixer settings are read, which is /usr/share/openmoko/scenarios/gsmhandset.state (if you are curious, you can also play around with the other .state files in that directory).
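A cautious way to do the overwrite is to keep a copy of the original file first. The following is only a sketch: install_state is a name I made up, and the .orig backup suffix is my own convention, not anything the Openmoko software knows about.

```shell
# Hypothetical helper: replace DEST with SRC, keeping the first original
# version of DEST as DEST.orig so the change can be undone later.
install_state() {
    local src="$1" dest="$2"
    [ -f "$src" ] || { echo "missing $src" >&2; return 1; }
    # Only back up once, so a second run does not clobber the pristine copy.
    if [ -f "$dest" ] && [ ! -e "$dest.orig" ]; then
        cp -p "$dest" "$dest.orig"
    fi
    cp "$src" "$dest"
}

# On the phone, with the paths from the text above:
# install_state /home/root/newsettings.state /usr/share/openmoko/scenarios/gsmhandset.state
```

If the new settings turn out to be worse, copying gsmhandset.state.orig back over gsmhandset.state restores the previous behavior.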

For me, the net result of all this was the same as applying the following one-line patch to /usr/share/openmoko/scenarios/gsmhandset.state. You can try it too, if you dare:

--- old/gsmhandset.state Tue Sep 23 01:20:14 2008
+++ new/gsmhandset.state Mon Sep 22 00:16:28 2008
@@ -112,7 +112,7 @@
   comment.range '0 - 7'
   iface MIXER
   name 'Mono Sidetone Playback Volume'
-  value 2
+  value 6
  }
  control.13 {
   comment.access 'read write'

OM 2008.8 and 2008.9 impressions

Development is continuing at an incredible pace on the OpenMoko software. OM 2008.8 (the "ASU") was released in August and I installed it when I got back from my vacation. I also tried OM 2008.9 when it was released last week. It looks like they are going to continue with monthly updates like this. The software is still rough around the edges, but at this point the Freerunner is definitely usable as a primary phone (while, I would argue, it was not with 2007.2).

To be honest, both of the 2008 series updates are very much "fix two bugs, introduce one bug". Nevertheless, the software is improving by leaps and bounds. The ASU software is a big improvement over the original GTK stack. Notably, it is much more finger-friendly, not requiring a stylus (or fingernails) for as many tasks.

Here are some of the most visible bugs that were fixed (hooray!):

2007.2's sound broke after resuming from suspend. This meant that if you wanted to use your phone, you could not really suspend it. This was fixed in 2008.8.
Call connectivity was somewhat inconsistent in 2008.8. People reported that when they called me, the call would go directly to voicemail. I occasionally had to redial a number because the first call would just die. These appear to have gone away in 2008.9.
Time zones did not persist after rebooting in 2008.8. This has been fixed in 2008.9.

To put the situation in perspective, here are, in my view, the most visible/annoying bugs that remain. None of these really block normal phone use.

I don't think connecting to the internet via wifi works, although in 2008.9 it actually detects APs now.
Battery life seems to be worse on 2008.9 than 2008.8.
Upon being woken from suspend, the phone will go right back to sleep. I have to press the power button twice to wake the phone.
2008.9 seems to crash and die more often than 2008.8. This seems to be related to suspending or waking the phone.

One other bug I've had is an issue with sound quality (the volume being too low). I'll show you how I'm working around that in my next post.

Interesting recent talks

Some interesting talks I heard recently:

Ron Rivest talked about the MD6 hash function, which is a candidate in the upcoming SHA-3 hash contest, sponsored by NIST. It is based on a tree structure instead of a chain structure so that it's parallelizable, is resistant against certain cut-and-paste attacks, and is provably resistant to differential attacks, which have been worrying people. Extensive computer simulations were used to verify that MD6 "scrambles" its inputs enough to put a lower bound on the complexity of this particular kind of attack which is higher than the birthday bound.

Kevin Knight gave a talk on machine translation without parallel texts. Which sounds, on the face of it, absurd. Except that translation can be thought of as just applying a secret code, and statistical methods for cracking codes (without having much/any parallel text, of course!) have been around for a long time. Nowadays, computers can solve the cryptogram you find in the newspaper, and some analogous problems, but we don't know a whole lot about stuff beyond that. This kind of research could be useful for helping to translate texts in specific domains (e.g. computer manuals), where lots of example text can be found but little parallel text is available. Methods trained on general corpora often do quite poorly in this kind of niche.

By the way, I love this comic by Kevin Knight.

Software Freedom Day 2008 Boston; Happy 25th birthday to GNU

I went to the Software Freedom Day 2008 event in Boston today. It was a good opportunity to learn about the fascinating things people are doing with free software (and free culture ideas in general), as well as to learn about some of the major threats to our freedom, and some things we can do about them. Thanks to everyone who was involved.

Some nifty things:

Máirín Duffy spoke about how she makes, among other things, the Fedora Project's artwork (web graphics, signage, CDs, etc.) using only free software tools. This is a shining example of how users of free software gain autonomy: her clients can fix up or update her work instead of having to ask her to do it. This would be much more difficult if those people, who are not graphics professionals, had to shell out $1100 for Adobe Creative Suite. Open formats are a huge plus, too: being able to programmatically modify images at a high level is a huge time-saver.
CHDK is free firmware which supplements the firmware in some Canon digital cameras. People are using it to add all kinds of awesome new features to the camera, such as battery meters (not available on low-end cameras), RAW support, live histograms, focus and exposure bracketing, automatic photography of lightning, and reversi. And high speed photography which, seriously, looks like some of those Doc Edgerton strobe photos, except they were taken using a $100 digital camera.
"What if Wikipedia only allowed you to enter statements that were true?" Cameron Freer is building vdash.org, which will be a wiki of computer-verifiable proofs. Imagine: our kids might learn math out of interactive textbooks where you could click on any step in a proof and ask "Why?" and it would expand to fill in the gaps in the proof. That would almost be straight out of The Diamond Age.
Aaron Swartz talked about his newest venture, watchdog.net, "a hub for data about politics". But he also talked more generally about the somewhat unconventional ways in which he is trying to coordinate people to help get public data onto the internet. The idea that there are certain kinds of knowledge that rightfully belong to "us" (the public) and not any one single entity is quite refreshing in this age of government/institutional opacity.

Now, for the bad news. The two most talked-about threats to free software were mobile phones and DRM. The big two software makers have done a good job of convincing people that they shouldn't care about freedom on phones because it's "just a phone". But if you think of all the things you can do on phones nowadays, you realize that you really don't want to surrender your autonomy there, regardless of what your phone maker claims. DRM is a little better. It has been somewhat widely discredited in a number of high-profile incidents now. Sadly, that doesn't keep people from trying it.

Advocating free software should be easy, right? (Ed.: apparently, no.) After all, the big proprietary software companies have made huge missteps lately. They just can't stop making their software more and more restrictive and obnoxious! Not to take pride in the misfortune of their customers, but I believe there is a good chance that in the short term future it will become increasingly clear to many "ordinary folks" that proprietary software is not just a theoretical hazard but a real liability. Multiply that danger by the pervasiveness of software in our everyday lives and business, an extent which probably even most technically-minded people fail to completely appreciate. So I believe that projects like Mako Hill's Revealing Errors are a critical link in helping people, especially non-technical people, understand the importance of free software.

Happy 25th birthday to GNU! Richard Stallman gave a lightning talk about the story of GNU, among other things. To hear it from the horse's mouth on this anniversary helped to put in perspective how far we have come in the last 25 years, and how far we still have to go. (Boy, do I feel odd personally identifying with a movement that is older than I am.) May the next 25 years be just as productive.

Emacs 22.3 released; emacs-snapshot packages for Ubuntu

Emacs 22.3 was released on 5 September 2008.

You may know that Romain Francoise maintains a repository with emacs-snapshot packages for Debian updated weekly. Well, what about Ubuntu users? Since earlier this summer-ish, the ubuntu-elisp PPA has also been receiving emacs-snapshot packages regularly! Great news for those of us who always want to try out the features that the awesome hackers on emacs-devel are writing about.

Report: Ubuntu Intrepid on the OLPC XO-1

I have a setup that I'm quite pleased with on my Ubuntu machines, so I recently installed Ubuntu to my XO-1, thinking that I would get more out of the machine than I have with Sugar/Fedora. I bought a 4GB SD card to use with the machine and followed these instructions, which call for installing to a disk image within Qemu and then copying that disk image to an SD card.

I installed an alpha of Ubuntu Intrepid Ibex 8.10 without any major hitches. It works. I'm using Openbox as my window manager, which keeps touchpad use to a minimum.

Some remarks on what hardware works out of the box, with Ubuntu Intrepid:

Wireless works.
Display works. According to glxinfo it is doing direct rendering. glxgears runs at a whopping 22 frames per second.
The brightness and volume keys don't work out of the box. The volume keys ("F11" and "F12", really) can be configured through GNOME. I do not know whether there is a quick way to get GNOME to also recognize the brightness keys ("F9" and "F10", really).
X recognizes the game keys but not the rocker.
The touchpad works, but is hyper-sensitive and registers taps all the time when you are using it. To disable tap-to-click, add a line reading options mousedev tap_time=0 to your /etc/modprobe.d/olpc.conf.dist. (Source)
The microphone light is on all the time. I wonder if this is causing some power drain I don't want (apart from the light itself).
Cheese does not recognize the webcam.
Ubuntu does not know how to suspend-to-RAM the machine. It claims to be able to hibernate, but I haven't actually tried it.

It's a little poky, but more than adequate for using Emacs, reading mail, and browsing the web.

Reading your mail in Emacs with VM

I posted an article on reading your mail with VM in Emacs, which contains a minimal VM orientation, information about how I read mail in VM, and the elisp configuration I used to set that up.

I attempted the switch a couple of months back after realizing that the volume of email I was dealing with was seriously impairing my work ability. Gmail does have some tools for managing lots of email (labels), but with VM I can be much more fine-grained about what messages I look at at any particular time. Scanning and reading mail is also much faster, if only because I don't have to wait for the RTT to a Google server and the browser render time every time I want to do something.

And, of course, the best part of VM is that it comes with a great e-mail editor.

Why VM? The big choices are Gnus, VM, and MH-E. I had tried Gnus and couldn't really wrap my head around its model of doing things. MH-E sounded acceptable but I assumed that VM, which had more Lisp implementation, would probably be more flexible. I am pretty happy with VM, but I suspect that changing mail clients is really just a matter of performing a conversion process on the mail folder files. Some operations (such as writing an entire mailbox to disk) are probably slower than they could be, but in general it is blazing fast.

I also wrote about setting up BBDB (the Insidious Big Brother Database), an address book program which is strangely satisfying to use.

Read about my VM configuration and workflow.

Who thought of this stuff?

Matlab's documentation for the logspace function:

>> help logspace
 LOGSPACE Logarithmically spaced vector.
    LOGSPACE(X1, X2) generates a row vector of 50 logarithmically
    equally spaced points between decades 10^X1 and 10^X2.  If X2
    is pi, then the points are between 10^X1 and pi [instead].

This very odd specification is, of course, totally well-intentioned. The online documentation gives this rationale:

y = logspace(a,pi) generates the points between 10^a and pi, which is useful for digital signal processing where frequencies over this interval go around the unit circle.

Unfortunately, half the Matlab functions I read about seem to have these bizarre corner cases and caveats. I can't help but think that this is the numerical equivalent of Perl and the "penny wise, pound foolish" approach to language design: all these odd shortcuts make Matlab code (slightly) easier to write but harder to debug, read, and learn from.
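To make the special case concrete, here is a rough Python sketch of the behavior the help text describes. The function name and the default of 50 points come from the documentation quoted above; the body is only my reading of that description, not MathWorks' code.

```python
import math

def logspace(x1, x2, n=50):
    """Return n logarithmically spaced points from 10**x1 to 10**x2.

    Mimics the documented corner case: if x2 is exactly pi, the upper
    endpoint is pi itself rather than 10**pi.
    """
    if x2 == math.pi:
        # The special case: treat pi as the endpoint, not the exponent.
        x2 = math.log10(math.pi)
    step = (x2 - x1) / (n - 1)  # assumes n >= 2
    return [10 ** (x1 + i * step) for i in range(n)]
```

Note how a caller who computes an upper exponent that happens to equal pi silently gets a different answer, which is exactly the kind of surprise that makes code harder to debug.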

Coloring email in Emacs VM

I do like Gmail, but (1) it feels sluggish, (2) writing anything nontrivial in a browser text box is just too awkward, and (3) I occasionally wish I had a copy of my mail accessible offline. So I've been trying to switch over to retrieving and reading my mail in Emacs. After a failed experiment in using Gnus to read my mail last year, I recently mustered the energy to try VM.

It seems to be working well, and I'll write more about my configuration and workflow later, but I wanted to mention one package that I just found.

One of the (few) things I liked about Gnus was that it color-coded the blurbs in each message by author, which it inferred from the line prefixes (">", ">>", etc.), like so:

VM doesn't support such functionality out of the box, but the package u-vm-color.el mimics it. The package is fairly straightforward to install and works as advertised.

One thing I discovered is that when you run a TTY Emacs, it assumes that it is working with light text on a dark background. If you are running Emacs in an xterm with dark text on a light background, you'll need to supply the -rv (reverse video) option to Emacs. Otherwise, Emacs may choose unreadable, or at least suboptimal, colors for all its faces.

Sharing GPS tracks from tangoGPS on Google Maps

Now that GPS is working on the FreeRunner I made a track log of my commute to test it out. It's pretty easy to get log data off of the FreeRunner and plot it on the web in a Google Map:

First, install tangoGPS. The latest tangoGPS packages for the FreeRunner, and instructions for installing them, are available from this page. It usually takes my FreeRunner a couple of minutes to get its first fix. If you are having problems, OpenMoko's AGPS program may be able to give some debugging information.
TangoGPS will appear as an application called "GPS & Map". To record tracks, go to the "Track" tab and click "Start" (and then "Stop", obviously, when you're done).
TangoGPS will save the track log in /tmp (by default, but it's configurable) in a file with the extension .log and named after the current date/time. When you get back to a computer, scp that file over.
GPX is a commonly used format for representing GPS track data. You will need to convert your log data to GPX using convert2gpx.pl, a script provided by TangoGPS. Download it, make it executable, and use it like so: ./convert2gpx.pl inputlogfile.log > outputfile.gpx.
There are some web sites which will let you upload a GPX file and then plot it on a Google Map. gpsvisualizer.com is one of them. It will give your map a semi-persistent URL so you can show it to your friends for a short time.

Towards using the FreeRunner as my primary phone

Having had some time to play around with the FreeRunner's software (see my previous post), I can make a few more remarks about it now. By the way, if you are planning to get a FreeRunner (or if you have one), you should know that the wiki, as well as the community and support mailing lists, are invaluable resources for figuring out how to get things working or the best way to do something.

First of all, having a phone that you can SSH into and do all the usual Linux-y stuff on is very, very, cool. When you plug the phone into your GNU/Linux computer it appears as a device on the other end of a new network interface usb0. An SSH server is configured and works out of the box. You need to do a small amount of configuration to let your FreeRunner use your computer's connection to get to the internet.

The FreeRunner also comes with a package manager (opkg) and a set of repositories from which you can easily install new software. The packages are changing fast and getting new fixes all the time. Software upgrading is as easy as: opkg update; opkg upgrade. It's a snap to install new packages, too. I was delighted to be able to run Python on my phone.

(Now, if only someone would port Emacs and develop an on-screen keyboard layout suitable for using it.)

I installed a PDF reader and downloaded a couple of e-books to the phone. Astonishingly I can (pretty comfortably) read pages formatted for printed books on the FreeRunner's screen.

If you have been following the FreeRunner news you know that there are a bunch of software distributions you can choose from (at least three "official" ones as of this writing). This may seem worrisome but it's really not. My understanding is that all of the software distributions use the same repositories, merely installing different packages by default (e.g. the base apps and the launcher). From the standpoint of a software developer, you don't have to worry about painting yourself into a corner by choosing the "wrong" distribution: most apps should more or less run under all distributions. And from the standpoint of a user, reflashing your device is not difficult at all, if it turns out that a different distribution attains a critical mass. As long as you back up the good stuff (probably your home directory and parts of /etc), you should be able to change distributions relatively painlessly.

So, which distribution to choose? I've tried the 2007.2 image (the factory-installed software) and the ASU (a port of Qtopia to X11). The ASU will become the preferred distribution in the long term, and OpenMoko's attention is going there now. At this time, though, it feels a lot less slick in most places than the 2007.2 image. I can't make or receive phone calls with ASU and my current SIM card, which worked fine on 2007.2. The apps are designed more for a stylus than for fingers. There is no terminal app, but that is expected to be addressed soon.

If I can't find a quick resolution for the phone call problem, then I will probably go back to 2007.2. In either case, I plan to start using the FreeRunner as my primary phone. I'll also start to investigate options for writing quick apps to run on the phone. The more I play with the FreeRunner, the more I think about how those who can write code for the phone could modify or completely reinvent their workflows. Imagine having the adeptness of an Emacs whiz while working on your phone. That would make a general-purpose programmable phone an awesome device indeed.

Update: the PDF reader I'm using is epdfview

FreeRunner first impressions

OpenMoko's FreeRunner went on sale on July 3. I ordered early that day and my FreeRunner arrived today.

This is the only product I've ever felt compelled to take unboxing photos of:

Click for FreeRunner unboxing photos

First of all, this thing is a lot tinier than I was expecting. It is not significantly larger or heavier than my current phone, a Razr. I suppose I should not be surprised because the last smaller-than-laptop device I purchased was a PDA back in the year 2001 or so. (I've never insisted on being at the cutting edge of mobile technology.)

The texture is a lot like that on the outside of a Thinkpad.

I haven't had much time to play with it except for charging it up and making a phone call (which worked, well). The screen is amazing, by the way— higher resolution than that of pretty much any other phone-like or PDA-like device you can buy today.

The software utilities that come with the phone are still in a state of churn, and it is not yet in a state where it can exercise all parts of the hardware reliably (GPS, etc.). As far as I am concerned, none of these things are deal-breakers. If you want to make the phone the best thing it can be, you had better start by removing all the man-made problems so people can work on solving real problems. I wanted a phone with freedom, and that's what I am getting.

One of my favorite things about OpenMoko is the people and the community spirit. You can really tell that this is not only a different kind of phone but also a different kind of company. Witness this exchange from the last week:

Michael <simarillion>: can somebody tell me if I will lose my warranty when I open my Freerunner.
Sean Moss-Pultz <sean@openmoko.com>: [...] Do you really think we could get away with that kind of policy?! This is Openmoko. If you don't open your Neo, you should probably have your warranty voided ;-)

Living fearlessly with a laptop

My Thinkpad X61s is tiny and powerful, so I like to take it places with me. Of course, when you go places with a laptop, there is a risk of it getting lost, broken, or stolen. For me (as for most people, I would guess) the integrity of the data on my laptop is far more valuable than the cost of the laptop itself. In fact, there are two measures I take which almost entirely mitigate the risks of getting my laptop stolen (lost, broken, etc.). Consequently, I have become more willing to bring my laptop places. Here is what I do:

Use hard disk encryption. The Alternate installers in Ubuntu and Debian give you the option of easily configuring an encrypted hard disk. Everything (except the /boot partition, but even your swap) that goes onto the disk is transparently encrypted. You just need to type a password whenever you boot up your computer. There is some overhead associated with encrypting everything, but if you have more than one core you will rarely notice it.

This means that, should your laptop fall into the wrong hands, no useful information whatsoever can be extracted from the hard disk.

For this to work well, you need to lock your screen when you are not using it, and your computer needs to be configured to lock the screen when you wake from suspend. It should be noted that some attacks have been described on hard disk encryption techniques. While the risk of these attacks remains low for most targets, if you are paranoid you would have to shut down (not just suspend) your computer before taking it places. You would also have to consider either overwriting the most sensitive areas of memory before you shut down, or leaving your computer powered off for a couple of hours before taking it anywhere.

Keep backups. I do my backups over the internet so that I can back up from anywhere. I use a variant on the following script:

rsync -vaxE --delete --ignore-errors --delete-excluded --filter="merge excluded-files" /home/phil/ remotehost:/path/to/backup/destination/

where excluded-files is a file that looks like this, and contains some paths that I don't want backed up (usually, local cache-like places that are generally space-consuming and not terribly useful):

- /.local/share/Trash/

- /.mozilla/firefox/*/Cache/

- /.thumbnails/

I run this about as often as I can remember to, and before I shut down my laptop to take it somewhere. That's all there is to it.
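If you would rather drive the backup from a script, the same command can be assembled in Python. This is only a sketch: the function names and the dry_run switch are my own additions for illustration, and the rsync flags are the ones from the command above (plus -n, rsync's dry-run option).

```python
import subprocess

def backup_command(source, destination, filter_file="excluded-files", dry_run=False):
    """Build the rsync argument list for the backup described above."""
    cmd = [
        "rsync", "-vaxE",
        "--delete", "--ignore-errors", "--delete-excluded",
        # One argument, just as the quoted shell form passes it.
        "--filter=merge " + filter_file,
    ]
    if dry_run:
        cmd.append("-n")  # show what would change without touching anything
    cmd += [source, destination]
    return cmd

def run_backup(source, destination, **options):
    """Run the backup and return rsync's exit status."""
    return subprocess.run(backup_command(source, destination, **options)).returncode
```

Calling run_backup("/home/phil/", "remotehost:/path/to/backup/destination/", dry_run=True) first is a cheap way to check the filter file before a real run.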

With this measure, I can be quite confident that were my laptop to vaporize, I would lose nothing at all. It has the fortuitous side effect of making it super easy to reinstall an operating system.

Linus Torvalds on Git

I finally got around to watching the video of the tech talk that Linus gave at Google discussing the design of Git.

In this video, Linus explains a lot of the advantages of using a distributed system. But it is also enlightening because it's a window into Linus's motivations: he discusses the ways in which his own needs— as a system maintainer— drove the design of the system, in particular in the areas of workflow, speed, and data integrity.

One interesting idea is that in DVCS, the preferred development workflow (you pull from a small group of people you trust, who in turn pull from people they trust...) mirrors the way humans are wired to think about social situations. You cannot directly trust a huge group of people, but you can transitively trust many people via a web of trust— a familiar concept from security. A centralized system cannot scale because there are n² pairs of conflicts waiting to happen, and they will happen, because groups of people are distributed (not everyone is in the same room at the same time on the same LAN). But a DVCS workflow can scale, because it is fundamentally based on interactions between people and not on the artificial technical requirement that there has to be a single canonical place for everything.

Warning: Linus has strong opinions. I think he refers to at least three different groups of people as "ugly and stupid" in the course of his 70-minute talk.

A million lines of Lisp

People rave about Lisp, and one reason is that you can use macros to make new kinds of abstractions. Sure, in other languages you can write functions to define new procedures, but in Lisp you can write macros to define new control flow constructs (for starters). You can write functions that write functions and programs that write programs, a power which is pretty much unimaginable in most other languages. In other languages you can only stack your abstractions so high, but in Lisp, the sky's the limit. Because of macros, and others of its features, Lisp is sometimes called the programmable programming language.

I suspect that because of macros, Lisp programs grow in a way that is qualitatively different from programs in other languages. With every line of additional code in Lisp, you can do more, more quickly; in ordinary languages, the return on additional lines of code is constant if not decreasing. Various people have tried to estimate the advantage that Lisp has over C++ (in the ratio of the number of lines needed to do a particular task). Whatever the figure, I believe that Lisp's advantage should increase for larger and larger programs.

In pretty much any language, a program with a million lines of code is regarded with considerable respect. So if Lisp can do so much more with so much less, what would a million line Lisp program look like? Could we even imagine it? (Would it be sentient, or what?)

It turns out that such a program exists, and is widely available.

As of right now (23 June 2008), a checkout of GNU Emacs has 1,112,341 lines of Lisp as well as 346,822 lines of C.
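If you want to try a count like this on your own checkout, something along the following lines would do. It is only a sketch of one possible method (counting every line, including comments and blanks, in files ending in .el and .c); the function name is made up, and it is not necessarily how the figures above were obtained.

```python
import os

def count_lines_by_extension(root, extensions=(".el", ".c")):
    """Walk root and total the number of lines in files with each extension."""
    totals = {ext: 0 for ext in extensions}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1]
            if ext in totals:
                # Read as bytes so odd encodings do not raise errors.
                with open(os.path.join(dirpath, name), "rb") as f:
                    totals[ext] += sum(1 for _ in f)
    return totals
```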

It is somewhat astonishing that Emacs, a program with more than 30,000 built-in functions in a single namespace, can keep from falling apart under its own weight! In Emacs, activating minor or major modes, or setting variables, can have effects on how particular buffers look, what is done when Emacs saves a file, and how different keys (or other events) are interpreted, among other things. Unlike operating systems, which are broken down into programs (which only talk to each other in limited ways), Emacs has many parts which all have the potential to interact with each other.

In some cases it is necessary to add special-case code to say what should happen for particular pairs of interacting features, but a robust system should be able to make the right thing happen most of the time even for features which are totally oblivious of each other. One way that Emacs tames this complexity is with the use of hooks.

Hooks (sometimes referred to elsewhere as listeners) are a way of telling Emacs to run a particular piece of code whenever a certain situation happens (e.g. when a file is opened). Many Emacs modules add hooks in order to do their thing. For example, VC checks every time I open a file whether it is in a version-controlled directory, so that it can report the file's version and status. Another example: Emacs activates follow-mode (if I have set a particular option) whenever I open a file. The variable find-file-hook is a list containing, among others, two functions responsible for performing the tasks described above; each function in that list is executed whenever Emacs opens a new file. If I were to add a function of my own to one of these hooks, then Emacs would happily run it, too.

As an alternative, you might consider implementing such an extensible system using polymorphism and inheritance: e.g. to add behaviors, you would create a subclass of a particular class which implemented some method differently. The chief advantage of using hooks over that approach is that with hooks, changing behavior is very obviously additive: if you can call for either A, B, or C to happen in a particular situation, you can also easily call for any combination of those things, but A, B, and C can very well be implemented completely independently— a key requirement for systems of a million lines or more.
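The additive nature of hooks is easy to see in a toy model. The following Python sketch is not how Emacs implements hooks; the names mirror find-file-hook for familiarity, and the two "features" are hypothetical stand-ins that know nothing about each other.

```python
# A toy hook: just a list of functions to call when a file is opened.
find_file_hook = []

def add_hook(hook, function):
    """Register function on hook unless it is already there."""
    if function not in hook:
        hook.append(function)

def run_hooks(hook, *args):
    """Call every registered function in order and collect the results."""
    return [function(*args) for function in hook]

# Two independent features (hypothetical stand-ins for VC and follow-mode).
def report_version_control(path):
    return "vc: checked " + path

def maybe_follow(path):
    return "follow: considered " + path

add_hook(find_file_hook, report_version_control)
add_hook(find_file_hook, maybe_follow)

def find_file(path):
    """Pretend to open a file, then run everything on the hook."""
    return run_hooks(find_file_hook, path)
```

Adding a third behavior is one more add_hook call; neither existing function needs to change, and no class hierarchy has to be rearranged to get A and B and C together.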

My Top Ten Essential Emacs Tips

I wrote a brief article detailing what I consider to be the top ten can't-live-without-them Emacs features.

Switching to Openbox

I switched window managers recently, to Openbox. I also switched panels from gnome-panel to pypanel.

My new setup has these chief advantages over the old setup:

Speed. Going from GDM to a desktop that is ready is much faster than it was under Gnome. I blame gnome-session and the gnome-panel.
Openbox is much more flexible.

I can rebind the window management keys, so I'm now using the Windows key (which was previously not seeing much use). I've bound W-Tab to serve the same purpose that Alt-Tab used to. I've bound W-j, W-k to switch virtual desktops, and W-1, W-2, and other keys to do window management tasks (maximizing, minimizing, etc.). No more reaching for the arrow keys or the function keys to complete common tasks. And it frees up Alt-Tab, which Emacs uses.
I can bind keys to other actions: for example, W-c starts up a terminal and W-b starts up a web browser.
I can bind mouse actions, too. I've set up W-drag to move windows (just as Alt-drag does usually), and W-right-drag, which resizes my windows, is a lot better than trying to drag window borders (hello, Fitts's Law!).
The configuration files can be easily version-controlled, so I synchronize my Openbox settings everywhere. This is not true of my Gnome/Metacity setup.

Space. I turned off window decorations to save space. Who needs window decorations? I don't need them to see what application I'm using because I have a panel. I don't need them to move windows because W-drag is faster. And I don't need them for window context menus because W-SPC already gets me one.

To install it on Debian/Ubuntu: apt-get install openbox

The kicker is that Openbox was apparently designed with Gnome interoperability in mind. And it is easy to switch back and forth while you are testing. All you need to do is log out to get back to GDM and then select either the "Gnome" or "Openbox" sessions from the Session menu. No futzing with files in your home directory to select your session. Moreover, the default Openbox setup loads a bunch of stuff that your GTK+/Gnome apps need to work. So your GTK themes and Gnome settings will be just fine, D-Bus and other session services will be launched as normal, and all that. It was surprisingly painless to switch.

To configure Openbox, after installing it, copy the XML files in /etc/xdg/openbox to ~/.config/openbox. Check out the default configuration to see what you are in for. Remember that when you log in to Openbox, right-clicking on the Desktop will probably get you enough to get out of a jam. Enjoy!

Installing Ubuntu from hard disk + grub

Previously, I praised Debian for supporting an installation method ("hard disk booting") that only requires an existing filesystem and Grub, and can be kicked off just by downloading two files and asking Grub to boot from them. This is really convenient because you can install a full system without a CD (or USB key, or any media), without PXE (or configuring any other hosts on your LAN), and with a download that is quite small.

Well, it turns out that Ubuntu supports this installation method too (which is, now that I think about it, not surprising). It just doesn't seem to be advertised anywhere! (I suppose that instead of praising Debian I should praise the Debian documentation.) I just used it to install Ubuntu with encrypted LVM on my Thinkpad X61s, a machine which would otherwise be nontrivial to reinstall because it has no optical drive.

Here are the links to the downloads (the "netboot" installer): i386, amd64. Download initrd.gz (disk image) and linux (the kernel). Make sure you put them somewhere Grub knows how to get to (i.e. not to a networked or encrypted volume; /boot is a good place). Then do the following to boot into your new installer (more complete instructions from Debian):

Restart your computer and wait for Grub to load.
Find some existing boot entry and press e to edit it.
Edit the root line to make sure that it corresponds to the partition where you downloaded the files. It might already be correct.
Edit the kernel line so it reads kernel /boot/wherever/you/put/linux (on recent grubs, you may have to use linux instead of kernel).
Edit the initrd line so it reads initrd /boot/wherever/you/put/initrd.gz
Press b to boot.

Enjoy your new installer!

Update, 19 Jun 2008: I have documented this procedure in the Ubuntu Wiki here.

Using wget or curl to download web sites for archival

wget is useful for downloading entire web sites recursively. For archival purposes, what you want is usually something like this:

wget -rkp -l3 -np -nH --cut-dirs=1 http://web.psung.name/emacstips/

This will start at the specified URL and recursively download pages up to 3 links away from the original page, but only pages which are in the directory of the URL you specified (emacstips/) or one of its subdirectories.

wget will also rewrite the links in the pages it downloaded to make your downloaded copy a useful local copy, and it will download all page prerequisites (e.g. images, stylesheets, and the like).

The last two options -nH --cut-dirs=1 control where wget places the output. If you omitted those two options, wget would, for example, download http://web.psung.name/emacstips/index.html and place it under a subdirectory web.psung.name/emacstips of the current directory. With only -nH ("no host directory") wget would write that same file to a subdirectory emacstips. And with both options wget would write that same file to the current directory. In general, if you want to reduce the number of extraneous directories created, change cut-dirs to be the number of leading directories in your URL.
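The effect of these two options on the output path can be modeled in a few lines of Python. This is a simplified sketch of the rule described above, not wget's actual code: the function name is invented, and query strings and other special cases are ignored.

```python
from urllib.parse import urlsplit

def local_path(url, no_host=False, cut_dirs=0):
    """Approximate where wget would save url, relative to the current directory.

    no_host models -nH and cut_dirs models --cut-dirs=N.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if not segments or parts.path.endswith("/"):
        # A directory URL is saved under the default name index.html.
        segments.append("index.html")
    directories, filename = segments[:-1], segments[-1]
    directories = directories[cut_dirs:]          # drop leading directories
    if not no_host:
        directories = [parts.netloc] + directories  # keep the host directory
    return "/".join(directories + [filename])
```

For http://web.psung.name/emacstips/index.html this gives web.psung.name/emacstips/index.html with no options, emacstips/index.html with no_host=True, and index.html with no_host=True and cut_dirs=1, matching the three cases above.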

Bonus: downloading files with curl

Another tool, curl, provides some of the same features as wget but also some complementary features. One thing that curl can do is to download sequentially numbered files, specified using brackets [..]. For example, the following string:

http://www.cl.cam.ac.uk/~rja14/Papers/SE-[01-24].pdf

refers to the 24 chapters of Ross Anderson's Security Engineering: http://www.cl.cam.ac.uk/~rja14/Papers/SE-01.pdf, http://www.cl.cam.ac.uk/~rja14/Papers/SE-02.pdf, etc., http://www.cl.cam.ac.uk/~rja14/Papers/SE-24.pdf.

You can give curl a pattern for naming the output files. For example if I wanted the files to be named SE-chapter-01.pdf, etc, then the appropriate curl incantation would be:

curl http://www.cl.cam.ac.uk/~rja14/Papers/SE-[01-24].pdf -o "SE-chapter-#1.pdf"

In addition to specifying consecutively numbered files, you can also use braces {..} to specify alternatives, as you would in a shell, e.g. http://web.psung.name/page/{one,two,three}.html. Specifying output patterns with "#1" works with braces too.
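If it is not obvious what such a pattern expands to, here is a small Python model of the expansion. It is a sketch that handles a single [..] or {..} group only; the function names are mine, and the zero-padding rule (pad to the width of the start value when it begins with 0) is my understanding of curl's behavior rather than something taken from its source.

```python
import re

# Either a numeric range like [01-24] or alternatives like {one,two,three}.
GLOB = re.compile(r"\[(\d+)-(\d+)\]|\{([^{}]+)\}")

def expand(url):
    """Return (expanded_url, matched_piece) pairs for the first glob in url."""
    match = GLOB.search(url)
    if match is None:
        return [(url, "")]
    if match.group(3) is not None:
        pieces = match.group(3).split(",")
    else:
        start, end = match.group(1), match.group(2)
        width = len(start) if start.startswith("0") else 0
        pieces = [str(i).zfill(width) for i in range(int(start), int(end) + 1)]
    return [(url[:match.start()] + p + url[match.end():], p) for p in pieces]

def output_name(template, piece):
    """Substitute the matched piece for #1, as in curl's -o pattern."""
    return template.replace("#1", piece)
```

Running expand on the Security Engineering URL above yields 24 pairs, and output_name("SE-chapter-#1.pdf", piece) gives the file name each one would be saved under.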

Managing dotfiles with git, continued

Previously, I commented on my setup for keeping my dotfiles (.emacs, .bashrc, etc.) synchronized using git. Here is one refinement I've made to this process in the meantime.

Allowing local customizations

Occasionally there are changes I'd like to keep local to one machine. These may be on a permanent basis (for example, if there are certain things I'd like to happen, or not happen, on my laptop but not my desktop) or on a temporary basis (if I want to test out some change locally for some time before pushing it to my canonical repo). In version control a setup like this is best represented using branches. Here's how I've done this:

The master branch contains customizations that are supposed to be common to all machines and are appropriate to apply everywhere. Most changes are of this form. But on each machine I maintain a local branch named, for example, laptop-custom. This branch contains all the changes in master, plus usually no more than a couple of changes specific to that machine.

To initially set this up, after making a clone, I create a new branch and switch to it. Most of the time I stay on this branch.

git checkout -b laptop-custom

When I make changes, they initially go in to laptop-custom as local changes:

emacs  # make some changes...
git add ...
git commit

If I decide a change is appropriate to apply everywhere, I put it on the master branch by using git-cherry-pick. I then rebase the local branch so the local patches always are at the "end". When you cherry-pick a patch to master and then rebase the other branch, git recognizes that the patch has already been applied on master and does not attempt to apply it again. So as you move changes over, the number of patches which are exclusive to the local branch decreases.

git checkout master
git cherry-pick ccddeef
git checkout laptop-custom
git rebase master

Pushing and pulling the master branch (containing the common customizations) is done in the same way as before, except that I always rebase the local branch afterwards.

git checkout master
git pull
git push
git checkout laptop-custom
git rebase master
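Why the set of local-only patches shrinks over time is easier to see in a toy model. In this Python sketch a branch is just a list of patch names; real git recognizes an already-applied patch by comparing the changes themselves, not names, so treat this as an illustration of the bookkeeping only. All names here are hypothetical.

```python
def cherry_pick(branch, patch):
    """Return branch with patch applied on top."""
    return branch + [patch]

def exclusive_patches(local, upstream):
    """Patches that exist on local but not on upstream."""
    return [p for p in local if p not in upstream]

def rebase(local, upstream):
    """Replay onto upstream only the patches it does not already contain."""
    return upstream + exclusive_patches(local, upstream)

master = ["common-1", "common-2"]
laptop_custom = master + ["local-a", "local-b"]

# Decide that local-a belongs everywhere: move it to master, then rebase.
master = cherry_pick(master, "local-a")
laptop_custom = rebase(laptop_custom, master)
```

After these steps laptop_custom still contains every patch, but only local-b remains exclusive to it, and it sits at the end of the branch.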

For the benefit of posterity

index-pack died of signal 25

may occur when you try to pull from a repo created with git 1.5.x using git 1.4.x. Just wanted to put that out there for Google.

FreeRunner entering mass production

Word on the OpenMoko community list is that FreeRunner has been cleared to enter mass production.

FreeRunner will be the world's first freed phone, and it is arriving not a minute too soon. Mobile phones are now everywhere, and they are becoming the premier mode of communication and computation for many, especially in the developing world. Mobile phones can deliver on the promise of ubiquitous computing— but only if they have been freed.

For the mobile phone, or any technology, to realize its true potential, the ones with the incentive to see it improve— the users— must have the power to improve it. That is as sure a law as there ever was one, and should be pretty apparent to anyone who has taken an economics class. Unfortunately, essentially all phones sold today are deficient in that respect.

The power to improve the system may, of course, be exercised directly (if I do some work myself) or indirectly (if I pay someone else to do it). But when this power is totally sequestered away, that necessarily puts a damper on innovation. This is the case with any proprietary software product: the vendor is the only one with the power and the right to make changes to the software. Sure, you could attempt to pay the vendor to make the changes. But they, being the only ones who can do it anyway, will charge monopoly prices. And they can refuse to do it at all if doing so would, for example, cut into sales of another of their products. So as long as they are the sole entry-point, you are beholden to them.

Even if one can assume that the vendor is generally benevolent, they still have a finite amount of resources. They cannot entertain implementation requests from every guy in his office, school, or lab. And that is unfortunate because one of those people has the next big thing on his hands. Creativity is everywhere.

The two great revolutions in computing (the rise of the PC and the emergence of web applications) demonstrate that freedom leads to the kind of innovation that transforms people's lives. It is no accident that the explosion in personal computers happened on the platform that had commodity hardware, not the one with a single hardware vendor. And I can say with some confidence that the web would not be what it is today had AOL (yes, remember AOL?) been its sole gatekeeper for both access and content.

The mobile phone ecosystem is still in its infancy. Today, mobile phone software and hardware do not support (and sometimes actively inhibit) using a device to its fullest. But when (and only when) mobile phones are unshackled, we will see creative innovations that we can probably not even imagine today. When mobile phones are truly ubiquitous they will be not just devices for communication but also for computation, sensing, and entertainment, and they will be deeply integrated into the activities of our lives.

One of the goals for FreeRunner is to have a phone which runs on free software, but what is neat about OpenMoko is that they realize that they are not just a software project. They are doing whatever it takes to help the mobile phone reach ubiquity. OpenMoko released the CAD files for the case of the FreeRunner— people are talking about machining cases in different colors, alternate styles, even a bicycle mount for the FreeRunner. I cannot wait to see what is next.

Comparing directory trees with diff or rsync

When you are trying to figure out if (and how) the contents of two directories differ, there are a couple of standard/common ways to use diff.

diff -Naur DIR1 DIR2 shows all differences between the two directories as a unified diff:

$ diff -Naur website website-new
diff -Naur website/index.shtml website-new/index.shtml
--- website/index.shtml        2008-05-22 20:16:12.000000000 -0400
+++ website-new/index.shtml    2008-06-04 12:10:50.000000000 -0400
@@ -14,6 +14,7 @@
 <!-- page body -->
 <div id="body">

+<p>Welcome!</p>

 <p>
   <b>About:</b> This subject is aimed at students with little or no
diff -Naur website/style.css website-new/style.css
--- website/style.css  2008-04-11 01:25:12.000000000 -0400
+++ website-new/style.css      2008-06-04 12:11:01.000000000 -0400
@@ -24,7 +24,7 @@
     color: white; text-decoration: none; font-weight: bold; padding: 0 0.25em;
 }

-div#body { padding: 0.1em 55px 2em 55px; font-size: small }
+div#body { padding: 0.1em 55px 2em 55px; font-size: medium }

 dd { margin-bottom: 1em }

On the other hand, if you just want to get a quick sense of the differences, diff -qr DIR1 DIR2 merely names the differing files:

$ diff -qr website website-new
Files website/index.shtml and website-new/index.shtml differ
Files website/style.css and website-new/style.css differ

rsync can do something similar, and it works even when the files are not on the same host. rsync -rvnc --delete DIR1/ remotehost:path/to/DIR2/ (the trailing slash on DIR1/ is important here!) will tell you what files rsync would have updated or deleted on the remote host. (The -n option makes rsync do a "dry-run", meaning it makes no changes on the remote host.)

$ rsync -rvnc --delete website/ laptop:projects/website/
deleting schedule.shtml
style.css

The -c option is used because we're paranoid: it forces rsync to compute and compare checksums for each file to verify that they are the same. Otherwise, rsync assumes that files are the same if they have the same timestamp and size (sometimes this gives false positives if timestamps were not preserved when you made the copy). If that is acceptable, you can omit -c to get a speedup.
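For local directories, Python's standard filecmp module offers a similar trade-off, so here is a sketch of a diff -qr style comparison built on it. The function name is my own; passing shallow=False forces a byte-by-byte comparison (roughly the spirit of rsync's -c), whereas the default shallow mode trusts matching size and modification time.

```python
import filecmp
import os

def differing_files(dir1, dir2):
    """List relative paths that differ between dir1 and dir2 or exist on one side only."""
    result = []

    def walk(comparison, prefix):
        for name in comparison.left_only + comparison.right_only:
            result.append(os.path.join(prefix, name))
        for name in comparison.common_files:
            left = os.path.join(comparison.left, name)
            right = os.path.join(comparison.right, name)
            # shallow=False: compare contents, not just size and timestamp.
            if not filecmp.cmp(left, right, shallow=False):
                result.append(os.path.join(prefix, name))
        for name, sub in comparison.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    # Note: dircmp skips a few names (such as CVS and RCS) by default.
    walk(filecmp.dircmp(dir1, dir2), "")
    return sorted(result)
```

Unlike the rsync invocation, this only works when both trees are reachable as local paths.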

Breaking through firewalls with a ping tunnel

When traveling, you may come across wireless hotspots where you have to pay before you can send TCP packets to arbitrary destinations on the internet. However, it is frequently the case that you can send ping (ICMP echo) packets to any host on the internet. This is like locking the front door and leaving the window open, because ICMP allows echo packets and their replies to carry payloads. You can therefore use ICMP as a substrate for another channel of communication. ptunnel is a tool which takes a TCP connection and tunnels it over ICMP.
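To see where the payload lives, here is a Python sketch that builds an ICMP echo request by hand: an 8-byte header (type 8, code 0, checksum, identifier, sequence number) followed by arbitrary data. The function names are mine, and this shows only the generic packet layout; it says nothing about the framing ptunnel itself puts inside the payload.

```python
import struct

def internet_checksum(data):
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

def build_echo_request(identifier, sequence, payload):
    """Return the bytes of an ICMP echo request carrying payload."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload
```

Everything after the first eight bytes is free-form data, which is the room a tunnel has to work with. Actually sending such a packet needs a raw socket and therefore root privileges, which is why the ptunnel commands in this post are run under sudo.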

In this post I assume that you want to tunnel an SSH connection over ICMP. Not only is SSH a common application, you can take other channels and tunnel them over SSH (for example, an HTTP proxy, so that you can browse the web).

You will need to install ptunnel on two hosts: the proxy (any well-known host on the internet) and your client (typically, the laptop you are taking with you). On Debian/Ubuntu, this can be done with apt-get install ptunnel.

On the proxy, do the following:

PROXY$ sudo ptunnel -x PASSWORD

replacing PASSWORD with a password of your choice.

On the client, do the following:

CLIENT$ sudo ptunnel -p nameofproxy.domainname.com -lp 6789 -da localhost -dp 22 -c wlan0 -x PASSWORD

Replace the options with (respectively) the address of the proxy, a port number of your choice, the name and port of the server you wish to connect to (as seen by the proxy; in this case we assume that the SSH server is on the proxy itself), the network interface you are using, and the password you selected.

Then, connect via SSH using the port you specified in the previous part:

CLIENT$ ssh -p 6789 localhost

Using the web over your tunnel

SSH can easily be configured to act as a SOCKS proxy and forward all of your web traffic over the tunnel. To do this, replace the above ssh invocation with the following:

CLIENT$ ssh -p 6789 -D 8080 localhost

Then, configure your web browser to use the proxy you've just created. In Firefox, for example: Preferences/Options; Advanced tab; Network tab; Settings; Manual proxy configuration; SOCKS host: localhost; port: 8080.

From dabbrev-expand to hippie-expand

I've "graduated" from using dabbrev-expand and switched to hippie-expand. hippie-expand does much the same thing as dabbrev-expand (completes words you are typing) but supports adding new completion heuristics rather than only looking at text in other buffers for potential completions. I switched when I found myself pressing M-/ and hoping to get completions corresponding to the names of other files I had open. hippie-expand does this out of the box.

To set it up, all you need to do is bind M-/ to hippie-expand (which comes with Emacs):

(global-set-key "\M-/" 'hippie-expand)

By default, hippie-expand uses the following set of completion techniques (customizable in hippie-expand-try-functions-list):

  '(try-complete-file-name-partially
    try-complete-file-name
    try-expand-all-abbrevs
    try-expand-list
    try-expand-line
    try-expand-dabbrev
    try-expand-dabbrev-all-buffers
    try-expand-dabbrev-from-kill
    try-complete-lisp-symbol-partially
    try-complete-lisp-symbol)

Because hippie-expand uses try-expand-dabbrev-* as one of its completion techniques, its completions are a strict superset of the completions that dabbrev would have suggested. So it is a pretty good drop-in replacement for dabbrev. In addition to looking for words in other buffers, it will also fill in filenames, entire lines of files, lisp symbols, and words in the kill ring.

Improving rename order in wdired

After using Emacs' wdired for some heavy-duty work, I noticed a flaw in how it does renaming.

Some background for those who do not know what wdired is (I find it indispensable!): wdired gives you a view of a directory that looks like the output of ls -l. However, you are allowed to edit the filenames. When you "save" the buffer, wdired renames all the files whose names you have changed. (More information: a blog post about wdired; Emacs manual node for wdired)

Here's the problem:

wdired always performs renames in a fixed order, starting from the bottom of the buffer and going up. You can easily construct sets of renames where wdired unnecessarily thinks it has to clobber a file because it is doing the renames in the wrong order.

I improved wdired-finish-edit so that it does renames in the "right" order. I've posted the improved version on my web site.

While this does complicate the implementation a bit, the model that wdired appears to present is that all the renames you ask for happen simultaneously, so I believe there is no situation in which this new behavior would be inappropriate, assuming the code is not buggy.
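One way to get the "simultaneous" semantics is sketched below in Python. This is not the elisp from the improved wdired-finish-edit; it is a minimal illustration of the idea, assuming all destinations are distinct. The function name and the temporary file name are made up.

```python
def order_renames(renames, temp_name="__rename_tmp__"):
    """Return a list of (src, dst) steps that applies `renames` (a dict of
    old name -> new name) as if all renames happened simultaneously."""
    pending = {src: dst for src, dst in renames.items() if src != dst}
    steps = []
    while pending:
        # A rename is safe once its destination is not itself waiting to be moved.
        safe = [src for src, dst in pending.items() if dst not in pending]
        if safe:
            for src in safe:
                steps.append((src, pending.pop(src)))
        else:
            # Every remaining destination is also a pending source: a cycle.
            # Park one file under a temporary name to break it.
            src = next(iter(pending))
            dst = pending.pop(src)
            steps.append((src, temp_name))
            pending[temp_name] = dst
    return steps
```

For the renames a to b and b to c, this yields b to c first and then a to b, so nothing is clobbered. A swap (a to b, b to a) goes through the temporary name.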

Installing Debian the hard way is still easy

I prefer Ubuntu in general, but one thing that Debian has really nailed is installation. Last week I installed Debian on an old machine using no removable media other than a corrupted Ubuntu installation CD.

Under Debian's hard disk booting installation method, you download two files (a kernel and a disk image) to your disk, which are under 6MB in total. Then you ask grub to boot the kernel with the specified disk image. There is enough magic in there to launch a Debian installer that downloads all the packages it needs from the internet.

All you need to do is get those two files onto the disk. Easy ways to do this include: booting from a liveCD (or another functioning OS on the disk) and downloading them, or ripping out the disk and connecting it to another computer. Unfortunately, I did not have a good OS on the disk, nor a working liveCD, nor a PATA dongle.

The disk I was using already had grub installed. The Ubuntu installation CD got as far as formatting the drive, but couldn't install any packages because they were all corrupted. Fortunately, there is a recovery shell which includes, among other things, wget. That was enough to get the ball rolling for a successful Debian install.

Vulnerability in Debian's OpenSSL revealed

A weakness has been discovered in the random number generator of the OpenSSL implementation that Debian and Ubuntu provide. This random number generator has been shown to be predictable in certain ways. Consequently, encryption keys generated by OpenSSL, including SSH host keys and SSH public/private keypairs, should be considered compromised. (Upgrading to the latest version of openssl in Debian and Ubuntu will offer to regenerate your host keys.)

What is interesting is how this vulnerability was created in the first place. In order to create keys, OpenSSL acquires randomness from a bunch of sources and adds it to a buffer created in uninitialized memory.

Valgrind (a debugging/profiling tool) detects, among other things, situations where programs do computations based on the contents of uninitialized memory. These are almost certainly bugs. Except when the express goal of your program is to produce something random.

A Debian developer added the following patch to OpenSSL,

+       /* Keep valgrind happy */
+       memset(tmpbuf, 0, sizeof tmpbuf);
+

thereby replacing perfectly good semi-random data with zeroes. As it turns out, this is enough to greatly reduce the key search space for attackers.
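To get a feel for what a reduced key search space means, here is a toy Python sketch. It is not OpenSSL's key generation: the generator, the hash, and the 15-bit seed size are all assumptions chosen purely for illustration. The point is only that when very little varies between runs, an attacker can enumerate every possible key.

```python
import hashlib

SEED_BITS = 15  # assumption for illustration only


def weak_keygen(seed: int) -> bytes:
    """Toy key generator whose only source of variation is a small integer seed."""
    return hashlib.sha256(seed.to_bytes(4, "big")).digest()


def brute_force(target_key: bytes):
    """Enumerate every possible seed and return the one that reproduces target_key."""
    for seed in range(2 ** SEED_BITS):
        if weak_keygen(seed) == target_key:
            return seed
    return None
```

With a 15-bit seed there are only 32,768 candidate keys, which a laptop can try almost instantly.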

Diagnostics (and compiler warnings, and the like) can be dangerous when interpreted by amateurs.

Copying directory trees with rsync

You can use cp -a to copy directory trees, but rsync can do the same and give you more flexibility. rsync supports a syntax for filter rules which specify which files and directories should and should not be copied.

Examples

Copy only the directory structure without copying any files:

$ rsync -a -f"+ */" -f"- *" source/ destination/

The two -f arguments mean, respectively, "copy all directories" and then "do not copy anything else".

Copy only directories and Python files:

$ rsync -a -f"+ */" -f"+ *.py" -f"- *" source/ destination/

This is really handy for replicating the general directory structure but only copying a subset of the files.

Copy everything but exclude .git directories:

$ rsync -a -f"- .git/" -f"+ *" source/ destination/
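The filter rules in these examples are checked in order, and the first rule that matches a name decides its fate. Here is a simplified Python model of that behavior. It is only a sketch: it handles just the kind of unanchored patterns used above (matched against the last path component), it decides one name at a time rather than walking a tree, and the function name is made up.

```python
from fnmatch import fnmatchcase


def is_included(name, is_dir, rules):
    """Apply rules in order; the first matching rule decides.
    Names that match no rule are included."""
    for rule in rules:
        action, pattern = rule[0], rule[2:]
        if pattern.endswith("/"):
            # A trailing slash restricts the pattern to directories.
            if not is_dir:
                continue
            pattern = pattern[:-1]
        if fnmatchcase(name, pattern):
            return action == "+"
    return True
```

For instance, with the rules `["+ */", "+ *.py", "- *"]`, every directory and every .py file is kept, and everything else falls through to the final exclude. Keep in mind that rsync applies the rules as it descends, so an excluded directory is never entered at all.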

Conclusion

Of course, rsync also works great for copying files between machines, and it knows better than to transfer files that already exist on the destination. I use something similar to the above to do backups, copying my homedir but excluding stuff like caches that are not even worth copying.

Network transparency makes distance irrelevant

The recent version of Ubuntu and the upcoming version of Fedora both ship with PulseAudio, a sound server which supports, among other things, network-transparent operation: you can take any program that generates sound and redirect that sound to be played on any other machine with a PulseAudio server.

PulseAudio is part of a long and venerable history of using computers remotely. People have been using SSH and X11 forwarding for ages now. CUPS and SANE allow you to access a printer or a scanner from halfway around the world. x2x lets you move the cursor on one computer with the mouse on another. And SSH's port-forwarding feature builds bridges to enable pretty much any service to be used from anywhere, even if the server is behind a firewall or is only serving to localhost. Features like these aren't unique to the modern Unix-like operating systems, but it is only there that they are widely enough used that people actually rely on them to get things done. Perhaps more critically, it is only there that they are easy enough to configure that people can use them on an ad-hoc basis.

In contrast, on Windows or Mac OS, users are very much tied to individual machines. It does not really occur to people that they could use more than one computer, or that they would even want to.

I think this reflects one of the philosophical differences between Unix-like operating systems and others. Under Unix, a computer provides a number of resources (such as software, data, computation capacity, or specialized hardware) from which you can pick and choose. Because of network transparency, physical proximity is unimportant. Channels of communication can be layered and rerouted arbitrarily. You can make computers work together as ensembles. Individual computers are fungible.

Just a few weeks ago, for my research work, I wrote a script to distribute the processing of hundreds of files among a bunch of free machines. SSH is set up with public-key authentication there, so all the distribution of work happens without human intervention. The whole thing, including parsing the input, distributing those jobs, and load balancing, is about 50 lines of Python, using software which is standard on most any actual GNU/Linux system.
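The script itself is not reproduced here, but the general shape of such a thing can be sketched in Python as follows. This is a hypothetical reconstruction, not the script described above: it assumes ssh with public-key authentication is already working, and all function names are made up. Each host gets a worker thread that pulls commands from a shared queue, which gives a crude form of load balancing.

```python
import queue
import subprocess
import threading


def run_over_ssh(host, command):
    """Run a shell command on host via ssh and return its exit status."""
    return subprocess.run(["ssh", host, command],
                          capture_output=True, text=True).returncode


def distribute(hosts, commands, run=run_over_ssh):
    """Give each host one worker thread; workers pull commands from a shared
    queue, so hosts that finish quickly naturally take on more work."""
    jobs = queue.Queue()
    for command in commands:
        jobs.put(command)
    results = {}
    lock = threading.Lock()

    def worker(host):
        while True:
            try:
                command = jobs.get_nowait()
            except queue.Empty:
                return  # no work left for this host
            status = run(host, command)
            with lock:
                results[command] = (host, status)

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The `run` parameter exists so that the distribution logic can be exercised without touching the network, by passing in a stand-in function.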

I do not think it is accidental that this kind of flexibility started in OSes built on free software. People who are trying to sell you software licenses do not have as much incentive to allow individual computers to be used in whatever way you please.

Python is a great teaching language... for TAs

Repetitive tasks put the brain in a state where it is more likely to make mistakes, say scientists. I've been grading CS problem sets for 6.00 and I can personally attest to that. The drudge work in this class (for the TA, anyway) usually consists of verifying that the students' solutions produce expected behavior on a large set of test cases. I want to make the computer work for me as much as is possible here. Less drudgery for me means my students get more helpful high-level comments on their psets. Writing good tests takes time, but I would have spent more time testing and checking (and then second-guessing my error-prone self: "Did I grade that last student's code correctly...").

The conclusion I have come to in the last couple of terms is that Python is a great language to be teaching. The built-in unittest module makes organizing and writing tests, and reading their output, really easy.

However, all is not well, yet: non-determinism is the tester's constant foe! And Python, with its myriad libraries, lets you pull in code from pretty much anywhere you please. So much for having controlled and deterministic environments. The textbook example here of non-determinism is the time module.

Usually (read: in Java), people would deal with this using dependency injection or related techniques: basically, writing a function which takes the non-deterministic part as an argument. It works great, but cluttering the interfaces and specifications in this way is not something we really care to inflict upon fledgling programmers.

The great thing about Python is that you can patch pretty much anything at runtime. When combined with the fact that Python doesn't enforce data-hiding anywhere, this means that we can perturb and control the execution of students' code in some very interesting ways.

The most important effect of this is that it totally obviates the need for user interaction. Need to provide input to a student's code? Just replace raw_input with a callable of your choice. Need to read printed output programmatically? Just replace sys.stdout. In Python, every module is really a dictionary to which you can add items whenever you like, so patching just looks like this:

import sys
import StringIO

import ps5

# Arbitrary callable that should return input to program under test
ps5.raw_input = fake_input
# Save the real stdout for later replacement
old_stdout = sys.stdout
# Redirect stdout to a string buffer
sys.stdout = StringIO.StringIO()

It's just up to me to write a callable (possibly with state) that provides the correct input(s), and some code to process the contents of the buffer, and then I can grade most assignments without running the student's code by hand.
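As an illustration of what such a callable might look like, here is a self-contained sketch. It is written for Python 3, where raw_input is called input and StringIO lives in the io module. The ps5 module and its greet function are stand-ins invented for the example, as are FakeInput and run_captured.

```python
import io
import sys
import types

# Stand-in for a student's submission; in practice this would be `import ps5`.
ps5 = types.ModuleType("ps5")
exec(
    "def greet():\n"
    "    name = input('Name? ')\n"
    "    print('Hello, ' + name)\n",
    ps5.__dict__,
)


class FakeInput:
    """Callable with state: returns the scripted answers one at a time."""

    def __init__(self, answers):
        self.answers = list(answers)
        self.prompts = []

    def __call__(self, prompt=""):
        self.prompts.append(prompt)
        return self.answers.pop(0)


def run_captured(func, fake_input):
    """Run func with input patched in the student's module and stdout captured."""
    ps5.input = fake_input  # shadows the builtin inside ps5 only
    old_stdout = sys.stdout
    sys.stdout = io.StringIO()
    try:
        func()
        return sys.stdout.getvalue()
    finally:
        sys.stdout = old_stdout
```

Calling `run_captured(ps5.greet, FakeInput(["Ada"]))` returns the text the student's code printed, and the FakeInput instance remembers which prompts it was shown.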

I can even replace module-level "constants", or even entire libraries that the student relies on:

import ps5

ps5.time = fake_time

For example, fake_time might be a class instance that implements a time method, if we expect students to be calling time.time(). So we can run tests where the student's code has to measure the passage of time, and the test doesn't have to take some large constant amount of time to run. When the students are learning to use matplotlib, we have, for TA use, a fake matplotlib module that, when asked to produce a plot, conveniently saves that plot to disk instead.
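A fake clock along those lines could look something like the sketch below (Python 3). The ps5 module and its measure function are invented stand-ins for student code, and the sleep method on the fake is an extra assumption added so that a test can advance the clock.

```python
import types

# Stand-in for a student's module that does `import time` and calls time.time().
ps5 = types.ModuleType("ps5")
exec(
    "import time\n"
    "def measure(action):\n"
    "    start = time.time()\n"
    "    action()\n"
    "    return time.time() - start\n",
    ps5.__dict__,
)


class FakeTime:
    """Controllable clock exposing the method the student code is expected to call."""

    def __init__(self, now=0.0):
        self.now = now

    def time(self):
        return self.now

    def sleep(self, seconds):
        # Advance the clock instantly instead of actually waiting.
        self.now += seconds


fake_time = FakeTime()
ps5.time = fake_time  # the student's code now sees the fake clock
```

A test can then have an hour "pass" inside `ps5.measure` by calling `fake_time.sleep(3600)`, without the test itself taking an hour.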

By replacing dependencies at runtime, we can test pretty much any functional aspect of student code without requiring students to conform to some (apparently) unnatural interface in advance. The students are blissfully unaware of testability issues until we choose to introduce those issues.

We can also test many non-functional requirements. For example, suppose function A is supposed to run without calling function B for efficiency reasons: before calling A, just replace B with a wrapper that notices when it's called.
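A wrapper that notices when it is called might be sketched like this (Python 3). Again, the ps5 module and its functions are made-up stand-ins for student code, and CallCounter and calls_to are illustrative names.

```python
import types

# Stand-in for a student's module with an expensive helper.
ps5 = types.ModuleType("ps5")
exec(
    "def slow_helper(x):\n"
    "    return x * 2\n"
    "def fast(x):\n"
    "    return x + x\n"
    "def sloppy(x):\n"
    "    return slow_helper(x)\n",
    ps5.__dict__,
)


class CallCounter:
    """Wraps a function, forwards every call, and counts how often it is called."""

    def __init__(self, wrapped):
        self.wrapped = wrapped
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        return self.wrapped(*args, **kwargs)


def calls_to(module, name, action):
    """Run action() with module.<name> wrapped; return how many times it was called."""
    original = getattr(module, name)
    counter = CallCounter(original)
    setattr(module, name, counter)
    try:
        action()
    finally:
        setattr(module, name, original)  # always restore the real function
    return counter.calls
```

Here `calls_to(ps5, "slow_helper", lambda: ps5.fast(3))` reports zero calls, while the same check on `ps5.sloppy` reports one.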

It's the 21st century. TAs should not be meticulously entering test inputs into computers and cataloguing the outputs. We have computers to do that for us.

Amazing Graphics Papers: Dual Photography

This is a summary of the technique in the paper Dual Photography, which was presented at SIGGRAPH 2005. What the authors managed to do is very clever, and you can even understand the technique without knowing much high-brow math. Color me impressed.

Imagine you are photographing a scene which contains objects lit by a lamp. Suppose the light source is replaced with a structured light source, basically a projector. We could turn all the pixels of the projector on and get what is essentially a lamp. But in general we could light just some of the projector pixels, and illuminate the scene with whatever weirdly shaped light we want.

Fact: light transport is linear. So the relationship between the input vector p (which pixels on the projector are lit) and the output vector c (the brightness on each camera pixel) can be described by a matrix multiplication: c = T p. The ij^th element of the matrix T is the brightness of the i^th camera pixel when the j^th projector pixel (and nothing else) is lit with intensity 1. (If the camera and the light source both have resolution of, say, 10³×10³, then p and c both have length 10⁶, and T is a 10⁶×10⁶ matrix.)

We can actually construct the matrix T by lighting the first pixel, taking a picture, lighting the second pixel, taking a picture, etc. (each picture yields one column of the matrix T). And once we've done this, we can plug in any vector p we want to see what our scene would look like with arbitrary illumination from the projector. So, once we have this matrix T which completely characterizes the response of this scene to lights, we can change the lighting of the scene in post-processing.

That's already kind of neat, but the most impressive trick here is based on the fact that light transport follows the principle of reciprocity. If light of magnitude 1 enters a scene from a projector pixel a and lights up camera pixel b with intensity α, then a ray of light could just as well have entered at b and exited at a, and that light will also be passed with the same coefficient α. This is easily seen if the scene contains mirrors, but it's actually true in general, even if light is partially absorbed, reflected in funny ways on the scene, etc.

Now, what happens if we swap the locations of the projector and the camera in our scene? What's the matrix T' associated with this new scene? Because of reciprocity, it's merely the transpose of the matrix T: all the coefficients are the same, but they move around because we've swapped the 'input' and the 'output' of our system.

You'll notice something funny about this, which is that using T', we can reconstruct the scene as if it were viewed by a camera where the projector was, and lit by a projector where the camera was. The authors show that this actually works, and describe some tricks they played to get it working well. See the pictures below: the second, third, and fourth pictures can all be synthesized from the matrix T, despite the fact that the scene was never photographed from that angle.

The authors then use this technique to (I am not making this up) read the back of a playing card. Watch the video (linked from the web page), then read the paper.

The approach works no matter what the resolutions of the camera and projector are. You can even take a picture using a projector and a photodiode (i.e. a single element light sensor), because the effective resolution of the virtual camera is the resolution of the projector.
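The transposition trick and the photodiode remark can both be checked on toy matrices. The numbers below are invented, and this is only the linear algebra, not the authors' capture pipeline.

```python
def transpose(T):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*T)]


def apply(T, v):
    """Matrix-vector product T v."""
    return [sum(t * x for t, x in zip(row, v)) for row in T]


# Hypothetical measured transport: rows are camera pixels, columns are projector pixels.
T = [
    [0.5, 0.0, 0.25],
    [0.0, 1.0, 0.5],
]

# In the dual scene the roles swap: the light pattern has one entry per
# original camera pixel, and the image has one entry per projector pixel.
T_dual = transpose(T)
dual_image = apply(T_dual, [1.0, 1.0])

# Photodiode case: a one-pixel "camera" gives a 1 x N matrix, yet the dual
# image still has N pixels, one per projector pixel.
T_photodiode = [[0.2, 0.4, 0.6, 0.8]]
photodiode_dual_image = apply(transpose(T_photodiode), [1.0])
```

The dual image has as many pixels as the projector, regardless of how many pixels the real camera had.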

A web hosting recommendation: NFSN

This year I started hosting my personal web site with nearlyfreespeech.net. I have nothing but the best to say about them.

NFSN is a virtual hosting service, which means that small sites don't pay for capacity they aren't using, and large sites are automatically load-balanced. NFSN-hosted sites regularly get links from Slashdot, Digg, etc. and they stay up.

Pricing: NFSN is pay-as-you-go: bandwidth starts at $1/GB and goes down based on your usage, and storage is $0.01/MB/month. I'm paying less than $7/month for my site, and most of that only because I'm running a web app that's a derivative of Wikipedia. No stupid pricing tiers here.

Environment setup: you can set up your shell for public-key access over SSH. Their environment comes with Emacs 22 and git installed, which means that, for once, my web host shell is actually useful and not just a pale shadow of a real working environment. (Yes, vim 7.1, bzr, hg, SVN, and CVS are installed too.)

Abuse policy: here is a clip from their abuse FAQ:

A NearlyFreeSpeech.NET member site is defaming me or otherwise injuring me civilly.

Please forward a copy of your legal finding from a court of competent jurisdiction to our contact address. If you have not yet obtained such a finding, a preliminary injunction or court order is also sufficient.

The rest of it reads similarly. It is nice that there are web hosts out there that respect their customers, even when those customers don't have absurdly priced SLAs.

Making backups (instructions for GNU/Linux)

The cardinal rule of backing up is: assume that any one of your hard drives could go up in smoke at any moment. Zap. Could be right now. Magnetic disks will fail, and it's not a matter of if, but of when. (Backups will also save you from some user error, although that is not their primary purpose.)

Once you are thoroughly convinced of that, you will be nearly paranoid enough to implement a good backup strategy. This may seem awfully depressing, but the great news is that storage is cheap these days. As of this writing, you can get a 500GB hard disk for under $100. Having a full backup of your files when your primary disk breaks into tiny pieces is worth a lot more than $100. I've lost three disks in the last three years. In no case did I lose any data.

I am not willing to mess with the hardware or software needed to configure a RAID, so my backup solution (based on jwz's PSA) involves a second hard disk and rsync on a cron job.

Whenever I get a new hard disk for a computer, I generally make it the primary disk (containing /, /home, and swap) and graduate the previous disk to the role of a backup and swap disk. The backup partition is formatted with ext3, just like my root and home partitions. Suppose that the backup partition on the second disk is mounted at /media/sdb1 and I want to backup my homedir to /media/sdb1/phil. (My system is pretty much a stock Ubuntu install, so there is little value in backing up stuff outside of /home.)

The following script, archive-homedir, rsyncs my homedir to the backup disk:

rsync -vaxE --delete --ignore-errors /home/phil/ /media/sdb1/phil/
touch /media/sdb1/phil/last-backup

The options basically mean: print the files being copied; preserve timestamps and permissions; don't descend into other filesystems mounted under your homedir; delete files in the backup when they get deleted on the main partition; and ignore errors. The script also touches a file so you can see at a glance when the last backup was made.
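If you would rather have a program check the marker file than eyeball it, something like the following Python sketch would do. It is not part of the backup setup described here: the function names and the 36-hour threshold are arbitrary choices for illustration.

```python
import os
import time


def hours_since_last_backup(marker_path, now=None):
    """Return how many hours ago the marker file was last touched."""
    if now is None:
        now = time.time()
    return (now - os.path.getmtime(marker_path)) / 3600.0


def backup_is_stale(marker_path, max_age_hours=36, now=None):
    """True if the marker is older than max_age_hours (an arbitrary threshold)."""
    return hours_since_last_backup(marker_path, now) > max_age_hours
```

For the setup above, the marker path would be /media/sdb1/phil/last-backup.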

This crontab file backup.crontab causes a backup to happen every day at 6AM:

0 6 * * * /path/to/archive-homedir

Install and activate it with crontab /path/to/backup.crontab. If your cron is configured to email you with the job output, every morning you will get the list of files that were backed up. Watch out for messages telling you that your disks could not be read or written. These generally mean that you need a fsck or a new disk.

I use the same kind of setup to back up my laptop disk over the network. On my laptop, archive-homedir looks like this:

rsync -vaxE --delete --ignore-errors --delete-excluded \
    --filter="merge /path/to/archive-exclude-patterns" \
    /home/phil/ desktop:/media/sdb1/laptop/

where archive-exclude-patterns is a file with a list of filter rules that instruct rsync to include or exclude certain files. I use this file to tell rsync not to back up some files that are not worth transferring over the network, like my web browser cache. My archive-exclude-patterns looks like this:

- /.local/share/tracker/
- /.local/share/Trash/
- /.mozilla/firefox/*/Cache/
- /.thumbnails/

On my laptop, I don't run archive-homedir on a cron job, but I do run it whenever I'm on the same LAN as my desktop.

Emacs in Ubuntu Hardy now has anti-aliased fonts

Update, 7 August 2009: the most recent major release of Emacs (v. 23.1) now has anti-aliased font support. See the Ubuntu elisp PPA, which contains packages for any recent Ubuntu release, or see installation instructions for various other platforms.

The latest emacs-snapshot-gtk packages in Ubuntu Hardy (1:20080228-1) have the Unicode/font changes merged, and now support anti-aliased fonts for the first time.

While I didn't really mind the old bitmap fonts, I have to say that anti-aliasing is gorgeous.

To activate the new fonts, I added the following line to my ~/.Xdefaults:

Emacs.font: Monospace-8

Then, I ensured that the settings in .Xdefaults are being loaded by adding the following to my ~/.xsession:

if [ -f $HOME/.Xdefaults ]; then
    xrdb -merge $HOME/.Xdefaults
fi