I Still Know What You Learned Last Summer

Two Python curiosities

I learn something new every day.

First, the sort of thing you could do to mess with someone when they step away from their Python interactive shell to go use the restroom:

>>> False = True
>>> False
True
>>> if False: print "Woo!"
... 
Woo!

Sure enough, True and False are just names in the builtin namespace that can be rebound. This capability seems pretty dangerous, so I'd expect to see something like it in C or in Lisp, but its presence in Python surprised me a bit.
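As far as I know, this prank only works in Python 2. Python 3 turned True, False and None into keywords, so the assignment is rejected before any code runs. Here is a small sketch, written for Python 3, that asks the compiler whether a given name may be assigned to (can_rebind is a made-up helper name, not anything from the standard library):

```python
def can_rebind(name):
    """Return True if `name = True` compiles as an assignment here."""
    try:
        # compile() only parses the statement; nothing is executed or rebound.
        compile(name + " = True", "<example>", "exec")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    for candidate in ("False", "None", "len"):
        print(candidate, can_rebind(candidate))
```

On Python 3, False and None are reported as not rebindable, while an ordinary builtin name such as len still is.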

Second, the following puzzle stumped me for a while when I first saw it. Why does the following expression evaluate as it does?

>>> "string" in [] == False
False

It's not a precedence issue, because that result is not consistent with either of the parenthesizations:

>>> ("string" in []) == False
True
>>> "string" in ([] == False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument of type 'bool' is not iterable

It's actually a consequence of Python's support for chained comparison operators. That is, just as

5 < a <= 7

desugars to

5 < a and a <= 7,

the first expression above desugars to ("string" in []) and ([] == False), which is False because its first operand is already False.

It's good that the comparison operators (<, <=, >, >=, ==, !=, in, not in, is, is not) all chain in this consistent way, but it can trip up Python novices who have not yet learned that they should write the more Pythonic expression "string" not in [] if they want to stay out of trouble.
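To see the different readings side by side, here is a minimal sketch written for Python 3 (the variable names are mine, chosen for illustration):

```python
items = []

# The original puzzle: `in` and `==` chain the same way `5 < a <= 7` does.
chained = "string" in items == False

# What the chain means, written out by hand.
expanded = ("string" in items) and (items == False)

# The parenthesization a reader probably had in mind.
grouped = ("string" in items) == False

# The idiomatic spelling of the intended test.
idiomatic = "string" not in items

print(chained, expanded, grouped, idiomatic)
```

The first two values agree with each other (both False), and the last two agree with each other (both True), which is the whole puzzle in one line of output.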

Python is a great teaching language... for TAs

Repetitive tasks put the brain in a state where it is more likely to make mistakes, say scientists. I've been grading CS problem sets for 6.00 and I can personally attest to that. The drudge work in this class (for the TA, anyway) usually consists of verifying that the students' solutions produce the expected behavior on a large set of test cases. I want to make the computer work for me as much as possible here. Less drudgery for me means my students get more helpful high-level comments on their psets. Writing good tests takes time, but I would have spent even more time testing and checking by hand (and then second-guessing my error-prone self: "Did I grade that last student's code correctly...").

The conclusion I have come to in the last couple of terms is that Python is a great language to be teaching. The built-in unittest module makes organizing and writing tests, and reading their output, really easy.
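For a flavor of what that looks like, here is a minimal sketch in Python 3. The function student_is_even stands in for something imported from a student's submission, and the test and helper names are invented for illustration:

```python
import unittest


def student_is_even(n):
    # Hypothetical stand-in for a function from a student's problem set.
    return n % 2 == 0


class IsEvenTest(unittest.TestCase):
    def test_even_numbers(self):
        for n in (0, 2, 10):
            self.assertTrue(student_is_even(n))

    def test_odd_numbers(self):
        for n in (1, 3, 11):
            self.assertFalse(student_is_even(n))


def run_tests():
    """Run the test case and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(IsEvenTest)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    result = run_tests()
    print(result.testsRun, "tests run,", len(result.failures), "failures")
```

Collecting the outcome in a TestResult object, rather than only printing it, makes it easy to turn failures into grading comments later.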

However, all is not well yet: non-determinism is the tester's constant foe! And Python, with its myriad libraries, lets you pull in code from pretty much anywhere you please. So much for having controlled and deterministic environments. The textbook example of non-determinism here is the time module.

Usually (read: in Java), people would deal with this using dependency injection or related techniques: basically, writing a function which takes the non-deterministic part as an argument. It works great, but cluttering the interfaces and specifications in this way is not something we really care to inflict upon fledgling programmers.
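For comparison, this is roughly what the dependency-injection style looks like when written in Python 3. It is only a sketch: elapsed_seconds and make_fake_clock are hypothetical names, and the clock parameter is exactly the kind of interface clutter described above.

```python
import time


def elapsed_seconds(work, clock=time.time):
    """Run work() and return how long it took according to clock()."""
    start = clock()
    work()
    return clock() - start


def make_fake_clock(readings):
    """Return a callable that hands out the given readings, one per call."""
    remaining = iter(readings)
    return lambda: next(remaining)


# Production callers rely on the default; a test injects a scripted clock.
fake_clock = make_fake_clock([10.0, 12.5])
measured = elapsed_seconds(lambda: None, clock=fake_clock)
```

The test is deterministic, but only because the function's signature was designed with testing in mind.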

The great thing about Python is that you can patch pretty much anything at runtime. When combined with the fact that Python doesn't enforce data-hiding anywhere, this means that we can perturb and control the execution of students' code in some very interesting ways.

The most important effect of this is that it totally obviates the need for user interaction. Need to provide input to a student's code? Just replace raw_input with a callable of your choice. Need to read printed output programmatically? Just replace sys.stdout. In Python, every module is really a dictionary to which you can add items whenever you like, so patching just looks like this:

import sys
import StringIO

import ps5

# Arbitrary callable that should return input to program under test
ps5.raw_input = fake_input
# Save the real stdout so it can be restored later
old_stdout = sys.stdout
# Redirect stdout to a string buffer
sys.stdout = StringIO.StringIO()

It's just up to me to write a callable (possibly with state) that provides the correct input(s), and some code to process the contents of the buffer, and then I can grade most assignments without running the student's code by hand.
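Here is one way such a stateful callable and the buffer handling might look. This is a sketch under a few assumptions: it targets Python 3, where raw_input is named input and StringIO lives in the io module; the student module is simulated from a string so the example is self-contained; and FakeInput, load_student_module and run_with_fake_io are names I made up.

```python
import io
import sys
import types

# Hypothetical student code; in real grading this would be `import ps5`.
STUDENT_SOURCE = '''
def greet():
    name = input("Name? ")
    print("Hello, " + name)
'''


def load_student_module(source, name="ps5_example"):
    """Build a module object from source text (stand-in for an import)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


class FakeInput(object):
    """Callable with state: hands out canned answers one at a time."""

    def __init__(self, answers):
        self.answers = list(answers)
        self.prompts = []

    def __call__(self, prompt=""):
        self.prompts.append(prompt)
        return self.answers.pop(0)


def run_with_fake_io(module, func_name, answers):
    """Call module.func_name() with scripted input; return (output, prompts)."""
    fake_input = FakeInput(answers)
    module.input = fake_input          # shadows the builtin inside the module
    old_stdout = sys.stdout
    sys.stdout = io.StringIO()
    try:
        getattr(module, func_name)()
        output = sys.stdout.getvalue()
    finally:
        sys.stdout = old_stdout        # always restore the real stdout
    return output, fake_input.prompts


student = load_student_module(STUDENT_SOURCE)
output, prompts = run_with_fake_io(student, "greet", ["Ada"])
```

Restoring sys.stdout in a finally clause matters here, because student code that raises would otherwise leave the grader's own output redirected.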

I can even replace module-level "constants", or even entire libraries that the student relies on:

import ps5

ps5.time = fake_time

For example, fake_time might be a class instance that implements a time method, if we expect students to be calling time.time(). So we can run tests where the student's code has to measure the passage of time, and the test doesn't have to take some large constant amount of time to run. When the students are learning to use matplotlib, we have, for TA use, a fake matplotlib module that, when asked to produce a plot, conveniently saves that plot to disk.
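A fake_time along those lines could be sketched as follows, in Python 3. The sleep method is my own addition so that the clock can be advanced without waiting, and the student code is again simulated from a string; FakeTime, timed_nap and load_student_module are all hypothetical names.

```python
import types

# Hypothetical student code; in real grading this would be `import ps5`.
STUDENT_SOURCE = '''
import time

def timed_nap(seconds):
    start = time.time()
    time.sleep(seconds)
    return time.time() - start
'''


class FakeTime(object):
    """Stand-in for the time module with a clock the test controls."""

    def __init__(self, start=0.0):
        self.now = start

    def time(self):
        return self.now

    def sleep(self, seconds):
        # Advance the fake clock instead of actually waiting.
        self.now += seconds


def load_student_module(source, name="ps5_example"):
    """Build a module object from source text (stand-in for an import)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


student = load_student_module(STUDENT_SOURCE)
student.time = FakeTime(start=100.0)   # same shape as ps5.time = fake_time
elapsed = student.timed_nap(3600)      # an "hour" passes instantly
```

Because the student's functions look up the name time in their own module's namespace, rebinding it there is enough; the real time module is left untouched for everyone else.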

By replacing dependencies at runtime, we can test pretty much any functional aspect of student code without requiring students to conform to some (apparently) unnatural interface in advance. The students are blissfully unaware of testability issues until we choose to introduce those issues.

We can also test many non-functional requirements. For example, suppose function A is supposed to run without calling function B for efficiency reasons: before calling A, just replace B with a wrapper that notices when it's called.
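Such a wrapper can be very small. The sketch below, in Python 3, counts calls to a function that the fast path is supposed to avoid. The student functions and the count_calls helper are invented for illustration.

```python
import types

# Hypothetical student code; in real grading this would be `import ps5`.
STUDENT_SOURCE = '''
def slow_lookup(key):
    return key * 2

def fast_answer(key):
    # Supposed to avoid slow_lookup entirely.
    return key + key

def lazy_answer(key):
    return slow_lookup(key)
'''


def load_student_module(source, name="ps5_example"):
    """Build a module object from source text (stand-in for an import)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


def count_calls(func):
    """Wrap func so that every call increments wrapper.calls."""
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


student = load_student_module(STUDENT_SOURCE)
student.slow_lookup = count_calls(student.slow_lookup)

student.fast_answer(21)
calls_after_fast = student.slow_lookup.calls   # fast path never touched it

student.lazy_answer(21)
calls_after_lazy = student.slow_lookup.calls   # this one did
```

The wrapper still forwards to the original function, so the student's code behaves exactly as before; the test just gets to look at the counter afterwards.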

It's the 21st century. TAs should not be meticulously entering test inputs into computers and cataloguing the outputs. We have computers to do that for us.

for ... else in Python

Python has an interesting for statement which lets you specify an else suite.

In a construct like this one:

for i in foo:
  if bar(i):
    break
else:
  baz()

the else suite is executed after the for, but only if the for terminates normally (not by a break).

Here's some code written without for...else:

def contains_even_number(l):
  "Prints whether or not the list l contains an even number."
  has_even_number = False
  for elt in l:
    if elt % 2 == 0:
      has_even_number = True
      break
  if has_even_number:
    print "list contains an even number"
  else:
    print "list does not contain an even number"

The equivalent code snippet below illustrates how the use of for...else lets you remove an extraneous flag variable from that loop:

def contains_even_number(l):
  "Prints whether or not the list l contains an even number."
  for elt in l:
    if elt % 2 == 0:
      print "list contains an even number"
      break
  else:
    print "list does not contain an even number"

Use your good judgment when deciding whether to use the for...else construct. It's not unequivocally better, but when there's an asymmetry between the two possibilities, you can make your code more readable by using for...else to keep the "happy path" logic at the top and the exceptional/error case at the bottom.
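One case where the asymmetry is clear is a search that should fail loudly when nothing is found. A minimal sketch, written for Python 3 (find_first_even is a hypothetical function, not part of the examples above):

```python
def find_first_even(numbers):
    """Return the first even number in numbers, or raise ValueError."""
    for n in numbers:
        if n % 2 == 0:
            break              # found one; skip the else suite
    else:
        # Reached only when the loop finished without hitting break,
        # which includes the case of an empty sequence.
        raise ValueError("no even number in %r" % (numbers,))
    return n


first = find_first_even([1, 4, 6])
```

The else suite holds the error case, and the code after the loop can assume the search succeeded.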