Repetitive tasks put the brain in a state where it is more likely to make mistakes, say scientists. I've been grading CS problem sets for 6.00, and I can personally attest to that. The drudge work in this class (for the TA, anyway) usually consists of verifying that students' solutions produce the expected behavior on a large set of test cases. I want the computer to do as much of that work as possible. Less drudgery for me means my students get more helpful high-level comments on their psets. Writing good tests takes time, but I would have spent more time testing and checking by hand (and then second-guessing my error-prone self: "Did I grade that last student's code correctly...").
The conclusion I have come to in the last couple of terms is that Python is a great language to be teaching. The built-in unittest module makes organizing and writing tests, and reading their output, really easy.
However, all is not well, yet: non-determinism is the tester's constant foe! And Python, with its myriad libraries, lets you pull in code from pretty much anywhere you please. So much for having controlled and deterministic environments. The textbook example here of non-determinism is the time module.
Usually (read: in Java), people would deal with this using dependency injection or related techniques: basically, writing a function which takes the non-deterministic part as an argument. It works great, but cluttering the interfaces and specifications in this way is not something we really care to inflict upon fledgling programmers.
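To make the contrast concrete, here is a minimal sketch of the dependency-injection approach; the names `elapsed` and the fake clock are hypothetical, invented for illustration:

```python
import time

# Dependency injection: the non-deterministic part (the clock) is a
# parameter, defaulting to the real time.time.
def elapsed(work, clock=time.time):
    """Run work() and return how long it took, as measured by clock."""
    start = clock()
    work()
    return clock() - start

# In a test, pass a deterministic clock instead of the real one.
ticks = iter([100.0, 103.5])
fake_clock = lambda: next(ticks)
print(elapsed(lambda: None, clock=fake_clock))  # → 3.5
```

This works, but notice that `clock` now pollutes the signature of `elapsed` purely for the tester's benefit, which is exactly the clutter we'd rather not inflict on beginners.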
The great thing about Python is that you can patch pretty much anything at runtime. When combined with the fact that Python doesn't enforce data-hiding anywhere, this means that we can perturb and control the execution of students' code in some very interesting ways.
The most important effect of this is that it totally obviates the need for user interaction. Need to provide input to student code? Just replace raw_input with a callable of your choice. Need to read printed output programmatically? Just replace sys.stdout. In Python, every module is backed by a dictionary to which you can add (or overwrite) items whenever you like, so patching just looks like this:
import ps5

# Arbitrary callable that should return input to the program under test
ps5.raw_input = fake_input

# Save the real stdout for later restoration
old_stdout = sys.stdout

# Redirect stdout to a string buffer
sys.stdout = StringIO.StringIO()
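Here is a runnable sketch of the stdout half of that trick (in Python 3 terms: io.StringIO replaces StringIO.StringIO); `student_greet` is a hypothetical stand-in for a student's function, since the real ps5 module isn't available here:

```python
import io
import sys

# Hypothetical stand-in for a student function that prints its answer.
def student_greet(name):
    print("Hello, %s!" % name)

old_stdout = sys.stdout        # save the real stdout
sys.stdout = io.StringIO()     # redirect prints into a string buffer
try:
    student_greet("Alice")
    output = sys.stdout.getvalue()
finally:
    sys.stdout = old_stdout    # always restore, even if student code raises

print(repr(output))  # → 'Hello, Alice!\n'
```

The try/finally is worth the extra lines: if the student's code throws, you still get your real stdout back for the next test.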
It's just up to me to write a callable (possibly with state) that provides the correct input(s), and some code to process the contents of the buffer, and then I can grade most assignments without running the student's code by hand.
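A "callable with state" can be as simple as an object that pops the next scripted answer on each call. The following is a hedged sketch using Python 3's builtins.input (the counterpart of Python 2's raw_input); `ask_for_number` is a hypothetical student function:

```python
import builtins

# Hypothetical student code: keeps asking until it gets a valid integer.
def ask_for_number():
    while True:
        text = input("Enter a number: ")
        try:
            return int(text)
        except ValueError:
            pass

# A callable with state: each call returns the next scripted answer.
class FakeInput:
    def __init__(self, answers):
        self.answers = list(answers)
    def __call__(self, prompt=""):
        return self.answers.pop(0)

real_input = builtins.input
builtins.input = FakeInput(["oops", "42"])  # first answer is invalid on purpose
try:
    result = ask_for_number()
finally:
    builtins.input = real_input  # put the real input back

print(result)  # → 42
```

Because the fake answers "oops" first, this single test also exercises the student's error handling, with no TA sitting at the keyboard.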
I can even replace module-level "constants", or even entire libraries that the student relies on:
import ps5

ps5.time = fake_time
For example, fake_time might be a class instance that implements a time method, if we expect students to be calling time.time(). So we can run tests where the student's code has to measure the passage of time, yet the test doesn't have to take some large constant amount of time to run. When the students are learning to use matplotlib, we have, for TA use, a fake matplotlib module that, when asked to display a plot, instead quietly saves that plot to disk.
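One way such a fake clock might look (a hypothetical sketch; in real grading you'd install it with ps5.time = FakeTime(...) rather than shadow the name locally as done here):

```python
# A stand-in "time module": time() advances a fixed step per call, and
# sleep() just moves the clock forward, so timing tests finish instantly.
class FakeTime:
    def __init__(self, start=0.0, step=0.5):
        self.now = start
        self.step = step

    def time(self):
        self.now += self.step
        return self.now

    def sleep(self, seconds):
        self.now += seconds   # no actual waiting

# Shadow the time module with the fake, as ps5.time = FakeTime() would.
time = FakeTime(step=0.5)

# Hypothetical student code that measures elapsed time.
def timed_run():
    start = time.time()
    time.sleep(10)            # would block for 10s against the real module
    return time.time() - start

print(timed_run())  # → 10.5
```

The measured interval is 10.5 rather than 10 because each time() call itself advances the clock by the step; the point is that the result is deterministic and the test runs in microseconds.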
By replacing dependencies at runtime, we can test pretty much any functional aspect of student code without requiring students to conform to some (apparently) unnatural interface in advance. The students are blissfully unaware of testability issues until we choose to introduce those issues.
We can also test many non-functional requirements. For example, suppose function A is supposed to run without calling function B for efficiency reasons: before calling A, just replace B with a wrapper that notices when it's called.
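A minimal sketch of that spy-wrapper idea, with hypothetical `fast_search`/`slow_search` standing in for A and B (against a real pset you would patch ps5.slow_search instead of a local name):

```python
# Hypothetical student functions: fast_search must not fall back on
# slow_search, for efficiency reasons.
def slow_search(items, target):
    return target in items

def fast_search(items, target):
    return target in set(items)

# Wrap slow_search so we notice every call made to it.
calls = []
real_slow_search = slow_search

def spy_slow_search(*args, **kwargs):
    calls.append(args)                       # record the call
    return real_slow_search(*args, **kwargs)  # then behave normally

slow_search = spy_slow_search
try:
    fast_search([1, 2, 3], 2)
finally:
    slow_search = real_slow_search

print(len(calls))  # → 0, so fast_search never delegated to slow_search
```

If `calls` comes back non-empty, the student took the forbidden shortcut, and the test can fail with a message saying exactly that.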
It's the 21st century. TAs should not be meticulously entering test inputs into computers and cataloguing the outputs. We have computers to do that for us.