A wonderful abuse of regular expressions

As it turns out, you can use regular expressions to test primality.

You'll need a regular expression engine that supports backreferences to do this. (Intuitively, a more restricted matcher that can be implemented as a finite state machine cannot count up to arbitrarily large numbers.)

I tried this in GNU egrep. I made this input file, which contains a number on each line followed by that number of stars:

$ cat numbers.txt
 2 **
 3 ***
 4 ****
 5 *****
 6 ******
 7 *******
 8 ********
 9 *********
10 **********
11 ***********
12 ************
13 *************
14 **************
15 ***************
16 ****************
17 *****************
18 ******************
19 *******************
20 ********************

Then, to find only those lines corresponding to prime numbers:

$ egrep -v " (\*{2,})\1+$" numbers.txt
 2 **
 3 ***
 5 *****
 7 *******
11 ***********
13 *************
17 *****************
19 *******************

Intuitively, the regular expression identifies composite numbers by seeing whether they can be written as a group of two or more stars, followed by one or more repetitions of that group. Writing the line in such a way corresponds of finding a factorization of the corresponding number into two other integers, both of which are greater than 1. The implementation basically carries out trial division by letting the first group be equal to **, then to ***, ****, etc. The -v option to egrep only prints those lines which do not match the regular expression, that is, the prime numbers.

Here's the catch:

Back-references are very slow, and may require exponential time.

(man page for GNU egrep)

No comments:

Post a Comment