Learning Perl Challenge: March Madness

Warren Buffett’s Berkshire Hathaway is insuring Quicken Loans’ prize of $1 billion to someone who picks a perfect March Madness bracket, plus 20 prizes of $100,000 for the closest brackets. The rules won’t be enumerated until March 3, but so far they haven’t outlawed Garciaparra-ing by pulling a Nandor. If you want people to sit up and notice Perl, winning this contest with a Perl program will get you all the fame you want. You’ll be offered any job you want, but with $500 million (the present-day value of the single payout), you won’t have to take it.


My 2006 March Madness picks

For this Learning Perl challenge, you have to create your bracket with Perl. And not only your bracket, but all the other possible brackets that reduce 68 teams to one champion. How quickly can you make 100,000 brackets? How much disk space would you require? Although I typically limit these challenges to the material in Learning Perl, the bit vector chapter in Mastering Perl might be useful.
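As a starting point for the disk space question, here is a minimal sketch of the bit vector idea. It assumes that 68 teams in single elimination play 67 games (counting the opening-round games) and that one bit per game is enough to record which side won. The names make_bracket, picks, and the $picker callback are made up for this illustration.

```perl
use strict;
use warnings;

# Assumption: 68 teams, single elimination, so 67 games in all
use constant GAMES => 67;

# Build one bracket as a bit vector. Each bit records which side won a game.
# $picker is a hypothetical callback that gets a game index and returns 0 or 1.
sub make_bracket {
    my( $picker ) = @_;
    my $bracket = '';
    vec( $bracket, $_, 1 ) = $picker->( $_ ) ? 1 : 0 for 0 .. GAMES - 1;
    return $bracket;
}

# Read the picks back out as a list of 0s and 1s
sub picks {
    my( $bracket ) = @_;
    return map { vec( $bracket, $_, 1 ) } 0 .. GAMES - 1;
}

my $first_side = make_bracket( sub { 0 } );           # first-listed side wins every game
my $coin_flips = make_bracket( sub { int rand 2 } );  # a random bracket

printf "One bracket takes %d bytes\n", length $first_side;
printf "100,000 brackets take about %d bytes\n", 100_000 * length $first_side;
```

With one bit per game, a bracket fits in 9 bytes, so 100,000 of them need a bit less than a megabyte. Under the same assumption there are 2**67 possible brackets, which is the part that won't fit on your disk.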

If you’re really motivated, you might take seeding and ranking into account to make the more probable brackets first. How many do you think you could submit before they cut you off?

They haven’t announced how to enter, so that might be a later challenge. Remember that in posting your programs, you’re giving other people the opportunity to submit your perfect bracket.


Captures with quantifiers match the last captured substring

A student in my Learning Perl class asked about what shows up in a capture when you apply a quantifier to that group. The great thing about computer programming is that you can just try it to find out:

$_ = 'abc def ghi jkl';

if( /(\S+) \s+ (\S+\s+)+ (\S+)/x ) {
  print "
1: $1
2: $2
3: $3
";
  }
else {
  print "No match!\n";
  }

In that code, $2 comes from the capture subpattern (\S+\s+), which can match one or more times. Does it get all the things it matched or just the last one? The output shows that it’s the last one:

1: abc
2: ghi 
3: jkl

The same thing happens for named captures, even though those are already designed to potentially remember many captures:

use Data::Dumper;
$_ = 'abc def ghi jkl';

if( /(\S+) \s+ (?<two>\S+\s+)+ (\S+)/x ) {
  print Dumper( \%-, \%+ );
  }
else {
  print "No match!\n";
  }

It only gets the last one too:

$VAR1 = {
          'two' => [
                     'ghi '
                   ]
        };
$VAR2 = {
          'two' => 'ghi '
        };

This isn’t documented in perlre or anywhere else that I could find.
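If you want every repetition rather than just the last one, one workaround is to capture the whole repeated span in a single group and split it afterward. This is only a sketch of that idea; the middle_words name is invented for the example, and the pattern is anchored so it applies to the entire string.

```perl
use strict;
use warnings;

# Return the words between the first and last words of a string
sub middle_words {
    my( $string ) = @_;

    # The quantified part is a non-capturing group, and one outer
    # capture group wraps every repetition of it
    return unless $string =~ /\A (\S+) \s+ ( (?: \S+ \s+ )+ ) (\S+) \z/x;

    return split ' ', $2;
}

my @middle = middle_words( 'abc def ghi jkl' );
print "Middle words: @middle\n";   # Middle words: def ghi
```

The quantifier now applies to a group that doesn't capture, so nothing gets overwritten along the way.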


Ruling the world with Perl and Excel

In Chapter 12 of Learning Perl, we have an exercise for people to practice using the file test operators. Our answer, which can only use the stuff we’ve covered in the book to that point, is simple:


foreach my $file (@ARGV) {
  my $attribs = &attributes($file);
  print "'$file' $attribs.\n";
  }

sub attributes { 
  my $file = shift @_; 
  return "does not exist" unless -e $file;

  my @attrib;
  push @attrib, "readable" if -r $file;
  push @attrib, "writable" if -w $file;
  push @attrib, "executable" if -x $file;
  return "exists" unless @attrib;
  'is ' . join " and ", @attrib;
  }

But I usually then redo this to show the same problem solved by creating an Excel file with the same data. Most of you probably already know that the Universe runs on Excel. If you can programmatically create Excel files, you’re halfway to controlling the Universe.

This still uses only what we cover in Learning Perl since we have a chapter on using modules, but there’s still a lot of fancy things I’m doing in there that I wouldn’t expect from someone in their first week with Perl.

Now the output is manager and executive friendly:


The vertical tab now matches \s

Perl 5.18 added the vertical tab (LINE TABULATION in the UCS) to the characters that match the \s character class shortcut. Its absence was the one exception that made that shortcut different from the POSIX definition of whitespace. For the details, see my post in The Effective Perler: The vertical tab is part of \s in Perl 5.18.

This means that a joke that I’ve used in almost 15 years of Perl teaching is going away. When I talked about \s, I’d quiz the class on how much whitespace is in ASCII. Most people can name three immediately and four if they think for a moment. Many people forget about form feeds, and only two people in all of my classes have ever mentioned the vertical tab. However, it was only after I wrote about the vertical tab in Know your character classes under different semantics that a student mentioned it. I’m destroying my own lines for a class by writing about this stuff!
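If you want to see what your own perl does, here is a small sketch that checks each of the six ASCII whitespace characters against \s. The matches_whitespace name is made up for this example, and the vertical tab line should differ depending on whether you run it before or after Perl 5.18.

```perl
use strict;
use warnings;

# Returns 1 if the single character matches \s on this perl, 0 otherwise
sub matches_whitespace {
    my( $char ) = @_;
    return $char =~ /\A\s\z/ ? 1 : 0;
}

my @ascii_whitespace = (
    [ 'space',           "\x20" ],
    [ 'horizontal tab',  "\x09" ],
    [ 'line feed',       "\x0A" ],
    [ 'vertical tab',    "\x0B" ],
    [ 'form feed',       "\x0C" ],
    [ 'carriage return', "\x0D" ],
);

printf "Running perl %vd\n", $^V;
for my $pair ( @ascii_whitespace ) {
    my( $name, $char ) = @$pair;
    printf "%-16s %s\n", $name,
        matches_whitespace( $char ) ? 'matches' : 'does not match';
}
```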


Why we teach the subroutine ampersand

In Learning Perl, we tell readers to use the & to prefix subroutine calls when we introduce the idea of reusable code. This doesn’t sit well with some programmers because it’s not how the experienced programmers work. The & does some magic, which we don’t mention in the book, and it’s a bit crufty for the Perl 5 programmer.

But newbies don’t work like experienced programmers. In general, newbies in anything don’t work like experienced practitioners. Trying to make them work the same way from Day One dumps too much information on them. It’s information overload and task loading. People will absorb only so much during an hour lecture or the first reading of a chapter (and we say as much in the preface where we explain we will lie a little). Learning Perl is decidedly not a reference; we give the reader enough information (and hide enough information) that they can pick up the essential and salient points.

I love sigils. They save me time because I don’t have to know the list of keywords and special names to choose a name for my variable. I remember my first day with Python. I tried to use a variable named count, but it conflicted with something I had loaded. It wasn’t the fault of the language; it was my knowledge of the language. It’s why Perl has sigils.

That applies to subroutines in Perl too. Randal likes to use the example of log, a natural name for a user-defined subroutine that outputs some debugging or progress information.

#!/usr/bin/perl

log( "Hey there!" );

sub log {
    print "[LOG] @_\n";
    }

This program has an odd error message:

Can't take log of 0 at test.pl line 3.

For the person in the middle of their first day of Perl, this doesn’t make sense. In my Learning Perl class, I don’t even talk about the log built-in. Math and science people figure it out quickly, but other people not so much.
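Here is a rough sketch of where that message comes from. Without the &, Perl calls its built-in log, which wants a number. A string with no leading digits becomes 0 in numeric context, and the natural logarithm of 0 is undefined. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $message = "Hey there!";

# The built-in log treats its argument as a number. Turn off the
# "isn't numeric" warning just for this conversion.
my $as_number = do { no warnings 'numeric'; $message + 0 };
print "As a number, the message is $as_number\n";

# The built-in dies on 0, so trap the error to look at it
my $result = eval { log( $as_number ) };
my $error  = $@;
print "The built-in log died: $error" unless defined $result;
```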

There’s another beginner issue, though. We typically tell people to put subroutines at the end of the program. That means that the parser doesn’t see their definitions until after they are invoked. This doesn’t work but there’s no error message:

#!/usr/bin/perl

show_message;

sub show_message {
    print "Hello World\n";
    }

In that example, Perl doesn’t know what show_message is when it compiles that code. It hasn’t seen the subroutine definition yet. To get around that, we can prepend an & to let Perl know that it will get a definition eventually, and when it does to use that. We tell readers that the & gives the hint to the Perl parser.

#!/usr/bin/perl

&show_message;

sub show_message {
    print "Hello World\n";
    }

There are better ways to do this (I like show_message()), but I know from years of teaching that new people don’t quite understand argument lists, much less empty ones. Students frequently try to invoke the subroutine name without a hint. That’s the beauty of Learning Perl: it’s classroom tested with 20 years of direct student testing behind it.
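For comparison, here is a sketch of three calls that do work with the definition at the end of the file, this time under strict. The greeting subroutine is invented for the example, and the forward declaration is one of those better ways that we don't expect from a first-day programmer.

```perl
use strict;
use warnings;

sub greeting;    # forward declaration: the name is known before the body appears

my $with_ampersand = &greeting();   # the ampersand form, with explicit parentheses
my $with_parens    = greeting();    # an explicit, empty argument list
my $bare           = greeting;      # compiles under strict only because of the forward declaration

print "$_\n" for $with_ampersand, $with_parens, $bare;

sub greeting {
    return "Hello World";
}
```

Take out the forward declaration and the bare call no longer compiles under strict, which at least turns the silent failure into a loud one.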

We tell the beginner to use the & to invoke a subroutine until they get to know the Perl built-in names. That’s the important thing that people miss when they complain about our use of &: They ignore that we told them this is an expedient use to avoid a class of newbie problems. It’s one of the problems producing new Perl programmers: other programmers yell at them for not working like an experienced programmer.

This code, with the &, works as the first day programmer would program it:

#!/usr/bin/perl

&log( "Hey there!" );

sub log {
    print "[LOG] @_\n";
    }

There’s another reason that we teach the & sigil though. We’ll need it later in Intermediate Perl when we want to dereference a subroutine reference in the same way that we did for scalar, array, and hash references where we use the sigils in some cases. Most of our class focuses on consistency and elucidation of the patterns hidden in the syntax.

print $$scalar
keys %{ $hash }
foreach ( @$array )
defined &$code_ref;
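Those fragments can be filled out into something you can run. This is only a sketch with made-up data; the point is that each dereference puts the sigil for the thing you want in front of the reference.

```perl
use strict;
use warnings;

# Made-up data for the illustration
my $scalar   = \ "a string";
my $array    = [ 1, 2, 3 ];
my $hash     = { first => 'abc', second => 'def' };
my $code_ref = sub { return "called with (@_)" };

print $$scalar, "\n";                            # $ for the scalar
print "$_\n" foreach ( @$array );                # @ for the array
print join( ' ', sort keys %{ $hash } ), "\n";   # % for the hash
print &$code_ref( 'an argument' ), "\n";         # & for the subroutine

# defined with the & checks the subroutine without calling it
print "The code reference is defined\n" if defined &$code_ref;
```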

Readers need to be aware of the & for Intermediate Perl and they need to be comfortable with it for Mastering Perl’s subroutine jury-rigging. And there’s a progression there. Each book adds to the reader’s understanding as they spiral above a topic. You don’t learn a language once; you learn what you need and augment that. Sometimes you re-learn things, but only when you are ready for that. The process continues for your entire career.

There’s a third reason we do this, though. We aren’t only teaching people to write new code. We’re also teaching them to read old code. They are going to see that & in old code. They are certainly going to see it in Perl 4 code, which requires the & (and there is Perl 4 code still out there). There’s quite a bit of legacy code out there and people are forced to work with it.
