January 2011 – Learning Perl

Updates to Chapter 12, “File Test Operators”

[This post notes differences between the fifth and sixth editions.]

This chapter probably doesn’t deserve an update here because almost nothing changed. Most of the updates just make all the code examples consistent. When I added the Perl 5.10 updates for the stacked file test operators, I used a style that wasn’t quite my own, but wasn’t quite the one Tom and Randal had already used in the book either. It’s more jarring in this chapter than in Chapter 15 (“Smart matching”), a completely new chapter in the fifth edition, because you can see two different styles on the same page. I’ve updated Chapter 15 too.
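If you haven't seen the stacked syntax yet, here's a minimal sketch that compares a pre-5.10 chain of file tests with the Perl 5.10 stacked form. The subroutine names are made up for illustration. Stacked tests run right to left, so -f -r $path behaves like -r $path && -f _, where _ reuses the result of the previous stat.

```perl
use strict;
use warnings;
use 5.010;
use File::Temp qw(tempfile tempdir);

# Old style: separate tests joined with &&; _ reuses the last stat buffer
sub readable_file_old {
    my ($path) = @_;
    return ( -r $path && -f _ ) ? 1 : 0;
}

# Perl 5.10 stacked style: -f -r $path is the same as -r $path && -f _
sub readable_file_stacked {
    my ($path) = @_;
    return ( -f -r $path ) ? 1 : 0;
}

# A throwaway plain file and a directory to test against
my ( $fh, $file ) = tempfile( UNLINK => 1 );
print {$fh} "some data\n";
close $fh;
my $dir = tempdir( CLEANUP => 1 );

for my $path ( $file, $dir, '/does/not/exist' ) {
    say "$path: old=", readable_file_old($path),
        " stacked=", readable_file_stacked($path);
}
```

Both forms should agree for every path. Only the plain file passes: the directory is readable but fails -f, and the missing path fails -r.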

There is one area where I could use some feedback, though. We say:

Don’t worry if you don’t know what some of the other file tests mean—if you’ve never heard of them, you won’t be needing them. But if you’re curious, get a good book about programming for Unix.

However, we don’t give any suggestions for what a good book might be. What would you choose?

“captures” versus “memories”, “group” versus “buffer”

The term “memories” to label the side effects of parentheses has fallen out of favor. The new hotness is “capture group”, although that has sometimes shown up as “capture buffer” in the documentation. Karl Williamson, however, purged “capture buffer” from the docs, so you shouldn’t see it anywhere in Perl 5.14’s docs. This mostly affects Chapter 8, where we introduce the match variables, even though we have grouping and backreferences in Chapter 7.

I’m not so sure I like “groups” everywhere, though. I think it’s the right term for the particular parentheses that trigger the capture, but not necessarily for the thing actually captured. It’s the difference between asking which team is in the Super Bowl and asking who’s on that team.
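Here's the distinction in a small sketch (the sample string is made up). The parentheses in the pattern are the capture groups. $1 and $2 hold the text those groups captured, and the same group can also be referred to inside the pattern as a backreference.

```perl
use strict;
use warnings;
use 5.010;

my $line = 'Barney Rubble, Fred Flintstone';

# The two pairs of parentheses are capture groups; what each one
# matched lands in the capture variables $1 and $2
my ( $first, $last );
if ( $line =~ /(\w+) (\w+)/ ) {
    ( $first, $last ) = ( $1, $2 );
    say "group 1 captured '$first', group 2 captured '$last'";
}

# A group can also be referenced inside the pattern itself
# (\1, or \g1 since Perl 5.10), here to find a doubled letter
my @doubled = grep { /(.)\g1/ } qw(fred barney wilma betty);
say "words with a doubled letter: @doubled";
```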

I don’t really care that much, though, because there’s one overriding concern: we need to use the same terms that are in the documentation so people have the right search terms.

Perl 6 has a thing called captures, but that’s a completely different beast.

Regex classes under Unicode

This week in The Effective Perler, I posted about the oddness of character classes. In “Know your character classes under different semantics”, I showed that the trusty character class shortcuts \w, \d, and \s that we know from the first edition aren’t the same thing now. In fact, they haven’t been the same since the fourth edition. As I’ve said before, we have basically ignored Unicode even though Perl has supported it since 5.6. Now we’re paying the Unicode tax; I just have to integrate this into Learning Perl.
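This sketch illustrates the difference by checking one non-ASCII digit and one accented letter against \d and \w, with and without the /a modifier. It assumes Perl 5.14 or later, since that's where /a first appeared. It also relies on use 5.014 turning on Unicode string semantics.

```perl
use strict;
use warnings;
use 5.014;    # the /a modifier first appeared in Perl 5.14

my $digit  = "\x{0663}";   # ARABIC-INDIC DIGIT THREE
my $letter = "\x{E9}";     # LATIN SMALL LETTER E WITH ACUTE

# Under Unicode rules, \d matches any decimal digit, not just 0-9
my $unicode_d = $digit =~ /\A\d\z/  ? 1 : 0;
# The /a modifier restricts \d, \w, and \s to their ASCII meanings
my $ascii_d   = $digit =~ /\A\d\z/a ? 1 : 0;

# Likewise, \w matches accented letters unless /a is in effect
my $unicode_w = $letter =~ /\A\w\z/  ? 1 : 0;
my $ascii_w   = $letter =~ /\A\w\z/a ? 1 : 0;

say "\\d vs U+0663: unicode=$unicode_d ascii=$ascii_d";
say "\\w vs U+00E9: unicode=$unicode_w ascii=$ascii_w";
```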

unichar, a small Unicode character test program

Rewriting Learning Perl to cover Unicode means I have to figure out how to type some of the characters that aren’t on my keyboard. Not only that, I need to figure out their character names and code points for the examples. I want to convert from any of those (name, code point, character) to a description of the character. I want something like this:

$ perl unichar ã
Processing ã
		match       grapheme
		code point  U+00E3
		decimal     227
		name        LATIN SMALL LETTER A WITH TILDE
		character   ã

I wrote a short program called unichar, which I have on GitHub.

There are some interesting parts of the script (which might change since I’m still tinkering with it). Even though my locale is set to en_US.UTF-8 and the command-line arguments are UTF-8, the script still sees them as raw octets rather than characters, so I have to decode them. The decode subroutine from Encode takes those octets and turns them into a Perl character string, using the codeset that I18N::Langinfo reports for my locale. In this case, I do that for all the elements of @ARGV:

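use Encode qw(decode);
use I18N::Langinfo qw(langinfo CODESET);

# Ask the locale which encoding it uses (typically "UTF-8" for en_US.UTF-8)
my $codeset = langinfo(CODESET);

# Turn each argument's raw octets into a Perl character string
@ARGV = map { decode $codeset, $_ } @ARGV;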
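To see what the decoding step buys you, this sketch fakes the incoming argument instead of reading @ARGV. It also hard-codes 'UTF-8' rather than asking I18N::Langinfo, so it behaves the same in any locale. Before decoding, “ã” is two octets. Afterward, it's a single character with code point U+00E3.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Stand-in for what arrives in @ARGV: the raw UTF-8 octets for "ã" (U+00E3)
my $octets = encode( 'UTF-8', "\x{E3}" );

# decode turns those octets into a one-character Perl string
my $chars = decode( 'UTF-8', $octets );

printf "before decode: %d octets\n", length $octets;
printf "after decode:  %d character, U+%04X\n", length $chars, ord $chars;
```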

There are some other interesting bits in there too, but they are a bit advanced for Learning Perl.

Here are some more examples of the output. I handle unprintable and invisible characters specially:

$ perl unichar 䣱
Processing 䣱
		match       grapheme
		code point  U+48F1
		decimal     18673
		name        
		character   䣱

$ perl unichar ↞
Processing ↞
		match       grapheme
		code point  U+219E
		decimal     8606
		name        LEFTWARDS TWO HEADED ARROW
		character   ↞

$ perl unichar U+2057
Processing U+2057
		match       code point
		code point  U+2057
		decimal     8279
		name        QUADRUPLE PRIME
		character   ⁗

$ perl unichar "TAMIL LETTER HA"
Processing TAMIL LETTER HA
		match       name
		code point  U+0BB9
		decimal     3001
		name        TAMIL LETTER HA
		character   ஹ

$ perl unichar 0x05d0
Processing 0x05d0
		match       code point
		code point  U+05D0
		decimal     1488
		name        HEBREW LETTER ALEF
		character   א

$ perl unichar "CYRILLIC CAPITAL LETTER I WITH GRAVE"
Processing CYRILLIC CAPITAL LETTER I WITH GRAVE
		match       name
		code point  U+040D
		decimal     1037
		name        CYRILLIC CAPITAL LETTER I WITH GRAVE
		character   Ѝ

$ perl unichar 0x9
Processing 0x9
		match       code point
		code point  U+0009
		decimal     9
		name        CHARACTER TABULATION
		character   

$ perl unichar 0x07
Processing 0x07
		match       code point
		code point  U+0007
		decimal     7
		name        BELL
		character
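The examples take three kinds of input: a character, a code point, and a name. The core charnames module can already do the lookup for each one. This is only a rough sketch of those lookups, not the actual unichar code on GitHub. The describe helper is hypothetical, and it makes no attempt to handle invisible characters the way unichar does.

```perl
use strict;
use warnings;
use 5.010;
use charnames ':full';   # core module; provides viacode and vianame

# Hypothetical helper: one line of description for a code point
sub describe {
    my ($cp) = @_;
    my $name = charnames::viacode($cp) // '';   # undef if no name is known
    return sprintf 'U+%04X %d %s', $cp, $cp, $name;
}

# From a character: ord gives the code point
say describe( ord "\x{219E}" );

# From a code point written as U+XXXX or 0xXXXX: hex does the conversion
my ($digits) = 'U+2057' =~ /\A(?:U\+|0x)([0-9A-F]+)\z/i;
say describe( hex $digits );

# From a character name: vianame returns the code point
say describe( charnames::vianame('TAMIL LETTER HA') );
```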

when(), Try::Tiny, and autodie

I’m working on Chapter 17, which is the catch-all chapter for topics that we think segue into the other books in the Learning Perl series. Although Mastering Perl has an entire chapter on catching and reporting errors, we want to at least survey the topic in Learning Perl.

The first edition of Learning Perl noted that eval existed and gave a couple of examples, and in each subsequent edition the discussion became more involved.

Starting with the fourth edition, we devoted a chapter to using Perl modules, acknowledging the fact that Perl’s greatest feature is CPAN. In that edition, it was fairly late in the book. In the fifth edition, we moved that chapter toward the middle of the book. In each case, this means that we can then use Perl modules, whether from the Standard Library or CPAN, for the rest of the book since we’ve covered the idea of using modules. Our goal is always to cover any topic before we use it.

Since the sixth edition also covers modules, we can use them when we talk about catching errors. That means we can talk about autodie, the pragma that became part of the Perl core in 5.10.1, and Try::Tiny, which is not a core module. We might have covered autodie in the fifth edition, but that edition only covered up to Perl 5.10.0, so Paul Fenwick’s autodie just barely missed the cut-off.

So, while working on the eval section, I was playing with some examples. I covered eval, Try::Tiny, and autodie separately, but I was wondering what would happen if I combined them. Could try and autodie cooperate?

I started with the example from the autodie documentation:

use 5.010;    # turns on given, when, and say

my $some_file = '/does/not/exist';   # any file that can't be opened

eval {
   use autodie;
   open(my $fh, '<', $some_file);
   my @records = <$fh>;
   # Do things with @records...
   close($fh);
};

given ($@) {
   when (undef)   { say "No error";                    }
   when ('open')  { say "Error from open";             }
   when (':io')   { say "Non-open, IO error.";         }
   when (':all')  { say "All other autodie errors."    }
   default        { say "Not an autodie error at all." }
}
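The given/when dispatch relies on smart matching, which Perl 5.18 and later flag as experimental. The same checks can go through the exception object's matches method, which the autodie::exception documentation describes for both function names and tags like ':io'. This is a minimal sketch under that assumption, and the path is just one that shouldn't exist.

```perl
use strict;
use warnings;
use autodie;   # core since Perl 5.10.1

my $filename = '/does/not/exist';   # a path that should not exist
my @seen;

eval {
    open my $fh, '<', $filename;    # autodie throws if this fails
    close $fh;
    1;
} or do {
    my $error = $@;
    if ( ref $error and $error->isa('autodie::exception') ) {
        # matches() accepts a function name or a tag such as ':io'
        push @seen, 'open'  if $error->matches('open');
        push @seen, 'close' if $error->matches('close');
        push @seen, 'io'    if $error->matches(':io');
    }
    else {
        push @seen, 'other';   # not an autodie exception
    }
};

print "@seen\n";
```

When the open fails as intended, this should record both the open and the :io match, which matches the output of the Try::Tiny version.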

Then I added Try::Tiny and started playing around with it:

use 5.010;

use autodie;
use Try::Tiny;

my $filename = '/does/not/exist';
try {
  open my $fh, '>', $filename; # still dies on error
}
catch {
  # Each when ends with continue, so every matching branch runs
  when( 'open'  ) { say 'Got an open error'; continue; }
  when( 'close' ) { say 'Got a close error'; continue; }
  when( ':io'   ) { say 'Got an io error';   continue; }
};

The output was not what I expected. It does exactly what it looks like it should do:

Got an open error
Got an io error

I was surprised that this worked, and that it worked without a warning, because that when is outside an official topicalizer (something that sets $_, such as given or foreach).

That’s one of the interesting parts of writing a book. To properly research something, we think about the different ways we might combine things and especially how things might break. When we’re teaching, we’re going to run into all sorts of crazy syntheses of the topics. With a bit of experience, we can anticipate some of that, and when we do we discover some interesting things like this.
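One plausible explanation for why the when works at all: Try::Tiny's source appears to call the catch block from inside a foreach loop over the error (a comment there compares it to given($error)). A foreach counts as a topicalizer even for code in a subroutine called from within the loop. Here's a minimal sketch of that idea without Try::Tiny. run_handler is a hypothetical name, and some perls may warn that when is experimental.

```perl
use strict;
use warnings;
use 5.010;

my @seen;

# Hypothetical helper: run a handler with the error aliased to $_
# by a foreach, which acts as the topicalizer for any when inside it
sub run_handler {
    my ( $error, $handler ) = @_;
    for ($error) {
        return $handler->($error);
    }
    return;
}

# The handler has no given or foreach of its own, like a catch block;
# continue makes control fall through to the next when each time
run_handler( 'open failed', sub {
    when (/open/)  { push @seen, 'open';  continue }
    when (/close/) { push @seen, 'close'; continue }
    when (/fail/)  { push @seen, 'fail';  continue }
} );

say "@seen";
```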