A use for the scalar reverse (maybe)

The reverse operator, which turns a list end to front, has a scalar context too. It’s one of the examples I use in my Learning Perl classes to note that you can’t guess what something does in context. I’ve never had a decent example for a proper use, but flipping a string around to effectively scan from the right seems interesting.

Perl, being a language that tries to do the most common thing you probably want, sometimes uses context to achieve that. For instance, in list context, localtime returns a list of time components, but in scalar context it returns a date-time string:

$ perl -le 'print join " ", localtime'
26 9 13 1 0 115 4 0 0

$ perl -le 'print scalar localtime'
Thu Jan  1 13:09:01 2015

Throwing scalar in front of localtime sets its context even though I’m using it with the list operator print.

I don’t know if that’s the common thing, but we already have time to return a number of seconds.

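This brings us to reverse. In list context it returns the list in the opposite order, but in scalar context it concatenates its arguments and flips the resulting string around: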

$ perl -le 'print reverse( 1 .. 5 )'
54321

$ perl -le 'print scalar reverse("Hello Perl!")'
!lreP olleH
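
The context matters more than it might seem here: print supplies list context, so without scalar a single string comes back unchanged. A small sketch of the three cases (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $greeting = 'Hello Perl!';

# List context: reverse sees a one-element list and returns it as is,
# so the string itself is untouched.
my @as_list = reverse $greeting;

# Scalar context (a scalar assignment): the characters are flipped.
my $as_string = reverse $greeting;

# With several arguments, scalar reverse joins them first and then
# flips the combined string.
my $joined = reverse( 'ab', 'cd' );

print "$as_list[0]\n";   # Hello Perl!
print "$as_string\n";    # !lreP olleH
print "$joined\n";       # dcba
```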

For as long as I’ve been teaching Perl, I haven’t had a good explanation for the usefulness of that scalar behavior. At some long-ago OSCON, I was alone in an elevator with Larry Wall, and having him trapped like that, asked him about reverse’s scalar context. He said he couldn’t remember why he made it like that, but “it seemed like a good idea at the time.”

Today, the first day of the New Year, I was looking at a list of my current CPAN distributions:

Acme-BDFOY-0.01
Apache-Htaccess-1.4
Apache-iTunes-0.11
App-Module-Lister-0.15
App-PPI-Dumper-1.02
Brick-0.227_01
Bundle-BDFOY-20070101
Bundle-MasteringPerl-20070706
Business-ISBN-2.09
Business-ISBN-Data-20140910.002
Business-ISMN-1.13
Business-ISSN-0.91
Business-US-USPS-WebTools-1.11
CACertOrg-CA-20110724.004
...

I wanted to split each entry in that list into the distribution name and version, breaking it into two parts at the last dash. split takes an optional third argument to specify the maximum number of parts:

$ perl -le 'print join " ", split /:/, "1:2:3:4:5"'
1 2 3 4 5

$ perl -le 'print join " ", split /:/, "1:2:3:4:5", 3'
1 2 3:4:5

$ perl -le 'print join " ", split /:/, "1:2:3:4:5", 2'
1 2:3:4:5

I could have done this with Graham Barr’s CPAN::DistnameInfo, which handles many special cases, but I was thinking about a more general problem, like breaking up a filename that has multiple dots (which File::Basename can do with some hassle).

The limited split works from the left, like all regular expressions. If I limit it to two parts, it breaks on the first dash on the left. I want to limit it to two parts but break on the last dash.

If I use reverse in scalar context, I can turn the string around so the last dash becomes the first:

my $string   = 'Log-Log4perl-Appender-ScreenColoredLevels-UsingMyColors-0.10_01';
my $reversed = reverse $string;
my ($version, $name ) = split /-/, $reversed, 2;
my $module = reverse $name;
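
Note that $version in those lines still holds the backward text (10_01.0). A slightly fuller sketch flips both pieces back and copes with a string that has no dash at all. The helper name split_on_last_dash is hypothetical, not something from a module:

```perl
use strict;
use warnings;

# Hypothetical helper: split a distribution string into name and
# version at the last dash.
sub split_on_last_dash {
    my ($dist) = @_;

    # Reversing makes the last dash the first one that split meets.
    my ($rev_version, $rev_name) = split /-/, scalar reverse($dist), 2;

    # No dash at all: treat the whole string as the name.
    return ($dist, undef) unless defined $rev_name;

    # Each piece is still backward, so flip both before returning.
    return ( scalar reverse($rev_name), scalar reverse($rev_version) );
}

my ($name, $version) = split_on_last_dash(
    'Log-Log4perl-Appender-ScreenColoredLevels-UsingMyColors-0.10_01'
);
print "$name $version\n";
```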

If I reduce that, I get something that’s a bit hard to read. Remember that +( is in there so reverse doesn’t think that the ( starts its argument list:

my $module = reverse +( split /-/, reverse($string), 2 )[1];

This solves the problem in an interesting way, but that doesn’t mean that I should use it like that. There might be some problems for which scanning from the right would simplify things, but after the joy of playing with reverse, I look for other ways. Perl is a “There’s more than one way to do it” language.

There’s already a way to scan from the right: rindex, which we mention in passing in Learning Perl. We do use it in an example with substr. For this problem, I can extract the string up to the index of the last dash:

my $last_dash = rindex $string, '-';
my( $name ) = substr $string, 0, $last_dash;
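
One thing to watch: rindex returns -1 when it finds no dash, and substr with a length of -1 would then quietly drop the last character. A hedged sketch that guards against that and also picks up the version (the sub name name_and_version is made up for this example):

```perl
use strict;
use warnings;

sub name_and_version {
    my ($dist) = @_;

    my $last_dash = rindex $dist, '-';

    # rindex returns -1 if the dash is not found.
    return ($dist, undef) if $last_dash < 0;

    my $name    = substr $dist, 0, $last_dash;    # everything before the dash
    my $version = substr $dist, $last_dash + 1;   # everything after it
    return ($name, $version);
}

my ($name, $version) = name_and_version('Business-US-USPS-WebTools-1.11');
print "$name $version\n";   # Business-US-USPS-WebTools 1.11
```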

A regex solution with a greedy quantifier is even easier in this particular case:

my( $name ) = $string =~ /(.*)-/;
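
The same greedy match can capture the version too. This is a sketch that assumes the whole string is the distribution name; the anchors, the /s flag, and the second capture are additions for illustration:

```perl
use strict;
use warnings;

my $string = 'Log-Log4perl-Appender-ScreenColoredLevels-UsingMyColors-0.10_01';

# The greedy (.*) grabs as much as it can and gives back only enough
# to reach the last dash, so the second capture contains no dash.
my ($name, $version) = $string =~ /\A(.*)-([^-]*)\z/s;

print "$name $version\n" if defined $name;
```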

But this leads me to the start of another path, one which I won’t explore here, about lookbehinds, a type of zero width assertion (similar to regex anchors, but as a pattern). In Perl, a lookbehind has to be fixed length while a lookahead can be variable width. Not all regex engines have this limitation. However, flipping a string around can turn a lookbehind situation into a lookahead if you are willing to write the pattern backward. I don’t recommend doing this for anything other than fun, and failing that, only when you can’t find any other way to do it. It’s something for me to investigate later for Mastering Perl.
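
As a rough illustration of that idea only, suppose the task is to find bar when it follows foo and a run of digits. Written forward, that would need a variable-width lookbehind such as (?<=foo\d+). Against the reversed string it becomes an ordinary lookahead with every literal written backward. The task and the sub name are invented for this sketch:

```perl
use strict;
use warnings;

# Hypothetical check: does "bar" appear right after "foo" plus digits?
sub bar_after_foo_digits {
    my ($text) = @_;

    my $reversed = reverse $text;

    # "bar" is now "rab", and what used to come before it ("foo" then
    # digits) now comes after it as digits then "oof".
    return $reversed =~ /rab(?=\d+oof)/ ? 1 : 0;
}

print bar_after_foo_digits('foo123bar'), "\n";   # 1
print bar_after_foo_digits('foobar'),    "\n";   # 0
```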

6 thoughts on “A use for the scalar reverse (maybe)”

  1. You’ve hit the nail on the head. Scalar reverse, as far as I can tell, is mostly used for string handling to simplify some split and regex problems.

    As a bonus, it helps with finding palindromes.

  2. I have a situation where I process a string a character at a time, with the possibility that I find what I need without processing the entire string:

    my $string = 'abcde';
    my $string_reverse = reverse $string;
    while (length($string_reverse)) {   # a plain truth test would stop early on a remaining "0"
      my $s = chop($string_reverse);
      # do test based on $s; exit loop using last if necessary
    }
    
  3. The scalar reverse is frequently used in molecular biology to obtain the reverse complement (or opposite strand) of a DNA sequence (a string of A, C, G, T). To get the reverse complement the DNA sequence must be reversed, and A, C, G, and T converted to T, G, C, and A, respectively.

    my $DNA = "ATAGCCGGGTGT";
    $DNA = reverse $DNA;
    $DNA =~ tr/ACGT/TGCA/;
    print "$DNA\n"; 
    
  4. # Return Excel column name for the specified zero-based column index.
    # For example, 0 => "A", 24 => "Y", 31 => "AF".
    
    sub col_name {
       my ($idx) = @_;         # zero-based column number
       my $name  = "";
       while ($idx >= 0) {
          $name .= chr(ord("A") + ($idx % 26));
          $idx   = int($idx / 26) - 1;
       }
       return scalar reverse $name;
    }
    
  5. I recently had occasion to use this. I had a set of column names that I needed to manually classify as keys or metrics. I could go down the list and decide individually on each variable, but there are common endings that are all classified similarly (fooID and barID are both keys, baz_count and quux_count are both metrics, etc.) Sorting by reversed string brings these suffixes together naturally, so when I see the first blahbytes column I can mark it as a metric and then keep marking them as metrics until I reach something other than bytes.
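
The sort-by-suffix idea from the last comment can be sketched like this. The column names are made up, and the map/sort/map arrangement is just one way to avoid reversing each name repeatedly during the sort:

```perl
use strict;
use warnings;

# Made-up column names for illustration.
my @columns = qw(fooID baz_count in_bytes barID quux_count out_bytes);

# Compute each reversed key once, sort on it, then discard the key.
my @by_suffix =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map  { [ scalar reverse($_), $_ ] }
    @columns;

# Names that share an ending now sit next to each other.
print "$_\n" for @by_suffix;
```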
