Learning Perl Challenge: rhich (Answer)

Here’s my answer for the first Learning Perl Challenge, where I asked people to create a program like `which` but takes a pattern as an argument.

This is a program I had already written. As I recall, I wrote it at the 2011 Perl QA Workshop in Amsterdam. I’d wanted the program for a long time and finally broke down to write it.

My answer (I give myself a B+)

Here’s the program I wrote. I didn’t clean this up or otherwise hide any ugliness. That I did something in any particular way isn’t an endorsement for that technique. This is something I spent five minutes on, and, once it worked, stopped thinking about it:

use 5.012;

use File::Spec;

my $regex = $ARGV[0];

my @paths = get_path_components();

foreach my $path ( @paths ) {
	if( ! -e $path ) {
		warn "path $path does not exist\n";
		next;
		}
	elsif( ! -d $path ) {
		warn "path $path is not a directory\n";
		next;
		}
	elsif( opendir my $dh, $path ) {
		my @commands =
			map     {
				if( -l ) {
					my $target = readlink;
					"$_ -> $target";
					}
				else { $_ }
				}
			grep    { -x }
			map     { File::Spec->catfile( $path, $_ ) } 
			grep    { /$regex/ } 
			readdir $dh;
		
		next unless @commands;

		print join "\n", @commands, '';
		}
	else {
		warn "Could not read directory for $path: $!\n";		
		}
	}
	
sub get_path_components {
	use Config;
	my $separator = $Config{path_sep} // ':'; #/
	my @parts = split /$separator/, $ENV{PATH};
	}

The heart of the program is the last elsif, which is a long list pipeline:

			map     {
				if( -l ) {
					my $target = readlink;
					"$_ -> $target";
					}
				else { $_ }
				}
			grep    { -x }
			map     { File::Spec->catfile( $path, $_ ) } 
			grep    { /$regex/ } 
			readdir $dh;

Read these from the bottom up. I use readdir to get the list of files in the current path and filter that using the regex. Notice that I have a problem: I never checked that pattern for validity. I should have done that earlier.

The bottom map puts that filename together with the path using File::Spec, and the grep right above it checks for paths that are executable (which wasn’t an explicit requirement).

The top map looks for symbolic links so it can modify the string to show the line target. That wasn’t part of the challenge either.

My output looks like:

% rhich perl
/usr/local/bin/perl-5.10.0 -> /usr/local/perls/perl-5.10.0/bin/perl
/usr/local/bin/perl-5.6.2 -> /usr/local/perls/perl-5.6.2/bin/perl
/usr/local/bin/perl-5.8.9 -> /usr/local/perls/perl-5.8.9/bin/perl
/usr/local/bin/perl5.10.1 -> /usr/local/perls/perl-5.10.1/bin/perl5.10.1
/usr/bin/cpanp-run-perl
/usr/bin/cpanp-run-perl5.10
/usr/bin/cpanp-run-perl5.12

Answers from others

I won’t reproduce all of the answers here since you can see them in the comments for the original post.

Anonymous Coward, score: B+

The first response, from Anonymous Coward, fixes one thing I did not do. This person checks that the file is not a directory, which can also be executable:

      if ($file =~ m/\Q$ARGV[0]\E/) {
        say STDOUT "$path/$file" if -x "$path/$file" and not -d _;
      }

However, this solution fails because the pattern in nerfed by the \Q, which literalizes all of the regex metacharacters. Patterns such as perl[56] won’t work unless that’s a literal substring in program name. Besides that small mistake, the program is fine.

Scott

Scott’s answer is fine for a Perl beginner, as he admits. He does something I hadn’t thought to do by disallowing multiple arguments:

die 'Usage: rhich [filename]' if @ARGV != 1;

It might be interesting to have multiple patterns, which are sometimes easier than one complicated pattern. I would have added an newline to the end of that die string so I don’t have to see the file and line number information.

He could use File::Spec, but that’s not a huge deal if the structure of the program is right:

      print $path."/".$_,"\n";  #Prints out the FILENAME and PATH

Jack Maney, score: B

Jack added several features to his solution. He supports the -a switch so his solution acts more like which, which normally stops at the first match.

use Getopt::Long qw(:config pass_through);

my $all=0;

GetOptions('a'=>\$all);

He sets up a hash to store his finds:

my %output;
$output{$_}=[] foreach(@ARGV);

The meat of the program is in this little bit. If the path matches, it goes into the hash:

        foreach my $pattern (@patterns_to_check)
        {
            if($file=~/$pattern/)
            {
                push @{$output{$pattern}},$dir . $dir_sep . $file;
                unless($all)
                {
                    @patterns_to_check=grep{$_ ne $pattern}@patterns_to_check;
                }
            }
        }

There’s a couple of opportunities for improvement here. First, the $dir . $dir_sep . $file is fragile. It assumes more about the filesystem than we really know. Simply gluing strings together with a separator might not always work. I don’t think that many people are using VMS though, so maybe it doesn’t matter.

The grep confused me for a bit until I realized what it was doing (a comment might be nice). It takes out the pattern if we didn’t use -a to get all the matches. I still feel a bit weird about that. I don’t like changing aggregates as I’m iterating through them. I think I might have turned the structure inside out to have the patterns on the outside and the files on the inside then use last to skip the rest of the files once I found one.

Meir, score: B-

While most solutions checked only the filename for a a match, Meir’s answer matches against the entire path:

	foreach my $i (glob "$d/*")
	{
		print "$i\n" if ( $i =~ /${fl}/i );
	}

This means it might match the directory name. For instance, I have a ~bin/perls directory in my path, so if I search for “perl”, all programs in that directory, including things like xsubpp and splain, match. The problem there is the glob, whose return value includes the path according to the string argument. Give it an absolute and partial path, and those parts show up in its return value. If you want to keep using glob, you can chdir or strip off the directory. Neither of those are that attractive. Other than that minor problem, it’s the same basic structure as most answers.

Tudor Constantin, score: B

Tudor’s answer has the same problem as Meir, which he used as the basis for his answer, but he has a lot more going on in the list he gives to foreach:

$_ =~ /${fl}/i and print("$_\n") foreach ( map { glob( "$_/*" or '.' ) } split( $Config{path_sep},$ENV{PATH} ) ) ;

Kevin Biskar, score: B+

Kevin’s answer is like the previous two glob answers, but he fixes it with chdir. He didn’t include the hints about using Config for the path separator, and he constructs paths on his own:

my $pattern = \$ARGV[0];

foreach my $dir (split /:/, $ENV{PATH}) {
	chdir $dir;
	foreach my $file (glob '*') {
		if ($file =~ /$$pattern/) {
			print "$dir/$file\n"
		}
	}
}

The reference and dereference aren’t doing much there, but they aren’t really doing any harm either.

Although this program isn’t portable, it works on my system. The problems aren’t in the structure and are easy to fix, which is often a fine way to develop. You don’t paint yourself into any corners and the problems are local instead or global.

The Challenge winner

It was tough to pick a winner here. I really liked Anonymous Coward’s solution, but the program has a serious bug since it can’t actually search patterns. Kevin Biskar’s solution is good, but he missed some of the portability hints in the challenge. If I had to choose, I’d have to go with Kevin since his program works.

However, everyone who submitted is already ahead of the pack. I’m sure there are many people who never submitted their solutions, and many more people who never developed their own solution. It’s a tough thing to let everyone see what you tried.

The Aftermath

Taking some hints from everyone’s solutions, I modified my program:

#!/usr/local/perls/perl-5.14.2/bin/perl

use v5.12;
use strict;
use warnings;

use File::Spec;
use Regexp::Assemble;

my $ra = Regexp::Assemble->new;

while ( my( $index, $arg ) = each @ARGV ) {
	my $re = eval { qr/$arg/ };
	die "The pattern [$arg] at position $index is not a valid perl regex\n\t$@" if $@;
	$ra->add( $re );
	}
	
my $regex = $ra->re;
 
my @paths = get_path_components();

foreach my $path ( @paths ) {
	if( ! -e $path ) {
		warn "path $path does not exist\n";
		next;
		}
	elsif( ! -d $path ) {
		warn "path $path is not a directory\n";
		next;
		}
	elsif( opendir my $dh, $path ) {
		my @commands =
			map     {
				if( -l ) {
					my $target = readlink;
					"$_ → $target";
					}
				else { $_ }
				}
			grep    { -x -f and ! -d _ }
			map     { File::Spec->catfile( $path, $_ ) } 
			grep    { /$regex/ } 
			readdir $dh;
		
		next unless @commands;

		print join "\n", @commands, '';
		}
	else {
		warn "Could not read directory for $path: $!\n";		
		}
	}
	
sub get_path_components {
	use Config;
	my $separator = $Config{path_sep} // ':';
	my @parts = split /$separator/, $ENV{PATH};
	}

I use Regexp::Assemble to build up a single pattern based on all the patterns on the command line. Before I add each pattern, though, I build a regex with qr//. To go through command-line patterns, I use the Perl v5.12 version of each on @ARGV so I can get the position of each element with the value:

use Regexp::Assemble;

my $ra = Regexp::Assemble->new;

while ( my( $index, $arg ) = each @ARGV ) {
	my $re = eval { qr/$arg/ };
	die "The pattern [$arg] at position $index is not a valid perl regex\n\t$@" if $@;
	$ra->add( $re );
	}
	
my $regex = $ra->re;

Later, when I’m filtering paths, I check that I have a plain file and that it’s not a directory:

			grep    { -x -f and ! -d _ }

This might look a bit weird, but it’s using the Perl v5.10 stacked file test operators. The -x -f each operate on their default argument, $_. The ! -d uses the virtual filehandle _ that re-uses the fetched data from the previous file test.

I didn’t bother supporting -a, which my program already does. I’ve never been that curious about the first result if I’m searching by a pattern.

3 thoughts on “Learning Perl Challenge: rhich (Answer)”

  1. Thank you for taking the time to critique the answers. I put together my initial solution in about 15 minutes and mostly stopped thinking about it, save for a few days ago when it occurred to me that it might look weird to change the number of elements of an array as you iterate through it. In retrospect, I’d probably either use an inside-out approach as you suggested or iterate through @ARGV in the outer loop and then include a “next unless(grep{$_ eq $pattern}@patterns_to_check;” to skip the iteration if the pattern has been removed.

    Also, you’re right about the string concatenation for the full file path. File::Spec is now on my radar.

    I’ve only been in industry for about 20 months (I was previously in academia), and before this job, I had no programming experience save for a few hours per week for about six weeks in teaching myself SQL. I started learning Perl about a year and a half ago. So, given all that, a B isn’t too shabby. 🙂 Thank you.

  2. Thank you, brian. Your first challenge and solution has motivated me to understand the grep, map and eval functions. I’ve taken the time to correct my solution and I can’t wait for the next challenge.

    #!/usr/bin/env perl
    
    use feature 'say';
    use File::Spec;
    use Config;
    use strict;
    use utf8;
    use warnings;
    
    die 'Usage: rhich RE' if @ARGV != 1;
    die "$ARGV[0] is not a valid regular expression" if not eval { qr/$ARGV[0]/ };
    
    my $separator = $Config{path_sep};
    my @paths = split /$separator/, $ENV{PATH};
    
    foreach my $path (@paths) {
      if (opendir my $dh, $path) {
        my @commands = grep -x && ! -d, map File::Spec->catfile($path, $_), grep /$ARGV[0]/, readdir $dh;
        print join "\n", @commands, '' if @commands;
        closedir $dh;
      } else {
        warn "Can't open $path: $!";
      }
    }
    

Comments are closed.