Learning Perl Challenge: rhich (Answer)

Here’s my answer for the first Learning Perl Challenge, where I asked people to create a program like `which` but takes a pattern as an argument.

This is a program I had already written. As I recall, I wrote it at the 2011 Perl QA Workshop in Amsterdam. I’d wanted the program for a long time and finally broke down to write it.

My answer (I give myself a B+)

Here’s the program I wrote. I didn’t clean this up or otherwise hide any ugliness. That I did something in any particular way isn’t an endorsement for that technique. This is something I spent five minutes on, and, once it worked, stopped thinking about it:

use 5.012;

use File::Spec;

my $regex = $ARGV[0];

my @paths = get_path_components();

foreach my $path ( @paths ) {
	if( ! -e $path ) {
		warn "path $path does not exist\n";
		next;
		}
	elsif( ! -d $path ) {
		warn "path $path is not a directory\n";
		next;
		}
	elsif( opendir my $dh, $path ) {
		my @commands =
			map     {
				if( -l ) {
					my $target = readlink;
					"$_ -> $target";
					}
				else { $_ }
				}
			grep    { -x }
			map     { File::Spec->catfile( $path, $_ ) } 
			grep    { /$regex/ } 
			readdir $dh;
		
		next unless @commands;

		print join "\n", @commands, '';
		}
	else {
		warn "Could not read directory for $path: $!\n";		
		}
	}
	
sub get_path_components {
	use Config;
	my $separator = $Config{path_sep} // ':'; #/
	my @parts = split /$separator/, $ENV{PATH};
	}

The heart of the program is the last elsif, which is a long list pipeline:

			map     {
				if( -l ) {
					my $target = readlink;
					"$_ -> $target";
					}
				else { $_ }
				}
			grep    { -x }
			map     { File::Spec->catfile( $path, $_ ) } 
			grep    { /$regex/ } 
			readdir $dh;

Read these from the bottom up. I use readdir to get the list of files in the current path and filter that using the regex. Notice that I have a problem: I never checked that pattern for validity. I should have done that earlier.

The bottom map puts that filename together with the path using File::Spec, and the grep right above it checks for paths that are executable (which wasn’t an explicit requirement).

The top map looks for symbolic links so it can modify the string to show the line target. That wasn’t part of the challenge either.

My output looks like:

% rhich perl
/usr/local/bin/perl-5.10.0 -> /usr/local/perls/perl-5.10.0/bin/perl
/usr/local/bin/perl-5.6.2 -> /usr/local/perls/perl-5.6.2/bin/perl
/usr/local/bin/perl-5.8.9 -> /usr/local/perls/perl-5.8.9/bin/perl
/usr/local/bin/perl5.10.1 -> /usr/local/perls/perl-5.10.1/bin/perl5.10.1
/usr/bin/cpanp-run-perl
/usr/bin/cpanp-run-perl5.10
/usr/bin/cpanp-run-perl5.12

Answers from others

I won’t reproduce all of the answers here since you can see them in the comments for the original post.

Anonymous Coward, score: B+

The first response, from Anonymous Coward, fixes one thing I did not do. This person checks that the file is not a directory, which can also be executable:

      if ($file =~ m/\Q$ARGV[0]\E/) {
        say STDOUT "$path/$file" if -x "$path/$file" and not -d _;
      }

However, this solution fails because the pattern in nerfed by the \Q, which literalizes all of the regex metacharacters. Patterns such as perl[56] won’t work unless that’s a literal substring in program name. Besides that small mistake, the program is fine.

Scott

Scott’s answer is fine for a Perl beginner, as he admits. He does something I hadn’t thought to do by disallowing multiple arguments:

die 'Usage: rhich [filename]' if @ARGV != 1;

It might be interesting to have multiple patterns, which are sometimes easier than one complicated pattern. I would have added an newline to the end of that die string so I don’t have to see the file and line number information.

He could use File::Spec, but that’s not a huge deal if the structure of the program is right:

      print $path."/".$_,"\n";  #Prints out the FILENAME and PATH

Jack Maney, score: B

Jack added several features to his solution. He supports the -a switch so his solution acts more like which, which normally stops at the first match.

use Getopt::Long qw(:config pass_through);

my $all=0;

GetOptions('a'=>\$all);

He sets up a hash to store his finds:

my %output;
$output{$_}=[] foreach(@ARGV);

The meat of the program is in this little bit. If the path matches, it goes into the hash:

        foreach my $pattern (@patterns_to_check)
        {
            if($file=~/$pattern/)
            {
                push @{$output{$pattern}},$dir . $dir_sep . $file;
                unless($all)
                {
                    @patterns_to_check=grep{$_ ne $pattern}@patterns_to_check;
                }
            }
        }

There’s a couple of opportunities for improvement here. First, the $dir . $dir_sep . $file is fragile. It assumes more about the filesystem than we really know. Simply gluing strings together with a separator might not always work. I don’t think that many people are using VMS though, so maybe it doesn’t matter.

The grep confused me for a bit until I realized what it was doing (a comment might be nice). It takes out the pattern if we didn’t use -a to get all the matches. I still feel a bit weird about that. I don’t like changing aggregates as I’m iterating through them. I think I might have turned the structure inside out to have the patterns on the outside and the files on the inside then use last to skip the rest of the files once I found one.

Meir, score: B-

While most solutions checked only the filename for a a match, Meir’s answer matches against the entire path:

	foreach my $i (glob "$d/*")
	{
		print "$i\n" if ( $i =~ /${fl}/i );
	}

This means it might match the directory name. For instance, I have a ~bin/perls directory in my path, so if I search for “perl”, all programs in that directory, including things like xsubpp and splain, match. The problem there is the glob, whose return value includes the path according to the string argument. Give it an absolute and partial path, and those parts show up in its return value. If you want to keep using glob, you can chdir or strip off the directory. Neither of those are that attractive. Other than that minor problem, it’s the same basic structure as most answers.

Tudor Constantin, score: B

Tudor’s answer has the same problem as Meir, which he used as the basis for his answer, but he has a lot more going on in the list he gives to foreach:

$_ =~ /${fl}/i and print("$_\n") foreach ( map { glob( "$_/*" or '.' ) } split( $Config{path_sep},$ENV{PATH} ) ) ;

Kevin Biskar, score: B+

Kevin’s answer is like the previous two glob answers, but he fixes it with chdir. He didn’t include the hints about using Config for the path separator, and he constructs paths on his own:

my $pattern = \$ARGV[0];

foreach my $dir (split /:/, $ENV{PATH}) {
	chdir $dir;
	foreach my $file (glob '*') {
		if ($file =~ /$$pattern/) {
			print "$dir/$file\n"
		}
	}
}

The reference and dereference aren’t doing much there, but they aren’t really doing any harm either.

Although this program isn’t portable, it works on my system. The problems aren’t in the structure and are easy to fix, which is often a fine way to develop. You don’t paint yourself into any corners and the problems are local instead or global.

The Challenge winner

It was tough to pick a winner here. I really liked Anonymous Coward’s solution, but the program has a serious bug since it can’t actually search patterns. Kevin Biskar’s solution is good, but he missed some of the portability hints in the challenge. If I had to choose, I’d have to go with Kevin since his program works.

However, everyone who submitted is already ahead of the pack. I’m sure there are many people who never submitted their solutions, and many more people who never developed their own solution. It’s a tough thing to let everyone see what you tried.

The Aftermath

Taking some hints from everyone’s solutions, I modified my program:

#!/usr/local/perls/perl-5.14.2/bin/perl

use v5.12;
use strict;
use warnings;

use File::Spec;
use Regexp::Assemble;

my $ra = Regexp::Assemble->new;

while ( my( $index, $arg ) = each @ARGV ) {
	my $re = eval { qr/$arg/ };
	die "The pattern [$arg] at position $index is not a valid perl regex\n\t$@" if $@;
	$ra->add( $re );
	}
	
my $regex = $ra->re;
 
my @paths = get_path_components();

foreach my $path ( @paths ) {
	if( ! -e $path ) {
		warn "path $path does not exist\n";
		next;
		}
	elsif( ! -d $path ) {
		warn "path $path is not a directory\n";
		next;
		}
	elsif( opendir my $dh, $path ) {
		my @commands =
			map     {
				if( -l ) {
					my $target = readlink;
					"$_ → $target";
					}
				else { $_ }
				}
			grep    { -x -f and ! -d _ }
			map     { File::Spec->catfile( $path, $_ ) } 
			grep    { /$regex/ } 
			readdir $dh;
		
		next unless @commands;

		print join "\n", @commands, '';
		}
	else {
		warn "Could not read directory for $path: $!\n";		
		}
	}
	
sub get_path_components {
	use Config;
	my $separator = $Config{path_sep} // ':';
	my @parts = split /$separator/, $ENV{PATH};
	}

I use Regexp::Assemble to build up a single pattern based on all the patterns on the command line. Before I add each pattern, though, I build a regex with qr//. To go through command-line patterns, I use the Perl v5.12 version of each on @ARGV so I can get the position of each element with the value:

use Regexp::Assemble;

my $ra = Regexp::Assemble->new;

while ( my( $index, $arg ) = each @ARGV ) {
	my $re = eval { qr/$arg/ };
	die "The pattern [$arg] at position $index is not a valid perl regex\n\t$@" if $@;
	$ra->add( $re );
	}
	
my $regex = $ra->re;

Later, when I’m filtering paths, I check that I have a plain file and that it’s not a directory:

			grep    { -x -f and ! -d _ }

This might look a bit weird, but it’s using the Perl v5.10 stacked file test operators. The -x -f each operate on their default argument, $_. The ! -d uses the virtual filehandle _ that re-uses the fetched data from the previous file test.

I didn’t bother supporting -a, which my program already does. I’ve never been that curious about the first result if I’m searching by a pattern.

Learning Perl Challenge: rhich

This is the first of a set of novice challenges that I’ll present on this blog. I’ll give you some problem which you should be able to solve with just the stuff we present in Learning Perl. A week or so later, I’ll post a solution.

For your first problem, consider the which command-line tool, which given a name, tells you the where in your PATH it finds that program. With just a name, it tells you the first path it finds:

% which perl
/Users/brian/bin/perls/perl

With the -a switch, it shows all the paths:

% which -a perl
/Users/brian/bin/perls/perl
/usr/local/bin/perl
/usr/bin/perl

This command takes an exact name. What if you don’t quite remember what the name is? This is your challenge.

Write a tool, called rhich, which takes a pattern and searches through your PATH looking for any program name that matches that pattern. Print all matching paths.

Since we don’t cover PATH in Learning Perl, I’ll give you some hints.

PATH is a set of directory names that the shell uses to search for a command to run. On Unix-like shells, the components are separated by colons. On DOS-like shells, it’s separated by a semicolon. You don’t need to do much work for that because perl already knows what the separator is. The Config module provides a %Config hash with the things that perl knows based on its compilation:

use Config;

my $separator = $Config{path_sep};

With that, get to it. You don’t have to post your solution, but you can if you like.

You can see a list of all Challenges and my summaries as well as the programs that I created and put in the Learning Perl Challenges GitHub repository.