DOS pattern matching, in Perl

Perl’s file globbing uses FreeBSD-style globbing, but it works almost everywhere because Perl handles it internally through the File::Glob module. I’m working on the “Directory Operations” chapter of Learning Perl, 7th edition, where we cover glob. I’m trying to make the book more Windows-friendly, so I’ve been considering how this stuff translates.
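Here is a minimal sketch of that point. It assumes only core modules (File::Glob, File::Temp, Cwd), and the filenames are made up for the example. It creates a scratch directory with a few empty files and shows that the built-in glob and File::Glob’s bsd_glob accept the same BSD-style syntax, including brace alternatives:

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Work in a scratch directory with a few empty files so the globs have something to match
my $start = getcwd();
my $dir   = tempdir( CLEANUP => 1 );
chdir $dir or die "Can't chdir to $dir: $!";
for my $name ( qw(notes.txt report.txt data.csv) ) {
	open my $fh, '>', $name or die "Can't create $name: $!";
	close $fh;
	}

# The glob operator and File::Glob's bsd_glob() follow the same BSD-style rules;
# sort explicitly because brace expansion returns each alternative's matches in turn
my @txt  = sort { $a cmp $b } glob '*.txt';
my @both = sort { $a cmp $b } bsd_glob '*.{txt,csv}';

# Leave the scratch directory so it can be cleaned up
chdir $start or die "Can't chdir back to $start: $!";

print "@txt\n";
print "@both\n";
```

Relative patterns after a chdir sidestep the backslash-versus-escape issue that absolute Windows paths can cause with BSD-style globbing.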

I ran across Raymond Chen’s “How did wildcards work in MS-DOS?”. He lays out the steps for turning what we think of as a pattern (such as “*.txt”) into the primitive CP/M-style pattern that MS-DOS actually used:

  1. Initialize the target pattern to 11 spaces and set the cursor to position 0.
  2. Read the next character from the input. Stop if there are no more characters.
  3. If the input is ., set positions 8 to 10 to spaces, set the cursor to position 8, and go back to step 2.
  4. If the input is *, fill the remaining positions with ? (the CP/M wildcard), set the cursor to position 11 (one past the end), and go back to step 2.
  5. If the cursor is not at position 11, copy the input character to the cursor position and advance the cursor. Go back to step 2.
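The steps above can also be sketched as a standalone subroutine. The name dos_pattern is hypothetical, positions are 0-based as in the list, and it walks the input with split instead of the /g match in my version. Treat it as an illustration of Chen’s steps, not a reproduction of MS-DOS internals:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a glob such as "*.TXT" into the 11-character
# CP/M-style pattern by following the five steps. Positions are 0-based.
sub dos_pattern {
	my ($glob) = @_;
	my $pattern = ' ' x 11;   # step 1: eleven spaces, cursor at 0
	my $cursor  = 0;

	for my $char ( split //, $glob ) {   # step 2: one input character at a time
		if ( $char eq '.' ) {            # step 3: blank the extension and jump to it
			substr( $pattern, 8, 3, ' ' x 3 );
			$cursor = 8;
			}
		elsif ( $char eq '*' ) {         # step 4: fill the rest with ?, move past the end
			substr( $pattern, $cursor, 11 - $cursor, '?' x ( 11 - $cursor ) );
			$cursor = 11;
			}
		elsif ( $cursor < 11 ) {         # step 5: copy the character and advance
			substr( $pattern, $cursor++, 1, $char );
			}
		}

	return $pattern;
	}

# Quote the result so the padding spaces are visible
printf "%-12s -> '%s'\n", $_, dos_pattern($_) for qw( ABCD.TXT A*B.TXT *.* .TXT );
```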

I translated this to Perl, just for fun. I used only one feature that we don’t cover in Learning Perl: the /g flag in scalar context. When the match succeeds, Perl remembers where it matched and picks up there the next time, which lets me walk through the string in $_ without destroying it:

while( <DATA> ) {
	chomp;

	my $dos_pattern = ' ' x 11;
	my $cursor = 0;

	while( m/(.)/g ) { # /g in scalar context remembers where it left off
		my $input = $1;
		last unless defined $input;

		if( $input eq '.' ) {
			substr( $dos_pattern, 8, 3, ' ' x 3 );
			$cursor = 8;
			next;
			}
		elsif( $input eq '*' ) {
			my $rest = 11 - $cursor;
			substr( $dos_pattern, $cursor, $rest, '?' x $rest );
			$cursor = 11;
			next;
			}
		elsif( $cursor != 11 ) {
			substr( $dos_pattern, $cursor++, 1 ) = $input;
			}
		}

	printf "%-12s -> %12s\n", $_, $dos_pattern;
	}

__END__
ABCD.TXT
ABCDEFGHIJK
A*B.TXT
*.*
*
*.TXT
.TXT

The output shows the translation of glob patterns:

ABCD.TXT     ->  ABCD    TXT
ABCDEFGHIJK  ->  ABCDEFGHIJK
A*B.TXT      ->  A???????TXT
*.*          ->  ???????????
*            ->  ???????????
*.TXT        ->  ????????TXT
.TXT         ->          TXT
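Here is the /g-in-scalar-context trick from the loop on its own, as a small sketch with a made-up string. Each successful match advances pos(); when the match finally fails, pos() resets to undef and the string itself is left unchanged:

```perl
use strict;
use warnings;

my $string = 'A*B';
my @seen;

# In scalar context, m//g matches once per call and records where it stopped in pos()
while ( $string =~ m/(.)/g ) {
	push @seen, sprintf "%s at %d", $1, pos($string) - 1;
	}

# The final, failing match resets pos(); the string was never modified
print "$_\n" for @seen;
print defined pos($string) ? "pos still set\n" : "pos reset\n";
print "string is still '$string'\n";
```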

Some things to note:

  • This assumes that all filenames are 8.3 names. The dot is implicit.
  • Names shorter than eight characters have implicit spaces to pad them.
  • Each part (name or extension) effectively allows only one *: a * fills the rest of the pattern, so any characters after it and before the next . are ignored.
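To see what the 11-character pattern gets compared against, here is a sketch with two hypothetical helpers, fcb_name and dos_matches. It assumes 8.3 names, uppercases them (assuming DOS-style case-insensitive names), pads each part with spaces, and treats ? as matching any single position, padding included. It ignores whatever other special cases MS-DOS had:

```perl
use strict;
use warnings;

# Hypothetical helper: pad an 8.3 filename into the 11-character form
# (name padded to 8, extension padded to 3). Uppercasing is an assumption
# about DOS-style names; parts longer than 8.3 are simply truncated here.
sub fcb_name {
	my ($filename) = @_;
	my ( $name, $ext ) = split /\./, $filename, 2;
	$ext = '' unless defined $ext;
	return sprintf '%-8.8s%-3.3s', uc $name, uc $ext;
	}

# Hypothetical helper: a ? in the pattern matches any character (including
# the padding spaces); every other position must match exactly.
sub dos_matches {
	my ( $pattern, $filename ) = @_;
	my $fcb = fcb_name($filename);
	for my $i ( 0 .. 10 ) {
		my $p = substr $pattern, $i, 1;
		next if $p eq '?';
		return 0 unless $p eq substr $fcb, $i, 1;
		}
	return 1;
	}

# A*B.TXT became A???????TXT, so the B is gone and ACD.TXT matches too
print dos_matches( 'A???????TXT', 'ACD.TXT' ) ? "match\n" : "no match\n";
```

Because the padding is part of the comparison, ABCD    TXT matches ABCD.TXT but not ABCDE.TXT.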

This isn’t what Perl does on Windows, though. It’s only a bit of fun programming, maybe worthy of an exercise in the book.


2 Comments.

  1. Is it my predisposition to BSD-style globs, or is the third case supposed to magically match the B somewhere in that glob? I’d have expected it to match a B (and importantly only a B) just before the dot, but $dos_pattern doesn’t show that (or rather, it could match A*C as well)?
