Learning Perl Challenge: tripwire

The previous Learning Perl Challenge asked you to find duplicate files. This challenge needs some of which you did there, but for a different purpose.

Write a program that monitors a directory to find any file changes. Programs such as tripwire do this by recording meta information about a file on its first run then checking that the same information matches later. For instance, the size and SHA1 digest should stay the same. You could also just store the original content, but that’s not very convenient.

Since you’re at the Learning Perl level, we can’t ask too much here or judge you too harshly. A lot of the problem is storing the data and reading it later. Here’s a hint: create a flat file to store the “good” data on the first run, then read this file on the second run:

#name:size:SHA1
file.txt:1023:53a0935982ae11a4784d51aa696733c947c0614f

How are you going to handle the security on this file after you create it? As an example, you might look at CPAN::Checksums, which handles the same task for the modules on CPAN.

There are many ways that you can employ use this. You can run it periodically from cron, for instance, but you might also make a daemon that runs continually and is always checking. Once you find a change, you can report it in many ways, but we’ll only ask you to print a line to the terminal, that might look something like:

file.txt changed.

Was:
Size: 1023 bytes
SHA1: 53a0935982ae11a4784d51aa696733c947c0614f

Is now:
Size: 2001 bytes
SHA1: 730c6983bb9f942ef5cf6c174d76ad0c1594c1a7

You can see a list of all Challenges and my summaries as well as the programs that I created and put in the Learning Perl Challenges GitHub repository.

10 thoughts on “Learning Perl Challenge: tripwire”

  1. Rather than write a script that would be run as a cron job, I decided to expand my horizons a bit and write my first daemon. I used Any::Daemon and added a few command line options, including a check frequency (ie the number of seconds to sleep between checks), and max_iterations (the number of check iterations to run before exiting–a value of 0 means that the daemon won’t exit on its own).

  2. I’ve come up with a solution which can be run periodocally from cron. I’ve implemented a daemon in ruby, but I’m not sure yet how to do it properly in Perl.

    My little script requires exactly one directory to monitor and takes an optional location for the flat file to store the “good” data. It will in any case check if it can actually create the file. The required directory is traversed recursively and the sizes and digests of all plain files found are stored next to the names. If it is the first run the “good” data flat file is created. If it’s not the first run the new sizes and digests are compared, output is generated accordingly and a new “good” data file is saved.

    use 5.014;
    use Cwd qw(abs_path getcwd);
    use Digest;
    use File::Find;
    use File::Spec;
    use Getopt::Long;
    use strict;
    use utf8;
    use warnings;
    
    my $file;
    my %unchecked;
    my %checked;
    my $algorithm = 'SHA-256';
    GetOptions ('file=s' => \$file);
    die 'Usage: tripwire [--file=filename] directory' if @ARGV != 1;
    my $directory = abs_path $ARGV[0];
    die "$directory is not a directory" if not -d $directory;
    if (not defined $file) {
      $file = File::Spec->catfile(getcwd, $directory =~ s/\//_/gr);
    }
    open FILE, '+>>', $file or die "Can't open $file: $!";
    close FILE;
    find \&wanted, $directory;
    open FILE, $file or die "Can't open $file: $!";
    while () {
      chomp;
      my ($name, $oldsize, $olddigest) = split ':', $_;
      if (exists $unchecked{$name}) {
        if ("$oldsize:$olddigest" ne $unchecked{$name}) {
          my ($newsize, $newdigest) = split ':', $unchecked{$name};
          printf_change($name, 'changed');
          printf_detail('Was', $oldsize, $algorithm, $olddigest);
          printf_detail('Is now', $newsize, $algorithm, $newdigest);
        }
        $checked{$name} = $unchecked{$name};
        delete $unchecked{$name};
      } else {
        printf_change($name, 'deleted');
        printf_detail('Was', $oldsize, $algorithm, $olddigest);
      }
    }
    foreach my $name (keys %unchecked) {
      my ($size, $digest) = split ':', $unchecked{$name};
      printf_change($name, 'added');
      printf_detail('Is now', $size, $algorithm, $digest);
      $checked{$name} = $unchecked{$name};
      delete $unchecked{$name};
    }
    open FILE, '>', $file or die "Can't open $file: $!";
    while (my ($key, $value) = each %checked) {
      say FILE "$key:$value";
    }
    close FILE;
    
    sub printf_change {
      printf "%s %s.\n\n", @_;
    }
    
    sub printf_detail {
      printf "%s:\nSize: %i bytes\n%s: %s\n\n", @_;
    }
    
    sub wanted {
      if (-f and $File::Find::name ne $file) {
        $unchecked{$File::Find::name} = -s _;
        $unchecked{$File::Find::name} .= ':';
        if (-r _) {
          my $digest = Digest->new($algorithm);
          $digest->addfile($File::Find::name);
          $unchecked{$File::Find::name} .= $digest->hexdigest;
        }
      }
    }
    

    My next step would be to add a daemonize option and maybe use SQLite instead of a flat file for storing the “good” data.

    1. Thanks for the suggestions, Dan. I’ve updtaed the function using a regular expression to determine the file extension and in_array() instead of looping. I also added an argument to make the comparison case-sensitive or not. Otherwise, the function works exactly the way it used to.

  3. I wrote a Perl example here – https://plus.google.com/u/0/102874059713383300948/posts . (g+ doesn’t do code formatting…)

    #!/usr/bin/perl
    
    use Modern::Perl;
    use File::Find;
    use Linux::Inotify2;
    use Getopt::Std;
    use Digest::MD5 'md5_hex';
    use autodie;
    $| = 1;
    
    my %files;
    our $opt_d;
    
    getopt('d');
    
    ( -d $opt_d ) or do {
      say "$0 -d /path/to/directory";
      exit;
    };
    
    find( \&wanted, $opt_d );
    
    my $inotify = new Linux::Inotify2
      or die "Unable to create inotify object: $!\n";
    
    $inotify->watch(
      $opt_d,
      IN_MODIFY | IN_ONESHOT,
      sub {
         my $e = shift;
         my $full = $e->fullname;
         $full =~ s#//#/#;
         if ( $e->IN_MODIFY ) {
           say ">$full< was modified!";
           say "\tsize was ", $files{$full}{size};
           say "\tnew size is ", -s $full;
           say "\tmd5 was ", $files{$full}{md5};
           say "\tnew md5 is ", md5($full);
           $files{$full}{size} = -s $full;
           $files{$full}{md5} = md5($full);
         }
      } );
    
    1 while $inotify->poll;
    
    sub wanted {
      if ( -f $File::Find::name ) {
        my $seen = $File::Find::name;
        $files{$seen} = {
          size => -s $seen,
          md5 => md5($seen),
        };
      }
    }
    
    sub md5 {
      my $sum = shift;
      my $ret = do {
        local $/ = undef;
        open( my $FILE, '<', $sum );
        <$FILE>;
      };
      return md5_hex($ret);
    }
    
    1. I like the inotify approach, but it just works with Linux. Is there actually something similar that works with FreeBSD or Mac OS X?

  4. Below is my version of tripwire. I didn’t deal with creating or deleting a file.
    BTW, what’s the intention of guiding us to read CPAN::Checksums?

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use Digest::SHA;
    
    die "require one parameter\n" if @ARGV != 1;
    my $dir = shift @ARGV;
    die "require a directory\n" unless -d $dir;
    chdir $dir or die "chdir error: $!";
    
    opendir DH, $dir or die "opendir error: $!";
    my @files = readdir DH;
    
    my $metafile = ".meta";
    if (-e $metafile) {
        open META, "< $metafile" or die "open error: $!";
        my %sz;
        my %hash;
        while() {
            chomp;
            my ($name, $size, $sha1) = split /:/, $_;
            $sz{$name} = $size;
            $hash{$name} = $sha1;
        }
        foreach (@files) {
            next if /^\./;
            my $sha1 = &get_sha1($_);
            if($hash{$_} ne $sha1) {
                my $size = (-s);
                print "$_ changed.\n\n";
                print "Was:\nSize: $sz{$_} bytes\nSHA1: $hash{$_}\n\n";
                print "Is now:\nSize: $size bytes\nSHA1: $sha1\n\n";
                $hash{$_} = $sha1;
                $sz{$_} = $size;
            }
        }
        open META, "< $metafile" or die "open error: $!";
        foreach (keys %hash) {
            print META "$_:$sz{$_}:$hash{$_}\n";
        }
    }
    else {#create metafile first...
        open META, "> $metafile" or die "open error: $!";
        foreach (@files) {
            next if /^\./;
            my $size = (-s);
            my $sha1 = &get_sha1($_);
            print META "$_:$size:$sha1\n";
        }
    }
    
    sub get_sha1 {
        my $name = shift;
        open FILE, $name or die "open error: $!"; 
        binmode FILE;
        my $sha1 = Digest::SHA->new(1)->addfile(*FILE)->hexdigest;
    }
    
  5. Rather than write a script that would be run as a cron job, I decided to expand my horizons a bit and write my first daemon. I used Any::Daemon and added a few command line options, including a check frequency (ie the number of seconds to sleep between checks), and max_iterations (the number of check iterations to run before exiting–a value of 0 means that the daemon won’t exit on its own).

    You can also find the code here.

    use strict;
    use warnings;
    use Any::Daemon;
    use Digest::SHA1;
    use Getopt::Long;
    use Array::Utils qw(array_minus intersect);
    
    my $dir=".";
    my $check_freq=10; #number of seconds between checks.
    my $username=$ENV{LOGNAME};
    my $max_iterations=0; #max number of iterations, 0 means there is no maximum.
    my $background=0; #flag to determine whether or not the daemon runs in the background.
    
    GetOptions('d|dir=s'=>\$dir,
    	'c|check_freq=i'=>\$check_freq,
    	'u|username=s'=>\$username,
    	'm|max_iterations=i'=>\$max_iterations,
    	'b|background'=>\$background);
    
    die "Check frequency must be positive" unless $check_freq>0;
    
    die "No directory given" unless $dir;
    
    die "The maximum number of iterations must be nonnegative" unless $max_iterations>=0;
    
    
    my $daemon=Any::Daemon->new(user=>$username,workdir=>$dir);
    $daemon->run(
    	background=>$background,
    	child_task=>\&check_dir,
    	max_childs=>1,
    	kill_childs=>sub{exit;}
    	);
    
    sub check_dir
    {                       
    	opendir(my $dh,$dir) or die $!;
    	my @files=sort grep{(-f $_) and (-r $_)}readdir($dh);
    	closedir($dh);
    	
    	unless(@files)
    	{
    		print "No readable files found in directory $dir.\n";
    		exit 0;
    	}
    	
    	my %data=();
    	
    	#file_data returns a hash ref of the form
    	#{sha1=>$sha_digest,size=>$file_size}
    	$data{$_}=file_data($_) foreach(@files);
    	
    	my $counter=1;
    	
    	while(1)
    	{
    		sleep $check_freq;
    		                                    
    		opendir(my $dh,$dir) or die $!;
    		my @updated_files=sort grep{(-f $_) and (-r $_)}readdir($dh);
    		closedir($dh);
    		
    		my @deleted_files=array_minus(@files,@updated_files);
    		
    		foreach my $file(@deleted_files)
    		{
    			print "$file has been deleted or moved from directory $dir.\n\n";
    			
    			print "Was:\n";
    			print "Size: " . $data{$file}->{size} . " bytes\n";
    			print "SHA1: " . $data{$file}->{sha1} . "\n\n";
    		}
    		
    		my %updated_data=();
    		$updated_data{$_}=file_data($_) foreach (@updated_files);
    		
    		my @new_files=array_minus(@updated_files,@files);
    		
    		foreach my $file(@new_files)
    		{
    			print "New file: $file\n\n";
    			print "Is now:\nSize: " . $updated_data{$file}->{size} . " bytes\n";
    			print "SHA1: " . $updated_data{$file}->{sha1} . "\n\n";
    		}
    		
    		my @intersection=intersect(@updated_files,@files);
    		
    		foreach my $file(@intersection)
    		{
    			unless(
    					($data{$file}->{size} == $updated_data{$file}->{size})
    					and
    					($data{$file}->{sha1} eq $updated_data{$file}->{sha1})
    				)
    			{
    				print "$file changed.\n\n";
    				print "Was:\n";
    				print "Size: " . $data{$file}->{size} . " bytes\n";
    				print "SHA1: " . $data{$file}->{sha1} . "\n\n";
    				
    				print "Is now:\n";
    				print "Size: " . $updated_data{$file}->{size} . " bytes\n";
    				print "SHA1: " . $updated_data{$file}->{sha1} . "\n\n";
    			}
    		}
    		
    		#updating old files and old data with new stuff
    		@files=@updated_files;
    		%data=%updated_data;
    		
    		if($max_iterations)
    		{
    			print "Iteration $counter complete.\n";
    			if($counter==$max_iterations)
    			{
    				kill 'INT',getppid or die "Cannot signal " . getppid . " with SIGINT: $!";
    				exit 0;
    			}
    			$counter++;
    		}
    	}
    	
    	
    }
    
    sub file_data
    {
    	my $file=shift;
    	die "No argument given to file_data" unless defined $file;
    	
    	open(my $fh,"new;
    	$sha1->addfile($fh);
    	my $digest=$sha1->hexdigest;
    	close($fh);
    	my $size = -s $file;
    	return {sha1=>$digest,size=>$size};
    	
    }
    
  6. #!/usr/bin/env perl
    
    use Modern::Perl;
    use Digest::MD5;
    use File::Find;
    use Getopt::Std;
    
    use constant DATAFILE => ".checksums";
    
    sub get_checksum
    {
        my $filename = shift;
        open my $fh, "<", $filename or return;
        local $/;
        return Digest::MD5::md5_hex();
    }
    
    sub get_checksums
    {
        my $info = {};
        find({
                wanted => sub {
                    return if -d $_ || ! -r $_;
                    return unless defined (my $checksum = get_checksum($_));
                    my $size = -s $_;
                    $info->{$_} =  "$size:$checksum";
                },
                no_chdir => 1,
            },
            @_);
        return $info;
    }
    
    sub store_checksums
    {
        my $datafile = shift;
        my $info = shift;
        open my $fh, ">", $datafile;
        foreach my $filename (keys %$info)
        {
            say $fh "$filename:$info->{$filename}"
        }
    }
    
    sub read_checksums
    {
        my $datafile = shift;
        my $info = {};
    
        open my $fh, "<", $datafile;
        while ()
        {
            chomp;
            my ($filename, $data) = split ':', $_, 2;
            $info->{$filename} = $data;
        }
    
        return $info;
    }
    
    sub verify_checksums
    {
        my $old = read_checksums shift;
        my $new = shift;
    
        foreach my $filename (keys { map { $_ => 1 } (keys %$new, keys %$old) })
        {
            if (exists $new->{$filename} && exists $old->{$filename})
            {
                my ($odata, $ndata) = ($old->{$filename}, $new->{$filename});
                if ($odata ne $ndata)
                {
                    say "$filename changed from $odata to $ndata";
                }
            }
            elsif (exists $new->{$filename})
            {
                say "$filename is new";
            }
            else
            {
                say "$filename dissapeared";
            }
        }
    }
    
    my %opts = ( f => DATAFILE );
    
    getopts('rf:', \%opts);
     
    push @ARGV, '.' unless scalar @ARGV;
     
    my $checksums = get_checksums @ARGV;
    
    if (exists $opts{'r'} || ! -e $opts{'f'})
    {
        store_checksums $opts{'f'}, $checksums;
    }
    else
    {
        verify_checksums $opts{'f'}, $checksums;
    }
    
  7. #!/usr/bin/env perl
    
    # usage: dirmon [DIRECTORY] - directory is defaulting to .
    
    use common::sense;
    use Digest::MD5 qw(md5_hex);
    
    my $dir=shift//'.';
    die "Cannot access $dir: $!\n" unless (-x $dir && -d $dir);
    chdir $dir;
    my @files=glob'*';
    @files=grep(!-d $_, @files);
    
    unless (-e ".filehashes") {
      warn "monitoring file .filehashes doesn't exist yet; starting to monitor this directory.\n";
      createfile();
      exit 0;
    }
    
    my (%old_status,%new_status);
    
    open HASHFILE, "<.filehashes" or die "Cannot open .filehashes: $!";
    while (my $statusline=<>) {
      chomp $statusline;
      next if $statusline=~'^#'; #skip comments
      my ($file, $size, $digest)=split ':', $statusline;
      $old_status{$file}=[$digest, $size];
    }
    close HASHFILE or die "Cannot close .filehashes: $!\n";
    
    foreach my $file(@files) {
      open FILE, "<$file" or die "Cannot open $file: $!\n";
      $new_status{$file}=[md5_hex(), (stat($file))[7]];
      close FILE or die "Cannot close $file: $!\n";
    }
    
    foreach my $file (keys %old_status) {
      next if defined $new_status{$file};
      say "ALERT! $file is missing."
    }                                                                                                                   
                                                                                                                        
    foreach my $file (keys %new_status) {                                                                               
      next if defined $old_status{$file};                                                                               
      say "ALERT! $file is new."                                                                                        
    }                                                                                                                   
                                                                                                                        
    foreach my $file (keys %new_status) {                                                                               
      next if $old_status{$file}[0] eq $new_status{$file}[0];                                                           
      say "$file has changed.";                                                                                         
      say "    Last check: Size: ".$old_status{$file}[1];                                                               
      say "                MD5:  ".$old_status{$file}[0];                                                               
      say "    Actual:     Size: ".$new_status{$file}[1];                                                               
      say "                MD5:  ".$new_status{$file}[0];                                                               
    }                                                                                                                   
                                                                                                                        
    exit 0;                                                                                                             
                                                                                                                        
    sub createfile {                                                                                                    
      open HASHFILE, "<.filehashes" or die "Cannot open .filehashes: $!\n";                                             
      print HASHFILE "#name:size:MD5\n";                                                                                
      foreach my $file (@files) {                                                                                       
        my $size=(stat($file))[7]; 
        open FILE, "<$file" or die "Cannot open $file\n";
        my $digest=md5_hex();
        close FILE or die "Cannot close $file\n";
        print HASHFILE $file.':'.$size.':'.$digest."\n";
      }
      close HASHFILE or die "Cannot close .filehashes: $!\n";
    }
    

Comments are closed.