Find the top 10 directories on your system that have the most files in them. For this, count only the files immediately under each directory; that is, don’t count files in subdirectories. By “file”, I mean just about anything that isn’t a symbolic link or a weird device. Think about how you want to show the person running your program what it’s doing.
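For the counting part alone, a rough sketch (assuming the directory to examine arrives in a hypothetical $dir) might look something like this:

use strict;
use warnings;

my $dir = shift || '.';    # hypothetical directory to examine

opendir my $dh, $dir or die "Can't open $dir: $!";
# count entries that are plain files and not symbolic links
my $count = grep { -f "$dir/$_" && ! -l "$dir/$_" } readdir $dh;
closedir $dh;

print "$count files directly in $dir\n";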
This challenge isn’t about counting so much as traversing, remembering, and displaying. How do you keep track of what you need to handle next, and of which directories have had the most files so far?
I’m actually writing my own version of this right now because I’m running some benchmarks of opendir versus glob and need some test cases. I could just create some new directories and fill them with a bunch of fake files, but that’s no fun.
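A quick sketch of what such a benchmark might look like, using the core Benchmark module and an assumed test directory in $dir (not the actual benchmark code):

use strict;
use warnings;
use Benchmark qw(cmpthese);

my $dir = shift || '.';    # assumed test directory

cmpthese( -5, {    # run each way for at least 5 CPU seconds
    opendir => sub {
        opendir my $dh, $dir or die "Can't open $dir: $!";
        my @files = grep { -f "$dir/$_" } readdir $dh;
        closedir $dh;
    },
    glob => sub {
        # note: glob skips dotfiles while readdir does not,
        # so the two don't see exactly the same entries
        my @files = grep { -f } glob("$dir/*");
    },
} );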
I don’t care how long your program takes, although you might. Let it run in a window (or screen) on its own. Test it on a small directory first (so, there’s a hint there).
I made a Curses version (but don’t look at it until you’ve tried your own solution!):
You can see a list of all Challenges and my summaries as well as the programs that I created and put in the Learning Perl Challenges GitHub repository.
use strict;
use warnings;
use Path::Iterator::Rule;

$|++;

# start in this dir
my $BASEDIR = $ARGV[0] || $ENV{HOME};
# how many top results
my $TOP_N = $ARGV[1] || 10;
# give intermediate results to make it less boring for the user
my $INTERMEDIATE_RESULTS_EVERY = $ARGV[2] || 10000;

# maps dirname to number of files in it
my $dir_to_file_number = {};
# number of dirs visited since the last intermediate report
my $dirs_visited = 0;

# helper function that prints the top-n
sub print_top_n {
    my $cur_result = 0;
    my @keys_sorted = sort {
        $dir_to_file_number->{$b} <=> $dir_to_file_number->{$a}
    } keys %$dir_to_file_number;
    for (@keys_sorted) {
        last if $cur_result++ >= $TOP_N;
        printf "\n%-8d: %s", $dir_to_file_number->{$_}, $_;
    }
    print "\n";
}

# rules for traversing dirs
my $dir_rule = Path::Iterator::Rule->new
    ->dir
    ->min_depth(1)
    ;

# rules for finding files in a dir
my $file_rule = Path::Iterator::Rule->new
    ->file
    ->min_depth(1)
    ->max_depth(1)
    ;

# dir iterator
my $next_dir = $dir_rule->iter( $BASEDIR,
    { follow_symlinks => 0, depthfirst => 0, loop_safe => 0 } );

while (my $cur_dir = $next_dir->()) {
    # print current dir
    printf "\r%-70s", substr($cur_dir, 0, 70);

    # file iterator
    my $next_file = $file_rule->iter( $cur_dir, { follow_symlinks => 0 } );

    # count the files
    my $i = 0;
    $i++ while (my $cur_file = $next_file->());
    $dir_to_file_number->{$cur_dir} = $i;

    # print intermediate results if wanted
    if (++$dirs_visited % $INTERMEDIATE_RESULTS_EVERY == 0) {
        print_top_n;
        $dirs_visited = 0;
    }
}

# print final results
print_top_n;

I had a script that does just that in my ~/bin, originally using File::Find, now rewritten using Path::Iterator::Rule, after reading rjbs’ file finder modules comparison.
Might as well try doing this challenge (interesting enough).
use strict;
use warnings;
use File::Find::Rule;

sub count_files {
    my $directory = shift;

    # This works (and it's less ugly), but it's 10 seconds slower.
    # File::Find::Rule->new->file->maxdepth(1)->in($directory);

    my $dh;
    opendir $dh, $directory or return;
    grep { -f "$directory/$_" } readdir $dh;
}

my @top;
for (File::Find::Rule->new->directory->in("/")) {
    my $count = count_files($_) || 0;

    # Purely optimization, appears to save 0.5 seconds on my PC
    next if $count <= ($top[9] || [0])->[0];

    push @top, [$count, $_];
    @top = reverse sort { $a->[0] <=> $b->[0] } @top;

    # Only top 10 elements are fine
    splice @top, 10;
}

for (@top) {
    printf "%6u %s\n", @$_;
}

Runs in 4 seconds on my PC. Probably could be optimized further, but I don’t care much. Also, I hope that using CPAN is fine.
A first attempt…
# usage: ttdir [directory]

use common::sense;
use File::Find;

my @path = shift || '/home/qje96/learning-perl';
my $top = 10;
my %dircounter;
my @counted_dirs;

find(\&count, @path);

sub count {
    $dircounter{$File::Find::dir}++ if (-f && !-l);
}

foreach my $dir (keys %dircounter) {
    push @counted_dirs, [$dircounter{$dir}, $dir];
}

@counted_dirs = sort { $b->[0] <=> $a->[0] } @counted_dirs;

for (my $rank = 0; $rank < $top; $rank++) {
    printf "%5d files in %s\n", @counted_dirs[$rank]->[0], @counted_dirs[$rank]->[1];
}

exit 0;

Bugfix: in small trees there may be less than 10 directories.
# usage: ttdir [directory]

use common::sense;
use File::Find;

my @path = shift || '/home/qje96/perl/learning-perl';
my $top = 10;
my %dircounter;
my @counted_dirs;
my $dirnumber;

find(\&count, @path);

sub count {
    $dircounter{$File::Find::dir}++ if (-f && !-l);
}

foreach my $dir (keys %dircounter) {
    push @counted_dirs, [$dircounter{$dir}, $dir];
    $dirnumber++;
}

@counted_dirs = sort { $b->[0] <=> $a->[0] } @counted_dirs;

for (my $rank = 0; ($rank < $top and $rank < $dirnumber); $rank++) {
    printf "%5d files in %s\n", @counted_dirs[$rank]->[0], @counted_dirs[$rank]->[1];
}

exit 0;

Added some comments, tried to use a bit better style.
# usage: ttdir [directory]

use common::sense;
use File::Find;

my @path = shift || '.';
my $top = 10;
my %dircounter;
my @counted_dirs;    # array of arrays

find(\&count, @path);    # let File::Find do all the work of traversing

foreach my $dir (keys %dircounter) {
    push @counted_dirs, [$dircounter{$dir}, $dir];    # add an array
}

# sort descending by number of files
@counted_dirs = sort { $b->[0] <=> $a->[0] } @counted_dirs;

foreach my $dir (@counted_dirs[0 .. $top - 1]) {    # slice of top-n elements
    last unless (defined $dir);
    printf "%5d files in %s\n", @$dir;
}

exit 0;

sub count {
    $dircounter{$File::Find::dir}++ if (-f && !-l);    # $_ contains the filename in the current directory
}

It was fun :). Here’s my solution:
use warnings;
use strict;

my $dir = $ARGV[0] || '.';
my $N   = 10;    # Number of top directories to show
my %dirs;

opendir(DIR, $dir) or die $!;
while (my $file = readdir(DIR)) {
    next if (-f "$dir/$file" or $file =~ m/^\./);
    my $path = "$dir/$file";
    # path => total files (yes hidden and no symbolic link)
    $dirs{$path} = scalar( grep { -f and ! -l } glob("$path/* $path/.*") );
}
closedir(DIR);

# List $N top directories sorted
for ( ( sort { $dirs{$b} <=> $dirs{$a} } keys %dirs )[0 .. $N - 1] ) {
    exit 0 if (! $_);
    printf("%6s %s\n", $dirs{$_}, $_);
}

I’ve posted similar challenges as brainteasers at work, as well as using them to sort job applications into two categories: “knows Perl” and “knows the word ‘Perl’”. I’ve found that my beautiful, well-factored, beautifully documented code can be boiled down to a simple Unix pipeline, so in this case I began with that:
sudo – because we’ll have to go into all sorts of directories, not all of which are owned by the user, even on my personal desktop.
find / -type f -print – Traverse all directories beginning at the root directory, and print out the path to each file.
2> /dev/null – If there are weird errors because of strange names, just discard the error messages. Might not be a suitable solution if your software is running a nuclear power plant or a Mars rover, but a great first approximation.
xargs -n 1 dirname – take each line of output from ‘find’ and consider only the path; discard the filename component.
sort – Get all identical values adjacent.
uniq -c – replace a sequence of identical lines with a single instance, preceded by the number of times it was seen. Non-adjacent instances are not collapsed, which is why the sorting is necessary beforehand.
sort -n -r – Sort the output of ‘uniq’ by the numeric count (-n), in descending order (-r).
head – brian wants the first ten.

Makes for a pretty good start, but it generates error messages about mis-matched quote characters. Using tr to clean out all the expected characters in filenames, to find the odd ones, I discover filenames containing ‘, `, ~, ^, %, #, +, {, }, [, ], , |, plus some files with totally Chinese names. Re-reading the ‘find’ man page, I rediscover “-print0” … rediscover in the sense that I’ve read about it before, but never used it. The man page says:
-X      Permit find to be safely used in conjunction with xargs(1). If a file name contains any of the delimiting characters used by xargs(1), a diagnostic message is displayed on standard error, and the file is skipped. The delimiting characters include single (`` ' '') and double (`` " '') quotes, backslash (``\''), space, tab and newline characters. However, you may wish to consider the -print0 primary in conjunction with ``xargs -0'' as an effective alternative.

-print0 uses null-terminated strings, and -0 tells xargs to expect that input. Changing find to use -print0 and xargs to use -0 produces much better results:
and similar numbers for other directories.
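Putting the pieces together with -print0 and xargs -0, the full pipeline would presumably look something like this (a reconstruction, not necessarily the exact command that produced those numbers):

sudo find / -type f -print0 2>/dev/null \
    | xargs -0 -n 1 dirname \
    | sort \
    | uniq -c \
    | sort -n -r \
    | head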
Hmm … time to delete that directory, haven’t looked in there in nine years.