Use a temporary file instead of clobbering data

How do you replace the contents of a file? This is something I’ve been thinking quite a bit about because we often get away with bad practices, and it’s completely within the scope of Learning Perl to know how to do this. It might make a good addition to the next edition of the book.

Someone on Stack Overflow asked if it was okay to read from a file and then write back to the same file. In this case, the program completely reads the file and, once done, writes to the same filename. Most of the answers deal with the mechanics and miss the wisdom of that design.
That’s a problem with tutorial books too: there are only so many pages.

At the scope of Learning Perl, we are mostly showing you the syntax and sometimes giving you small glimpses into programming practice. The explicit goal of the book is to introduce the 80% of Perl that most people use most of the time for single file programs. We defer modules and multi-file programs to Intermediate Perl. Still, it’s mostly mechanics and very little architecture.

I’ve done this task the “wrong” way most of my career because we normally get away with it. But the frequency of the bad event doesn’t matter. The severity of the consequence, even from an improbable event, is what should motivate us.

There are three major concerns:

  • Was I able to completely write the new data?
  • What happens when other consumers read the file after I’ve started writing new data but haven’t finished?
  • What happens to the original data if there was a problem?

So, here’s an example. Suppose I have this file with several entries for each character, but I want to reduce that to one line per character with the sum of all that character’s numbers:

Fred 5
Fred 2
Barney 0
Wilma 2
Pebbles 37
Fred 9
Wilma 7
Barney 4

Since I need to add all of the numbers, I need to go through the entire file before I can write any output. That part is easy:

use v5.24;
use warnings;

use Data::Dumper;

my $file = 'characters.txt';
open my $fh, '<:encoding(UTF-8)', $file
	or die "Could not open <$file>: $!\n";

my %totals;
while( <$fh> ) {
	my( $name, $number ) = split;
	$totals{$name} += $number;
	}

foreach my $name ( keys %totals ) {
	say join ' ', $name, $totals{$name};
	}

The output (to standard output) is what I want:

Barney 4
Pebbles 37
Fred 16
Wilma 9

Now the trick is to get this back into the original file. By the point I can output the data, I’ve completely read the original data. If I open the same file for writing to replace the data, though, there’s a small interval from the time I open it to the time the data is in it. If something happens in that short interval, I lose the original data and potentially can’t output the new data.
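
To make that interval concrete, here is a small sketch that works in a scratch directory (the tempdir and the sample lines are my own setup, not part of the program above). It shows that the old contents are gone as soon as the file is opened for writing, before a single line is printed:

```perl
use v5.24;
use warnings;

use File::Temp qw(tempdir);

# A scratch directory so the demonstration touches no real data
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/characters.txt";

# Create a file with some "original" data
open my $out, '>', $file
	or die "Could not open <$file>: $!\n";
print { $out } "Fred 5\nFred 2\n";
close $out
	or die "Could not close <$file>: $!\n";
my $size_before = ( stat $file )[7];

# Opening for writing truncates right away, before any print
open my $clobber, '>', $file
	or die "Could not open <$file>: $!\n";
my $size_during = ( stat $file )[7];

say "Before: $size_before bytes; after open: $size_during bytes";
close $clobber;
```

Anything that stops the program between that second open and the final close leaves an empty or partial file.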

This actually happened with one customer (although not due to anything I did). There’s an insistence with some Perl users that every program must have warnings turned on. However, a warning-free program with one version of perl might emit warnings in the next version. If I’m not paying attention to warnings, what good are they? Well, they are good for filling up log files and taking up all the space on disk. Now try writing to a file on an already-full disk. If I were writing to the same filename that I just read, I now have none of the data.

Instead, I can write to a separate file. Once I know that I’ve completely written the new data, I can replace the original. You might like to read “Is rename() atomic?”.
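
How do I know that I’ve completely written the new data? One possible approach is to check the return values of print and close. The helper below, write_all, is a hypothetical name of my own for this sketch:

```perl
use v5.24;
use warnings;

# write_all() is a hypothetical helper: it returns true only if every
# line was printed and the filehandle closed without error.
sub write_all {
	my( $path, @lines ) = @_;

	open my $fh, '>:encoding(UTF-8)', $path
		or die "Could not open <$path>: $!\n";

	foreach my $line ( @lines ) {
		print { $fh } $line, "\n" or return 0;
		}

	# Buffered data may not reach the disk until close, so a full
	# disk can show up here rather than at print.
	return close($fh) ? 1 : 0;
	}
```

Only when this returns true is it reasonable to let the new file take the place of the old one.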

One way is to use a temporary file. I completely write the data before I move it into place:

...

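use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Create the temporary file next to the original so that rename
# does not have to cross a filesystem boundary
my( $tempfh, $tempfile ) = tempfile( DIR => dirname($file) );
binmode $tempfh, ':encoding(UTF-8)';
foreach my $name ( keys %totals ) {
	say { $tempfh } join ' ', $name, $totals{$name};
	}
close $tempfh
	or die "Could not close <$tempfile>: $!\n";

rename $tempfile => $file
	or die "Could not rename <$tempfile> to <$file>: $!\n";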

That foreach loop is simple but imagine something more meaty where the code might die. Some modules “helpfully” die to denote an error so might stop your program before it has gone through the entire hash.
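
Here is a sketch of how that plays out. The replace_file helper and its callback interface are hypothetical, something I made up for illustration. The callback gets the temporary filehandle, and if it dies partway through, the partial file is discarded and the original is never touched:

```perl
use v5.24;
use warnings;

use File::Basename qw(dirname);
use File::Temp     qw(tempfile);

# replace_file() is a hypothetical helper. $writer is a code reference
# that gets the temporary filehandle and may die partway through.
sub replace_file {
	my( $file, $writer ) = @_;

	# Same directory as the target, so rename stays on one filesystem
	my( $tempfh, $tempfile ) = tempfile( DIR => dirname($file) );

	my $ok = eval {
		$writer->( $tempfh );
		close $tempfh
			or die "Could not close <$tempfile>: $!\n";
		rename $tempfile => $file
			or die "Could not rename <$tempfile>: $!\n";
		1;
		};

	unless( $ok ) {
		my $error = $@ || "unknown error\n";
		unlink $tempfile;   # discard the partial output
		die "Left <$file> untouched: $error";
		}

	return 1;
	}
```

With the totals from earlier, a call might look like replace_file( $file, sub { my $fh = shift; say { $fh } "$_ $totals{$_}" for keys %totals } ).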

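There’s a small gotcha here, though. The new file is a different inode (or whatever your filesystem does) than the original. If there are hard links to the original file, the old data does not disappear, and all those hard links still see it. If that matters to you, use File::Copy instead, which writes the new data into the existing file. That brings back a short window where the file is incomplete, but by then the finished data already exists in the temporary file. This has the advantage of working across partitions too: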

...

use File::Copy qw(copy);
use File::Temp qw(tempfile);
my( $tempfh, $tempfile ) = tempfile();
foreach my $name ( keys %totals ) {
	say { $tempfh } join ' ', $name, $totals{$name};
	}
close $tempfh
	or die "Could not close <$tempfile>: $!\n";

copy $tempfile => $file
	or die "Could not copy <$tempfile> to <$file>: $!\n";

In either of these, you may need to use chown or chmod to ensure the new file has the right owner, group, and permissions.
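
As a rough sketch of that step, this hypothetical helper (copy_ownership is my own name for it) reads the mode, owner, and group from the original with stat and applies them to the new file. It assumes a Unix-like system:

```perl
use v5.24;
use warnings;

# copy_ownership() is a hypothetical helper that makes $new look like
# $old to the filesystem before $new replaces it.
sub copy_ownership {
	my( $old, $new ) = @_;

	my( $mode, $uid, $gid ) = ( stat $old )[2, 4, 5];
	die "Could not stat <$old>: $!\n" unless defined $mode;

	# The low 12 bits of the mode are the permission bits
	chmod $mode & 07777, $new
		or die "Could not chmod <$new>: $!\n";

	# Changing the owner usually needs elevated privileges, so
	# treat a failure here as a warning rather than a fatal error.
	chown $uid, $gid, $new
		or warn "Could not chown <$new>: $!\n";

	return $mode & 07777;
	}
```

In the rename version, I would call something like this on the temporary file just before the rename.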

This doesn’t keep an asteroid from hitting your computer and ruining the whole thing, but it’s much safer in the usual case. It’s not that much more complex either, and still within the beginner Perl we show in the book.

You don’t have to use a temporary file (and rename has to work with two files on the same partition). You could write to some other file that you choose and then move that into place. The trick is that there is some moment in time where both the old data and the new data coexist on disk.

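You can also move the original file first and then write to the original filename, which is how perl’s -i command-line switch worked with a backup extension before v5.28. Since v5.28, -i writes to a work file and replaces the original only once the output has been written and closed successfully. It even took perl a long time to figure this out.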
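
A minimal sketch of the move-the-original-first approach might look like this. The helper name write_with_backup and the .bak suffix are my own choices for illustration:

```perl
use v5.24;
use warnings;

# Hypothetical helper: keep the original as "$file.bak", then write
# the new lines under the original name.
sub write_with_backup {
	my( $file, @lines ) = @_;
	my $backup = "$file.bak";

	rename $file => $backup
		or die "Could not rename <$file> to <$backup>: $!\n";

	open my $fh, '>:encoding(UTF-8)', $file
		or die "Could not open <$file>: $!\n";
	say { $fh } $_ for @lines;
	close $fh
		or die "Could not close <$file>: $!\n";

	return $backup;
	}
```

The old data is safe under the backup name, but other consumers can still see a missing or partial file under the original name while the write is in progress.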

Perhaps a more advanced trick, though still within the scope of Learning Perl, is to use hard or symbolic links. I typically use this trick when I want to keep several versions around because I may want to look at any one of them. I can then make a link from a virtual version, such as current.txt or latest.csv, to the actual file that’s the most recent (or the default or whatever). With this, the new file is also a new name so it doesn’t disturb the original. Once written completely, I update the link. Remember, hard links have to be on the same partition, but symlinks don’t.
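
Here is one way the link update might look with symlinks, as a sketch that assumes a Unix-like filesystem. The helper name update_link and the temporary link name are hypothetical. It makes the new symlink under a scratch name and renames it over the old one, so the virtual name always points at some complete file:

```perl
use v5.24;
use warnings;

# Hypothetical helper: point $link at $target without a moment where
# $link does not exist. Assumes a filesystem with symlink support.
sub update_link {
	my( $target, $link ) = @_;

	my $temp_link = "$link.tmp.$$";
	unlink $temp_link;  # ignore failure; usually nothing is there

	symlink $target, $temp_link
		or die "Could not create symlink <$temp_link>: $!\n";
	rename $temp_link => $link
		or die "Could not rename <$temp_link> to <$link>: $!\n";

	return readlink $link;
	}
```

After completely writing a new versioned file, I could call update_link with that file’s name and current.txt. A relative target is resolved from the directory that holds the link.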