Copy instead of renaming to preserve hard links

Yesterday I posted about not overwriting a file until you know you have completely and correctly stored its new data. Part of that was about data safety and part was about completeness for other consumers of the file. I wrote the new data to a temporary file and used rename to move it into place. That’s not the entire story, though.
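As a reminder, here is a minimal sketch of that write-then-rename approach. It is not the code from that post. The helper name replace_by_rename and the sample data are hypothetical, and the sketch assumes a Unix-like system. It creates the temporary file in the target’s own directory so the rename doesn’t have to cross filesystems:

```perl
#!perl
use v5.14;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# replace_by_rename() is a hypothetical helper for illustration only.
sub replace_by_rename {
	my( $target, @lines ) = @_;

	# Put the temporary file in the same directory as the target so
	# that rename() doesn't have to cross filesystems.
	my( $tempfh, $tempfile ) = tempfile( DIR => dirname($target) );
	binmode $tempfh, ':encoding(UTF-8)';
	say {$tempfh} $_ foreach @lines;
	close $tempfh
		or die "Could not close <$tempfile>: $!\n";

	# Only after the new data are completely written does the
	# original name move over to the new file.
	rename $tempfile => $target
		or die "Could not rename <$tempfile> to <$target>: $!\n";
	return 1;
}

replace_by_rename( 'rocks.txt', qw(Fred Barney Wilma Betty) );
```

One side effect to keep in mind: File::Temp creates its files with mode 0600, so after the rename the target carries those owner-only permissions.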

You can glean some of this from your filesystem’s documentation; the link(2) manual page, for instance, explains quite a bit. For this post I’m going to back up to some basics. Perl, being part of the Unix toolbox, is close to the underlying system calls, and understanding what those calls do helps you use Perl appropriately.

Linking to data

Create your original data file. The filesystem assigns it some sort of identifier (“inode”, “fileId”, whatever). I’ll use the word “inode” throughout this post, but your filesystem might call it something different. This identifier is not the filename; we’ll get to that in a moment. You get this number from the inode element of stat’s return list (index 1).
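The core File::stat module gives those list positions names, which can make the code easier to read. This is a minimal sketch with a hypothetical file name. Keep in mind that an inode number is only unique within one device (filesystem), so the device number is part of a file’s identity too:

```perl
#!perl
use v5.14;
use warnings;
use File::stat;   # stat() now returns an object instead of a list

my $file = 'inode-demo.txt';   # hypothetical file name
open my $fh, '>', $file
	or die "Could not write <$file>: $!\n";
close $fh;

my $st = stat($file)
	or die "Could not stat <$file>: $!\n";

# ino is element 1 of the built-in stat list, nlink is element 3
say "$file is on device ", $st->dev, " with inode ", $st->ino;
say "File has ", $st->nlink, " links";
```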

A filename “links” to the inode; that’s why Perl has link and unlink built-ins, named after the link(2) and unlink(2) system calls.

You can link several names to the same data. No matter which filename you use, you’re playing with the same data. The names are just labels.
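Since the names are just labels, one way to ask whether two names refer to the same data is to compare their device and inode numbers. This is a minimal sketch; same_file and the file names are hypothetical, and it assumes a filesystem that supports hard links:

```perl
#!perl
use v5.14;
use warnings;

# same_file() is a hypothetical helper: two names share data when they
# have the same device number (element 0) and inode number (element 1).
sub same_file {
	my( $first, $second ) = @_;
	my( $dev1, $ino1 ) = ( stat $first  )[0, 1];
	my( $dev2, $ino2 ) = ( stat $second )[0, 1];
	return 0 unless defined $ino1 and defined $ino2;   # a name doesn't exist
	return $dev1 == $dev2 && $ino1 == $ino2 ? 1 : 0;
}

my $name  = 'labels.txt';        # hypothetical names
my $label = 'labels-link.txt';
unlink $name, $label;            # start fresh

open my $fh, '>', $name
	or die "Could not write <$name>: $!\n";
close $fh;
link $name, $label
	or die "Could not link <$label> to <$name>: $!\n";

say "$name and $label ", ( same_file( $name, $label ) ? 'share' : 'do not share' ), " data";
```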

The data in an inode stick around as long as at least one name links to it or some process has an open filehandle on it (and maybe a bit longer). There are various reasons you might want several names for the same data, but those would be distracting here. Read “Use cases for hardlinks?” for some ideas.
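The open-filehandle case is easy to see on a Unix-like system; other systems may refuse to remove a file that is still open. In this minimal sketch, with a hypothetical file name, the program opens a file, removes its only name, and then still reads the data through the handle:

```perl
#!perl
use v5.14;
use warnings;

my $file = 'ephemeral.txt';   # hypothetical file name
open my $out, '>', $file
	or die "Could not write <$file>: $!\n";
say {$out} 'still here';
close $out;

open my $in, '<', $file
	or die "Could not read <$file>: $!\n";
unlink $file
	or die "Could not remove <$file>: $!\n";

# The last name is gone, but the open handle keeps the inode alive.
say( ( -e $file ) ? "$file still has a name" : "$file has no name" );
my $line = <$in>;
print "Read after unlink: $line";
close $in;   # once the last handle closes, the data can be reclaimed
```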

Here’s a little Perl program that demonstrates more than one name pointing to the same inode:

#!perl
use v5.14;
use warnings;

my $original    = 'rocks.txt';
my $second_name = 'geology.txt';
unlink $original, $second_name; # start fresh

my @names = qw(Fred Barney Wilma Betty);

open my $fh, '>:encoding(UTF-8)', $original
	or die "Could not write <$original>: $!\n";
say {$fh} $_ foreach @names;
close $fh;

# element 1 of stat's list is the inode number; element 3 is the link count
say "$original has inode " . (stat $original)[1];
say "File has " . (stat $original)[3] . " links";

link( $original, $second_name )
	or die "Could not link <$second_name> to <$original>: $!\n";

say "$second_name has inode " . (stat $second_name)[1];
say "File has " . (stat $second_name)[3] . " links";

unlink $original
	or die "Could not remove <$original>: $!\n";

say "=== after unlink";
say "$second_name has inode " . (stat $second_name)[1];
say "File has " . (stat $second_name)[3] . " links";

open my $in, '<:encoding(UTF-8)', $second_name
	or die "Could not read <$second_name>: $!\n";
print $_ foreach <$in>;
close $in;

The output shows that both names have the same inode and that the data are still there even after the original filename disappears:

rocks.txt has inode 8666857151
File has 1 links
geology.txt has inode 8666857151
File has 2 links
=== after unlink
geology.txt has inode 8666857151
File has 1 links
Fred
Barney
Wilma
Betty

Renaming files

When you rename a file, you aren’t moving data around. You change the name, but not the data on disk or where the filesystem put them. The rename merely changes which name links to the inode, which is why rename works only within a single filesystem:

rename 'rocks.txt' => 'characters.txt';
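You can check that claim by comparing the inode number before and after a rename. This is a minimal sketch that reuses the file names from the line above and assumes both names are in the same directory (and so on the same filesystem):

```perl
#!perl
use v5.14;
use warnings;

my( $old, $new ) = ( 'rocks.txt', 'characters.txt' );
unlink $old, $new;   # start fresh

open my $fh, '>', $old
	or die "Could not write <$old>: $!\n";
close $fh;

my $before = ( stat $old )[1];
rename $old => $new
	or die "Could not rename <$old> to <$new>: $!\n";
my $after  = ( stat $new )[1];

# Same inode, different name: the data never moved.
say "inode before rename: $before";
say "inode after rename:  $after";
```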

This is where the problem comes in: I want to rename some other file, such as my temporary file, onto the original filename. When I create the temporary file, it gets a new inode.

When I rename my temporary file to the original name, the new inode gets that name and the original inode loses it.

Often this won’t be a problem, but what if some other program or person had made a hard link to the original inode and expected the current data to be in that inode? Those other links still point to the original inode, while the new data are in a different inode.
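Before replacing a file with rename, you can at least check whether anything else links to it. The link count (element 3 of stat) tells you how many names point at the inode, though not what or where those other names are. This is a minimal sketch; other_links and the file setup are hypothetical:

```perl
#!perl
use v5.14;
use warnings;

# other_links() is a hypothetical helper: the number of names besides
# this one that point to the same inode.
sub other_links {
	my( $file ) = @_;
	my $nlink = ( stat $file )[3];
	die "Could not stat <$file>: $!\n" unless defined $nlink;
	return $nlink - 1;
}

my $original    = 'rocks.txt';
my $second_name = 'geology.txt';
unlink $original, $second_name;   # start fresh

open my $fh, '>', $original
	or die "Could not write <$original>: $!\n";
close $fh;
link $original, $second_name
	or die "Could not link <$second_name> to <$original>: $!\n";

if( my $others = other_links( $original ) ) {
	warn "$original has $others other link(s); "
		. "renaming a new file onto it would leave them with the old data\n";
}
```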

Here’s the Perl program that shows the flow of inodes and names:

use v5.14;
use warnings;

my $original    = 'rocks.txt';
my $second_name = 'geology.txt';
unlink $original, $second_name;

my @names = qw(Fred Barney Wilma Betty);

open my $fh, '>:encoding(UTF-8)', $original
	or die "Could not write <$original>: $!\n";
say {$fh} $_ foreach @names;
close $fh;

say "$original has inode " . (stat $original)[1];
say "File has " . (stat $original)[3] . " links";

link $original, $second_name
	or die "Could not link <$second_name> to <$original>: $!\n";
say "$second_name has inode " . (stat $second_name)[1];
say "File has " . (stat $second_name)[3] . " links";


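use File::Temp qw(tempfile);
# tempfile() creates the file in the system's temporary directory;
# rename only works if that is on the same filesystem as $original
my( $tempfh, $tempfile ) = tempfile();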
say {$tempfh} uc($_) foreach @names;
close $tempfh;

say "$tempfile has inode " . (stat $tempfile)[1];

rename $tempfile => $original
	or die "Could not rename <$tempfile> to <$original>: $!\n";

say "=== After rename";
say "$original has inode " . (stat $original)[1];
say "$second_name has inode " . (stat $second_name)[1];
say "File has " . (stat $original)[3] . " links";

say "=== In $original";
open my $in, '<:encoding(UTF-8)', $original
	or die "Could not read <$original>: $!\n";
print $_ foreach <$in>;
close $in;

say "=== In $second_name";
open my $in2, '<:encoding(UTF-8)', $second_name
	or die "Could not read <$second_name>: $!\n";
print $_ foreach <$in2>;
close $in2;

The output shows the progression. Both rocks.txt and geology.txt start off with the same inode (and so the same data). The temporary file has a different inode and different data. After the rename, rocks.txt points to the temporary file’s inode while geology.txt still points to the original inode. The new data are in rocks.txt, but anyone going through geology.txt doesn’t see the updates:

rocks.txt has inode 8666869539
File has 1 links
geology.txt has inode 8666869539
File has 2 links
/var/folders/jf/7sn23hrs11jcrn2w39wm6k_r0000gn/T/6vytLYAT2c has inode 8666869542
=== After rename
rocks.txt has inode 8666869542
geology.txt has inode 8666869539
File has 1 links
=== In rocks.txt
FRED
BARNEY
WILMA
BETTY
=== In geology.txt
Fred
Barney
Wilma
Betty

Copying files

Instead of renaming the file, you can copy it. That copies the contents of one inode into another. If the destination already exists, the copy writes into the destination’s existing inode instead of creating a new one:

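# replace the rename with these lines
use File::Copy qw(copy);
copy $tempfile => $original
	or die "Could not copy <$tempfile> to <$original>: $!\n";
unlink $tempfile; # the temporary file is no longer needed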

After the copy, rocks.txt still links to its original inode, so rocks.txt and geology.txt both show the new data (the “After rename” label is left over from the rename version of the program):

rocks.txt has inode 8666869995
File has 1 links
geology.txt has inode 8666869995
File has 2 links
/var/folders/jf/7sn23hrs11jcrn2w39wm6k_r0000gn/T/LATmehGV5z has inode 8666869998
=== After rename
rocks.txt has inode 8666869995
geology.txt has inode 8666869995
File has 2 links
=== In rocks.txt
FRED
BARNEY
WILMA
BETTY
=== In geology.txt
FRED
BARNEY
WILMA
BETTY
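Putting the two approaches together, one option is to rename when nothing else links to the target and copy when something does. This is a minimal sketch under those assumptions, not a definitive implementation; replace_contents and the sample data are hypothetical:

```perl
#!perl
use v5.14;
use warnings;
use File::Basename qw(dirname);
use File::Copy qw(copy);
use File::Temp qw(tempfile);

# replace_contents() is a hypothetical helper for illustration only.
sub replace_contents {
	my( $target, @lines ) = @_;

	# Write the complete new data to a temporary file first.
	my( $tempfh, $tempfile ) = tempfile( DIR => dirname($target) );
	binmode $tempfh, ':encoding(UTF-8)';
	say {$tempfh} $_ foreach @lines;
	close $tempfh
		or die "Could not close <$tempfile>: $!\n";

	my $nlink = ( stat $target )[3] // 0;   # 0 if the target doesn't exist
	if( $nlink > 1 ) {
		# Other names share the inode: rewrite that inode in place.
		copy( $tempfile => $target )
			or die "Could not copy <$tempfile> to <$target>: $!\n";
		unlink $tempfile;
	}
	else {
		# Nothing else to keep in sync: swap the name over with rename.
		rename $tempfile => $target
			or die "Could not rename <$tempfile> to <$target>: $!\n";
	}
	return 1;
}

my $original    = 'rocks.txt';
my $second_name = 'geology.txt';
unlink $original, $second_name;   # start fresh

my @names = qw(Fred Barney Wilma Betty);
replace_contents( $original, @names );
link $original, $second_name
	or die "Could not link <$second_name> to <$original>: $!\n";
replace_contents( $original, map { uc($_) } @names );
```

The trade-off is that copy writes into the existing file, so unlike rename it isn’t atomic: a program that reads the target while the copy is in progress can see incomplete data.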