There’s a better (correct) way to case fold

We show you the wrong way to do a case-insensitive sort in Learning Perl, 6th Edition. That edition showed many of Perl’s Unicode features, which we had mostly ignored in all of the previous editions (despite Unicode support starting in Perl v5.6). In our defense, a correct case-insensitive sort wasn’t easy to do without CPAN modules before the upcoming Perl v5.16.

In the “Strings and Sorting” chapter, we show this subroutine:

sub case_insensitive { "\L$a" cmp "\L$b" }

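In the Unicode world, that doesn’t work, because lowercasing is not the same operation as case folding: some strings that should compare as equal do not lowercase to the same thing (which I explain in Fold cases properly at The Effective Perler). With Perl v5.16, we should use the new fc built-in, which does case folding according to Unicode’s rules: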

use v5.16; # when it's released
sub case_insensitive { fc($a) cmp fc($b) }
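To see the difference between lowercasing and folding, here is a small sketch that compares the two on one well-known troublemaker, the German sharp s (U+00DF). The variable names are only for illustration, and it assumes a perl that is at least v5.16:

```perl
use v5.16;    # turns on the fc and unicode_strings features

my $lower = "stra\x{DF}e";    # "straße", with U+00DF written as an escape
my $upper = "STRASSE";

# lc leaves the sharp s alone, so the lowercased strings differ
my $same_lc = lc($lower) eq lc($upper);

# fc folds the sharp s to "ss", so the folded strings match
my $same_fc = fc($lower) eq fc($upper);

say "lc: ", ( $same_lc ? "equal" : "different" );    # lc: different
say "fc: ", ( $same_fc ? "equal" : "different" );    # fc: equal
```

A sort built on lc would treat those two spellings as different strings, while one built on fc treats them as the same.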

We could use the double-quote case shifter \F to do the same thing:

use v5.16; # when it's released
sub case_insensitive { "\F$a" cmp "\F$b" }

Without Perl v5.16, we could use the Unicode::CaseFold module which defines an fc function.
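If the same code has to run on perls both older and newer than v5.16, one possible approach is to choose the source of fc at compile time with the core if module. This is only a sketch: it assumes Unicode::CaseFold is installed on the older perls and that it exports fc by default, as its documentation describes.

```perl
use strict;
use warnings;

# Use the built-in fc where it exists, and the CPAN module otherwise
use if $] >= 5.016, feature => 'fc';
use if $] <  5.016, 'Unicode::CaseFold';

sub case_insensitive { fc($a) cmp fc($b) }

my @sorted = sort case_insensitive qw(banana Apple cherry);
print "@sorted\n";    # Apple banana cherry
```

Either way, the sort subroutine itself stays the same as the v5.16 version.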
