There’s a better (correct) way to case fold

We show you the wrong way to do a case insensitive sort in Learning Perl, 6th Edition showed many of Perl’s Unicode features, which we had mostly ignored in all of the previous editions (despite Unicode support starting in Perl v5.6). In our defense, it wasn’t an easy thing to do without CPAN modules before the upcoming Perl v5.16.

In the “Strings and Sorting” chapter, we show this subroutine:

sub case_insensitive { "\L$a" cmp "\L$b" }

In the Unicode world, that doesn’t work (which I explain in Fold cases properly at The Effective Perler). With Perl v5.16, we should use the new fc built-in which does case folding according to Unicode’s rules:

use v5.16; # when it's released
sub case_insensitive { fc($a) cmp fc($b) }

We could use the double-quote case shifter \F to do the same thing:

use v5.16; # when it's released
sub case_insensitive { "\F$a" cmp "\F$b" }

Without Perl v5.16, we could use the Unicode::CaseFold module which defines an fc function.

Post to Twitter Post to Delicious Post to Digg Post to Facebook Post to Google Buzz Send Gmail Post to LinkedIn Post to Reddit Post to Slashdot Post to StumbleUpon Post to Technorati

Leave a Reply

All comments are moderated. See our comment policy.

Your email address will not be published. Required fields are marked *

*

Mark up Perl code with <pre class="brush:perl"></pre>. You do not need to escape HTML inside <pre>.

You can also use <a href="" title=""> <b> <blockquote cite=""> <cite> <code> <em> <i> <pre class=""> <strong>