unichar, a small Unicode character test program

Re-writing Learning Perl to cover Unicode means I have to figure out how to type some of the characters that don’t show up on my keyboard. Not only that, I need to figure out their character names and code points for the examples. I want to convert from any of those (name, code point, character) to a description of the character. I want something like this:

$ perl unichar ã
Processing ã
		match       grapheme
		code point  U+00E3
		decimal     227
		name        LATIN SMALL LETTER A WITH TILDE
		character   ã

I wrote a short program that I called unichar, which I keep on GitHub.
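To give a feel for the three kinds of input (code point, single character, name), here is a minimal sketch of how such a lookup could work. It is not the code from unichar: the describe subroutine and its hash keys are names I made up for illustration, and it treats any one-character string as a "grapheme", which ignores multi-code-point grapheme clusters. It relies only on the core charnames module.

```perl
use strict;
use warnings;
use charnames ();    # gives us charnames::viacode and charnames::vianame

# Hypothetical helper (not from unichar): classify one already-decoded
# argument and return the fields shown in the sample output.
sub describe {
    my ($arg) = @_;
    my ($match, $code);

    if ($arg =~ /\A(?:U\+|0x)([0-9A-Fa-f]+)\z/) {    # U+00E3 or 0xE3
        ($match, $code) = ('code point', hex $1);
    }
    elsif (length($arg) == 1) {                       # a single character
        ($match, $code) = ('grapheme', ord $arg);
    }
    else {                                            # assume it is a name
        ($match, $code) = ('name', charnames::vianame(uc $arg));
        return unless defined $code;                  # name not recognized
    }

    return {
        match      => $match,
        code_point => sprintf('U+%04X', $code),
        decimal    => $code,
        name       => charnames::viacode($code) // '',
        character  => chr $code,
    };
}

# Describe U+00E3, passed in as a character
my $info = describe("\x{E3}");
print "$info->{match}: $info->{code_point} $info->{name}\n";
```

The code point pattern is checked first so that an argument such as 0x9 is not mistaken for a name. Everything that is neither a code point nor a single character falls through to the name lookup, which is a simplification.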

There are some interesting parts of the script (which might change since I’m still tinkering with it). Even though my locale is set to en_US.UTF-8 and the command-line arguments are UTF-8, the script still sees them as raw bytes, so I have to decode them. The decode subroutine from Encode takes a string of bytes in a given encoding and turns it into a Perl character string. I ask I18N::Langinfo for the locale’s codeset and use it to decode all the elements of @ARGV:

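use Encode qw(decode);
use I18N::Langinfo qw(langinfo CODESET);

# Ask the locale which encoding the terminal uses (UTF-8 in my case)
my $codeset = langinfo(CODESET);
# Turn each argument from raw bytes into a character string
@ARGV = map { decode $codeset, $_ } @ARGV;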
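To see what the decoding changes, here is a small sketch that starts with the two bytes UTF-8 uses for ã. The encoding name is hard-coded as 'UTF-8' rather than looked up from the locale, which is an assumption made only to keep the example self-contained.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes that UTF-8 uses to encode U+00E3, as they would
# arrive in @ARGV from a UTF-8 terminal.
my $raw = "\xC3\xA3";

# Decoding turns the two-byte string into a one-character string.
my $decoded = decode('UTF-8', $raw);

printf "raw:     length %d\n", length $raw;
printf "decoded: length %d, code point U+%04X\n",
    length $decoded, ord $decoded;
```

Before decoding, Perl counts two characters (one per byte). After decoding, it counts one character whose code point is U+00E3, which is what a lookup by character needs.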

There are some other interesting bits in there too, but they are a bit advanced for Learning Perl.

Here are some more examples of the output. I handle unprintable and invisible characters specially:

$ perl unichar 䣱
Processing 䣱
		match       grapheme
		code point  U+48F1
		decimal     18673
		name        
		character   䣱

$ perl unichar ↞
Processing ↞
		match       grapheme
		code point  U+219E
		decimal     8606
		name        LEFTWARDS TWO HEADED ARROW
		character   ↞

$ perl unichar U+2057
Processing U+2057
		match       code point
		code point  U+2057
		decimal     8279
		name        QUADRUPLE PRIME
		character   ⁗

$ perl unichar "TAMIL LETTER HA"
Processing TAMIL LETTER HA
		match       name
		code point  U+0BB9
		decimal     3001
		name        TAMIL LETTER HA
		character   ஹ

$ perl unichar 0x05d0
Processing 0x05d0
		match       code point
		code point  U+05D0
		decimal     1488
		name        HEBREW LETTER ALEF
		character   א

$ perl unichar "CYRILLIC CAPITAL LETTER I WITH GRAVE"
Processing CYRILLIC CAPITAL LETTER I WITH GRAVE
		match       name
		code point  U+040D
		decimal     1037
		name        CYRILLIC CAPITAL LETTER I WITH GRAVE
		character   Ѝ

$ perl unichar 0x9
Processing 0x9
		match       code point
		code point  U+0009
		decimal     9
		name        CHARACTER TABULATION
		character   

$ perl unichar 0x07
Processing 0x07
		match       code point
		code point  U+0007
		decimal     7
		name        BELL
		character
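
The last two examples leave the character field empty. One way to get that effect, shown here only as a sketch and not necessarily what unichar does, is to print the character only when it matches the \p{Print} property. The display_form name is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the character itself if it is printable,
# or an empty string for control and other non-printing characters.
sub display_form {
    my ($code) = @_;
    my $char = chr $code;
    return $char =~ /\A\p{Print}\z/ ? $char : '';
}

# BELL, CHARACTER TABULATION, and LATIN CAPITAL LETTER A
for my $code (0x07, 0x09, 0x41) {
    printf "U+%04X  [%s]\n", $code, display_form($code);
}
```

With this test, U+0007 and U+0009 come back as empty strings while U+0041 comes back as A. A plain space still counts as printable under \p{Print}, so a program that wants to flag every invisible character would need a stricter check.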