Changeset 700

Timestamp: 08/05/02 23:25:36
Author: miyagawa
Message: update README

Files:

  • HTML-Entities-ImodePictogram/trunk/README

    r221 r700  
    2929    encode_pictogram 
    3030          $html = encode_pictogram($rawtext); 
     31          $html = encode_pictogram($rawtext, unicode => 1); 
    3132 
    32         Encodes pictogram characters in raw-text into HTML entities. 
     33        Encodes pictogram characters in raw-text into HTML entities. If 
     34        $rawtext contains extended pictograms, they are encoded in Unicode 
     35        format. If you add "unicode" option explicitly, all pictogram 
     36        characters are encoded in Unicode format ("&#xXXXX;"). Otherwise, 
     37        encoding is done in decimal format ("&#NNNNN;"). 
    3338 
    3439    decode_pictogram 
    3540          $rawtext = decode_pictogram($html); 
    3641 
    37         Decodes HTML entities for pictogram into raw-text. 
     42        Decodes HTML entities (both for "&#xXXXX;" and "&#NNNNN;") for 
     43        pictogram into raw-text in Shift_JIS. 
    3844 
    3945    remove_pictogram 
     
    5056        found. It returns the total numbers of charcters found in text. 
    5157 
    52         The callback is given two arguments. The first is a found pictogram 
    53         character itself, and the second is a decimal number which 
    54         represents codepoint of the character. Whatever the callback returns 
    55         will replace the original text. 
     58        The callback is given three arguments. The first is a found 
     59        pictogram character itself, and the second is a decimal number which 
     60        represents Shift_JIS codepoint of the character. The third is a 
     61        Unicode codepoint. Whatever the callback returns will replace the 
     62        original text. 
    5663 
    57         Here is an implementation of encode_pictogram(), which will be the 
    58         good example for the usage of find_pictogram(). 
     64        Here is a stub implementation of encode_pictogram(), which will be 
     65        the good example for the usage of find_pictogram(). Note that this 
     66        example version doesn't support extended pictograms. 
    5967 
    6068          sub encode_pictogram { 
    6169              my $text = shift; 
    6270              find_pictogram($text, sub { 
    63                                  my($char, $number) = @_; 
     71                                 my($char, $number, $cp) = @_; 
    6472                                 return '&#' . $number . ';'; 
    6573                             }); 
     
    6876 
    6977CAVEAT 
    70     This module works so slow, because regex used here matches "ANY" 
    71     characters in the text. This is due to the difficulty of extracting 
    72     character boundaries of Shift_JIS encoding. 
     78    *   This module works so slow, because regex used here matches "ANY" 
     79        characters in the text. This is due to the difficulty of extracting 
     80        character boundaries of Shift_JIS encoding. 
     81 
     82    *   Extended pictogram support of this module is not complete. If you 
     83        handle pictogram characters in Unicode, try Encode module with perl 
     84        5.8.0, or Unicode::Japanese. 
    7385 
    7486AUTHOR 
     
    7991 
    8092SEE ALSO 
    81     the HTML::Entities manpage, 
    82     http://www.nttdocomo.co.jp/i/tag/emoji/index.html 
     93    the HTML::Entities manpage, the Unicode::Japanese manpage, 
     94    http://www.nttdocomo.co.jp/p_s/imode/tag/emoji/ 
    8395 
  • HTML-Entities-ImodePictogram/trunk/lib/HTML/Entities/ImodePictogram.pm

    r697 r700  
    5656    $html =~ s{(\&\#(\d{5});)|(\&\#x([0-9a-fA-F]{4});)}{ 
    5757        if (defined $1) { 
    58             if (($2 >= 63647 && $2 <= 63740) || 
    59                 ($2 >= 63808 && $2 <= 63870) || 
    60                 ($2 >= 63872 && $2 <= 63919)) { 
    61                 pack 'n', $2; 
    62             } else { 
    63                 $1; 
    64             } 
     58            my $cp = _num2cp($2); 
     59            defined $cp ? pack('n', $2) : $1; 
    6560        } elsif (defined $3) { 
    66             my $cp = hex($4); 
    67             pack 'n', _cp2num($cp)
     61            my $num = _cp2num(hex($4)); 
     62            defined $num ? pack('n', $num) : $3
    6863        } 
    6964    }eg; 
     
    9186        return $num - 4773; 
    9287    } else { 
    93         require Carp; 
    94         Carp::carp("unknown number: $num"); 
    9588        return; 
    9689    } 
     
    109102        return $cp + 4773; 
    110103    } else { 
    111         require Carp; 
    112         Carp::carp("unknown codepoint: $cp"); 
    113104        return; 
    114105    } 
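
The hunks above change decode_pictogram() so that an entity outside the known pictogram ranges is left in the text untouched, where the helpers previously called Carp::carp. A minimal, self-contained sketch of that behaviour follows. The names num2cp, cp2num and decode_sketch are hypothetical stand-ins for the module's private helpers, and the single range 63872..63919 paired with the 4773 offset is an assumption made for illustration only; the real module's range tables are not fully visible in this changeset.

```perl
use strict;
use warnings;

# ASSUMPTION: one illustrative Shift_JIS range; the real tables are larger.
my ($LO, $HI) = (63872, 63919);

# Hypothetical stand-in for _num2cp: Shift_JIS number -> Unicode codepoint,
# or undef (silently, no warning) when the number is not a known pictogram.
sub num2cp {
    my $num = shift;
    return ($num >= $LO && $num <= $HI) ? $num - 4773 : undef;
}

# Hypothetical stand-in for _cp2num: Unicode codepoint -> Shift_JIS number,
# or undef when the codepoint does not map into the known range.
sub cp2num {
    my $cp  = shift;
    my $num = $cp + 4773;
    return ($num >= $LO && $num <= $HI) ? $num : undef;
}

# Same shape as the substitution in the diff: decimal entities are captured
# in $1/$2, hexadecimal ones in $3/$4. Unknown entities are put back as-is.
sub decode_sketch {
    my $html = shift;
    $html =~ s{(&\#(\d{5});)|(&\#x([0-9a-fA-F]{4});)}{
        if (defined $1) {
            defined num2cp($2) ? pack('n', $2) : $1;
        } else {
            my $num = cp2num(hex $4);
            defined $num ? pack('n', $num) : $3;
        }
    }eg;
    return $html;
}
```

Under these assumptions, decode_sketch('&#63872;') and decode_sketch('&#xE6DB;') both yield the same two-byte Shift_JIS sequence, while an unrelated entity such as '&#12345;' passes through unchanged.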
     
    192183The callback is given three arguments. The first is a found pictogram 
    193184character itself, and the second is a decimal number which represents 
    194 codepoint of the character. The third is a Unicode codepoint. Whatever 
    195 the callback returns will replace the original text. 
     185Shift_JIS codepoint of the character. The third is a Unicode 
     186codepoint. Whatever the callback returns will replace the original 
     187text. 
    196188 
    197189Here is a stub implementation of encode_pictogram(), which will be the