From SRS0=XnMxyl72=OX=yahoo.com=andrewnikitin@taz.de  Thu Sep 27 21:58:29 2007
Delivered-To: czyborra@gmail.com
Received: by 10.142.76.15 with SMTP id y15cs63515wfa;
        Thu, 27 Sep 2007 11:25:22 -0700 (PDT)
Received: by 10.66.219.11 with SMTP id r11mr3931780ugg.1190917520701;
        Thu, 27 Sep 2007 11:25:20 -0700 (PDT)
Return-Path: <SRS0=XnMxyl72=OX=yahoo.com=andrewnikitin@taz.de>
Received: from fmmailgate03.web.de (fmmailgate03.web.de [217.72.192.234])
        by mx.google.com with ESMTP id 29si2382550uga.2007.09.27.11.25.19;
        Thu, 27 Sep 2007 11:25:20 -0700 (PDT)
Received-SPF: neutral (google.com: 217.72.192.234 is neither permitted nor denied by best guess record for domain of SRS0=XnMxyl72=OX=yahoo.com=andrewnikitin@taz.de) client-ip=217.72.192.234;
DomainKey-Status: bad (test mode)
Authentication-Results: mx.google.com; spf=neutral smtp.mail=SRS0=XnMxyl72=OX=yahoo.com=andrewnikitin@taz.de; domainkeys=hardfail (test mode) header.From=andrewnikitin@yahoo.com
Received: from mx26.web.de (mx26.dlan.cinetic.de [172.20.5.104])
	by fmmailgate03.web.de (Postfix) with ESMTP id 972259EA76C1
	for <czyborra@googlemail.com>; Thu, 27 Sep 2007 20:25:19 +0200 (CEST)
Received: from [194.29.227.41] (helo=ghostwriter.taz.de)
	by mx26.web.de with esmtp (WEB.DE 4.107 #114)
	id 1Iay35-0005wd-00
	for plvd.org@web.de; Thu, 27 Sep 2007 20:25:19 +0200
Received: from jupiter.hal.taz.de (jupiter.hal.taz.de [10.1.0.113])
	by ghostwriter.taz.de (8.13.8/8.13.8/Debian-3) with ESMTP id l8RIOCx5012217
	for <plvd.org@web.de>; Thu, 27 Sep 2007 20:24:13 +0200
Received: from spambuster.taz.de (osiris.hal.taz.de [10.1.0.4])
	by jupiter.hal.taz.de (8.13.6/8.13.6) with ESMTP id l8RIPFnx013788
	for <roman@czyborra.com>; Thu, 27 Sep 2007 20:25:15 +0200
Received: from web60913.mail.yahoo.com (web60913.mail.yahoo.com [209.73.179.2])
	by spambuster.taz.de (8.13.6/8.13.6) with SMTP id l8RIOts8006210
	for <roman@czyborra.com>; Thu, 27 Sep 2007 20:24:56 +0200
Received: (qmail 23309 invoked by uid 60001); 27 Sep 2007 18:24:49 -0000
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
  s=s1024; d=yahoo.com;
  h=X-YMail-OSG:Received:Date:From:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding:Message-ID;
  b=nilzHOVvURlBgn58yYgC1twUb/zSDPCV9/GdfldKwrJAiC8gFJtTJicvZLNGrV8umyOx6XNFPm7RULzrsJ2WgCwsP4bF9T7x0+7XWwmbKn0Lok3Y11facZUNicugmSLQS4m5LwThFToL5mctI1eHuNKgMmImdJIFcP4kNZM+NH4=;
X-YMail-OSG: jm6_buMVM1kjm8MWURf33u_raUKF63vPNBz2lQ2dp9RsVdEEmP5yNiaJ_XyJnMRmrxWL_fnUCGG2rei7uN26jSMrplTGidB6t4oI_X3m7c0XF0ag5M8rTbDDHcqGYK0-
Received: from [198.208.251.22] by web60913.mail.yahoo.com via HTTP; Thu, 27 Sep 2007 11:24:49 PDT
Date: Thu, 27 Sep 2007 11:24:49 -0700 (PDT)
From: Andrew Nikitin <andrewnikitin@yahoo.com>
Subject: Re: unifont.png
To: roman_czyborra <roman@czyborra.com>
In-Reply-To: <fdg4ns+keav@eGroups.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="0-659259426-1190917489=:22433"
Message-ID: <453595.22433.qm@web60913.mail.yahoo.com>
X-Scanned-By: MIMEDefang 2.57 on 10.1.0.141
X-Scanned-By: MIMEDefang 2.52 on 194.29.227.46
X-WEBDE-FORWARD: plvd.org@web.de -> czyborra@googlemail.com
X-IMAPbase: 1190923159 1
Status: RO
X-Status: A
X-Keywords:                     
X-UID: 1

--0-659259426-1190917489=:22433
Content-Type: text/plain; charset=iso-8859-1
Content-Id: 
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Hello, Roman.

I am glad my small contribution turned out to be useful.

I created the sample with
perl sample.pl unifont.hex | perl banner.pl -iutf16 -w 640 | perl ..\g\bmp.pl z

I then converted the resulting z.bmp (or z0001.bmp) into PNG with some
conversion tool; I am not sure which one.
(scripts are attached)

sample.pl is trivial. banner.pl can read Unicode text in various forms and
output it in various forms, including the "banner" format: rows of ASCII
characters '1' and '0' that denote black and white pixels. This banner format
is converted to BMP by bmp.pl, which reads the banner from its input.
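
To illustrate the banner format, here is a minimal sketch (not taken from the
attached scripts) of how one line of a .hex font can be expanded into rows of
'0' and '1'. It assumes 16-pixel-high glyphs as in unifont.hex; the sub name
hex_to_banner and the sample bitmap for 'A' are assumptions made for this
example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of banner.pl): expand one "CODE:HEXBITMAP"
# line of a .hex font into banner rows, assuming a glyph height of 16.
sub hex_to_banner {
    my ($line, $height) = @_;
    $height ||= 16;
    my ($code, $hex) = $line =~ /^([0-9A-Fa-f]+):([0-9A-Fa-f]+)/
        or return;
    my $bits  = unpack('B*', pack('H*', $hex));  # 4 bits per hex digit
    my $width = length($bits) / $height;         # pixels per row
    return $bits =~ /.{$width}/g;                # one string per pixel row
}

# An 8x16 glyph for U+0041 (bitmap assumed for this example)
my @rows = hex_to_banner('0041:0000000018242442427E424242420000');
print "$_\n" for @rows;
```

Each printed row is one scan line of the glyph; banner.pl additionally
concatenates the rows of neighbouring characters and pads them to the -w width.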

As I mentioned already, sample.pl is trivial, so I am also including
usample.pl, which I am slightly more proud of. You may find it useful for
visualising the implemented Unicode ranges. It generates HTML, which is
then converted into banner format and then to BMP.

perl usample.pl unifont.hex | perl banner.pl -ihtml | perl ..\g\bmp.pl z
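
The html input mode of banner.pl accepts numeric character references such as
&#1046; or &#x263A;. As a rough sketch of that decoding step (not the code in
banner.pl; the sub name html_to_codepoints is made up for this example, and
named entities such as &mu; are left out):

```perl
use strict;
use warnings;

# Hypothetical sketch: turn text with numeric character references
# into a list of Unicode code points. Any other character is passed
# through as its own byte value.
sub html_to_codepoints {
    my ($text) = @_;
    my @cp;
    while ($text =~ /\G(?:&#x([0-9A-Fa-f]+);|&#(\d+);|(.))/gs) {
        push @cp, defined $1 ? hex($1)    # hexadecimal reference
                : defined $2 ? $2 + 0     # decimal reference
                :              ord($3);   # plain character
    }
    return @cp;
}

printf "U+%04X\n", $_ for html_to_codepoints('A&#1046;&#x263A;');
```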


--- roman_czyborra <roman@czyborra.com> wrote:

> Dear Andrew, thank you for creating the unifont.png - I have just
> discovered it and installed it as
> http://czyborra.com/unifont/unifont.png and as
> http://commons.wikimedia.org/wiki/GNU_unifont.png for
> http://en.wikipedia.org/wiki/Unifont - looks beautiful!  Do you still
> have the script to make the PNG for us to share?  Cheers: Roman
> 
> 


--0-659259426-1190917489=:22433
Content-Type: text/plain; name="banner.pl"
Content-Description: 831548463-banner.pl
Content-Disposition: inline; filename="banner.pl"
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by ghostwriter.taz.de id l8RIOCx5012217

=3Dhead1 NAME

uniconv.pl, banner.pl -- convert unicode text into different presentation=
s

=3Dhead1 SYNOPSIS

perl banner.pl {options} <inputfile

=3Dhead1 OPTIONS

=3Dover 4

=3Ditem -i input_format

Parameter is a keyword that represents input file encoding.
Possible values are:

ascii, latin1, iso88591 -- 8bit text

utf16 -- 2 bytes per each char in the input file

utf7 -- input is encoded in utf-7

utf8 -- input is encoded in utf-8

utf75 -- input is encoded in utf-7,5

cp:<filename> -- 1 byte per character according to codepage in a given fi=
le.
The format of the codepage file is as on http://czyborra.com. If <filename>
is just a number, then the codepage file used is "./cp/cp<number>.txt".

html:<codepage> -- ascii and iso-8859-1 characters are transmitted as is,
characters with numerical value greater than 255 are transmitted in
&#<unicode_number>; form. HTML entities are allowed on input; <codepage>
sets codepage for plain text if specified.

=3Ditem -o output_format

Parameter is a keyword specifying the output format. In addition to the
ones listed as input formats, it may be one of the following:

banner -- every symbol from the input is represented as big text matrix;
see -f and -w options

pcx -- graphic in pcx format (pipes 'banner' output to external pcx
conversion perl script)

rtf -- Microsoft rich text format, Courier New 10 pt. font. Please note
that Wordpad is capable of displaying and printing Unicode characters
but is not capable of saving them.

=3Ditem -f font_file_names

Font filename with unicode glyphs. Should be in .hex format.
Default is 'unifont.hex' (see http://czyborra.com/unifont).

It is possible to specify several files here. File names should be
comma separated, with no space around the commas. All fonts must be of
the same size, and the first file has to have the fontheight directive on
the first line. Glyph definitions found in later files override those
found in earlier files.

=3Ditem -w width

Specifies maximum possible width (in characters or pixels, depending on
the output type) of the output file.

=3Ditem -h height

Specifies maximum possible height in pixels for 'gif' output type. When
this height is exceeded, a new page starts.

=3Dback

=3Dhead1 HISTORY

=3Dover 4

=3Ditem 2001-04-20

html input mode

=3Ditem 2001-07-05

codepage input/output modes

=3Ditem 2001-07-06

possibility to use fonts of arbitrary height and width. Font height is
specified in .hex file as ':FONTHEIGHT=3Dddd'. Default -- 16 (for
compatibility with unifont.hex).

Also added shortcuts to codepage files: if only a number is given, the
codepage file used is "./cp/cp<number>.txt".

=3Ditem 2001-07-10

pipe output to ../g/pcx.pl script for 'pcx' output mode

=3Ditem 2001-07-13

prints U+FFFD (REPLACEMENT CHARACTER) instead of symbols not found in the=
 font

=3Ditem 2001-07-31

restructured source into modular form (allows -o pcx:fontname or -o
banner:fontname among other things);=20

:v :n modifiers for utf16 -- explicitly set byte order and put BOM in the
beginning of output;

! a lot of changes, watch for bugs;

=3Ditem 2001-08-01

u::html::input recognizes html entities (&mu; &lt; etc.)

=3Ditem 2001-08-21

fixed misspelled reference to base class u::abstract in u::banner;

:fontheight need not start from the first column any longer;

=3Ditem 2001-09-13

search for unifont.hex, codepages and pcx.pl relative to the script
location, not the current directory;

codepage in u::html;

=3Ditem 2001-11-06

cp name substitution is done in 2 stages. First the name is completed with
cp and .txt (if it consists only of digits), and then the path is added (if
no path was originally specified)

=3Ditem 2001-11-09

-h (page height) switch is actually used during banner output. "\f\n" is
inserted after every $opt_h lines of output.

fixed problem with passing empty codepage into html decoder

=3Ditem 2002-01-07

in cp (html) input mode if character is not found in codepage table, it
is processed as is (instead of fffd before)

=3Ditem 2002-05-20

if path is not specified in -f option then defaults it to $0/../

=3Ditem 2002-05-24

allow many-to-many relationships in codepages. The first matching pair
found provides the substitution.

=3Ditem 2002-06-11

process ^L in the input files for banner output type. watch for bugs for
pcx output type.

=3Ditem 2004-06-30

rtf output

=3Ditem 2004-07-19

do not display zero width non-breakable space (U+FEFF) in graphic output

=3Ditem 2004-07-22

prefix \,{,} in rtf output with a backslash

=3Ditem 2004-08-12

allow multiple comma separated fontnames in -f option

=3Dback

=3Dhead1 TODO

  u::utf7::output
  u::utf75::input
  phonetic substitutions in u::ascii::output


=3Dcut

use strict;

package u::abstract ;

sub new {
  bless {}, shift();
}

############# Input modes ########################
# Input routines accept the next character from the stream and store it
# in a buffer. Once the buffer holds enough information to produce the
# next Unicode character(s), these are passed to the output functions
# (via the global ::printwchar()).
sub input_init{};
sub input { die "->input is not implemented for ",ref(shift)}
# When there are no more characters in the input, ->input_flush() is
# called to flush all nonempty input buffers.
sub input_flush{}


############# Output modes ########################
# ->output method accepts integer and uses 'print' for output.
sub output_init {}
sub output {die "->output is not implemented for ",ref(shift)}
sub output_flush{}

package u::ascii;
@u::ascii::ISA=3Dqw(u::abstract);

sub input{
  &::printwchar(ord($_[1])) if '' ne $_[1];
}

############# Codepage encoded input/output #############
package u::cp;
@u::cp::ISA=3Dqw(u::abstract);

sub new {
  warn "u::cp::new @_" if $::opt_d;
  my $class=3Dshift;
  my $fn=3Dshift; # optional parameter -- cp filename
  my $cp=3D{};
  ::read_cp($cp,$fn) if '' ne $fn;
  bless $cp, $class;
}

sub input {
  my $this=3Dshift();
  my $i=3Dord($_[0]);
  my $u;=20
  $u=3D$this->{"c$i"};
  if( '' eq $u ) {
    $u=3D$i;
  } else {
    $u=3D0xFFFD if '' eq $u; # Unicode REPLACEMENT CHARACTER=20
  }=20
  ::printwchar($u);
}

sub output {
  my $this=3Dshift();
  my $code=3Dshift();
  my $c=3D$this->{"u$code"};
  if( '' eq $c ) {
    if( 128>$code ) {
      print chr($code);
    } else {
      print '?';
    }
  } else {
    print chr($c);
  }
}

package u::html;#######################################################
@u::html::ISA=3Dqw(u::cp);

%u::html::ENTITY =3D split /\s+/,<<ENTITY_DATA;
nbsp 160         szlig 223        zeta 950         cap 8745   =20
iexcl 161        agrave 224       eta 951          cup 8746   =20
cent 162         aacute 225       theta 952        int 8747   =20
pound 163        acirc 226        iota 953         there4 8756=20
curren 164       atilde 227       kappa 954        sim 8764   =20
yen 165          auml 228         lambda 955       cong 8773  =20
brvbar 166       aring 229        mu 956           asymp 8776 =20
sect 167         aelig 230        nu 957           ne 8800    =20
uml 168          ccedil 231       xi 958           equiv 8801 =20
copy 169         egrave 232       omicron 959      le 8804    =20
ordf 170         eacute 233       pi 960           ge 8805    =20
laquo 171        ecirc 234        rho 961          sub 8834   =20
not 172          euml 235         sigmaf 962       sup 8835   =20
shy 173          igrave 236       sigma 963        nsub 8836  =20
reg 174          iacute 237       tau 964          sube 8838  =20
macr 175         icirc 238        upsilon 965      supe 8839  =20
deg 176          iuml 239         phi 966          oplus 8853 =20
plusmn 177       eth 240          chi 967          otimes 8855=20
sup2 178         ntilde 241       psi 968          perp 8869  =20
sup3 179         ograve 242       omega 969        sdot 8901  =20
acute 180        oacute 243       thetasym 977     lceil 8968 =20
micro 181        ocirc 244        upsih 978        rceil 8969 =20
para 182         otilde 245       piv 982          lfloor 8970=20
middot 183       ouml 246         bull 8226        rfloor 8971=20
cedil 184        divide 247       hellip 8230      lang 9001  =20
sup1 185         oslash 248       prime 8242       rang 9002  =20
ordm 186         ugrave 249       Prime 8243       loz 9674   =20
raquo 187        uacute 250       oline 8254       spades 9824=20
frac14 188       ucirc 251        frasl 8260       clubs 9827 =20
frac12 189       uuml 252         weierp 8472      hearts 9829=20
frac34 190       yacute 253       image 8465       diams 9830 =20
iquest 191       thorn 254        real 8476        quot 34    =20
Agrave 192       yuml 255         trade 8482       amp 38     =20
Aacute 193       fnof 402         alefsym 8501     lt 60      =20
Acirc 194        Alpha 913        larr 8592        gt 62      =20
Atilde 195       Beta 914         uarr 8593        OElig 338  =20
Auml 196         Gamma 915        rarr 8594        oelig 339  =20
Aring 197        Delta 916        darr 8595        Scaron 352 =20
AElig 198        Epsilon 917      harr 8596        scaron 353 =20
Ccedil 199       Zeta 918         crarr 8629       Yuml 376   =20
Egrave 200       Eta 919          lArr 8656        circ 710   =20
Eacute 201       Theta 920        uArr 8657        tilde 732  =20
Ecirc 202        Iota 921         rArr 8658        ensp 8194  =20
Euml 203         Kappa 922        dArr 8659        emsp 8195  =20
Igrave 204       Lambda 923       hArr 8660        thinsp 8201=20
Iacute 205       Mu 924           forall 8704      zwnj 8204  =20
Icirc 206        Nu 925           part 8706        zwj 8205   =20
Iuml 207         Xi 926           exist 8707       lrm 8206   =20
ETH 208          Omicron 927      empty 8709       rlm 8207   =20
Ntilde 209       Pi 928           nabla 8711       ndash 8211 =20
Ograve 210       Rho 929          isin 8712        mdash 8212 =20
Oacute 211       Sigma 931        notin 8713       lsquo 8216 =20
Ocirc 212        Tau 932          ni 8715          rsquo 8217 =20
Otilde 213       Upsilon 933      prod 8719        sbquo 8218 =20
Ouml 214         Phi 934          sum 8721         ldquo 8220 =20
times 215        Chi 935          minus 8722       rdquo 8221 =20
Oslash 216       Psi 936          lowast 8727      bdquo 8222 =20
Ugrave 217       Omega 937        radic 8730       dagger 8224=20
Uacute 218       alpha 945        prop 8733        Dagger 8225=20
Ucirc 219        beta 946         infin 8734       permil 8240=20
Uuml 220         gamma 947        ang 8736         lsaquo 8249=20
Yacute 221       delta 948        and 8743         rsaquo 8250=20
THORN 222        epsilon 949      or 8744          euro 8364  =20
ENTITY_DATA

sub new {
  my $class=3Dshift();
  $class=3Dref $class || $class;
  warn "lt=3D$u::html::ENTITY{lt}, class=3D$class" if $::opt_d;
  my $this =3D SUPER::new $class (@_); =20
  $this->{i_buf}=3D'';
  bless $this, $class;
}

sub input_flush {
  my $this=3Dshift();
  warn "unfinished html sequence: $this->{i_buf}" if length $this->{i_buf=
};
}

sub input {
  my $this=3Dshift();
  my $pb=3D\($this->{i_buf});
  if( '' eq $$pb ) {
    if( '&' eq $_[0] ){$$pb=3D'&'}
    else {=20
      # ::printwchar(ord($_[0]))
      $this->SUPER::input(@_);
    }
  } else {
    unless (';' eq $_[0] ) {
      $$pb.=3D$_[0];
    } else {
      if(0){
      } elsif( '&#x' eq substr($$pb,0,3) ) {
        ::printwchar(hex(substr($$pb,3)));
      } elsif( '&#' eq substr($$pb,0,2) ) {
        ::printwchar(0+substr($$pb,2));
      } elsif( defined $u::html::ENTITY{substr($$pb,1)} ) {
        ::printwchar($u::html::ENTITY{substr($$pb,1)});
      } else {
        for(my $i=3D0; $i<length($$pb); ++$i) {
           # pass the raw character: u::cp::input applies ord() itself
           $this->SUPER::input(substr($$pb,$i,1));
        }
        $this->SUPER::input($_[0]);
        warn "Bad html_i buffer: $$pb";
      }
      $$pb=3D'';
    }
  }
}

sub output() {
  my $this=3Dshift();
  my $code=3Dshift();
  print((256>$code)?chr($code):"&#"."$code;");
}

package u::utf7 ;######################################################
@u::utf7::ISA=3Dqw(u::abstract);

$u::utf7::base64=3D"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0=
123456789+/";

sub new {
  bless {buf=3D>0, shift=3D>-1}, shift();
}

sub input {
  my $this=3Dshift();
  my $c=3D$_[0];
  if(-1=3D=3D$this->{shift} ) {
    if('+' eq $c ) {
      $this->{shift}=3D0;
      return;
    }
    ::printwchar( ord($c));
  } elsif ( 0=3D=3D$this->{shift} && '-' eq $c ) {
# if first '+' is followed by anything other than '-' or base64 symbol
# this is an ill-formed sequence
    $this->{shift}=3D-1;
    ::printwchar( ord('+'));
    return;
  } else {
    my $v=3Dindex($u::utf7::base64,$c);
    if( 0<=3D$v) {
#     print "[",substr(unpack("B8",pack("c",$v)),2),"]";
      if(16<=3D6+$this->{shift}) {
        $this->{buf}=3D($this->{buf} << 16-$this->{shift}) | ($v >> $this=
->{shift}-10);
        ::printwchar( $this->{buf} );
        $this->{shift}=3D$this->{shift}-10;
        $this->{buf}=3D$v & ~(-1 << $this->{shift});
#print "{",unpack("H2",pack("c",$this->{buf})),"}";
      } else {
        $this->{buf}=3D($this->{buf}<<6) | ($v & 63);
        $this->{shift}+=3D6;
      }
    } else {
      if($this->{buf}!=3D0) {print"<E!>";}
      $this->{shift}=3D-1;
#     print ":";
      if('-' ne $c) { ::printwchar( ord($c) )}
    }
  }
}


package u::utf8;#######################################################
@u::utf8::ISA=3Dqw(u::abstract);
# utf8:
#bytes | bits | representation
#    1 |    7 | 0vvvvvvv
#    2 |   11 | 110vvvvv 10vvvvvv
#    3 |   16 | 1110vvvv 10vvvvvv 10vvvvvv
#    4 |   21 | 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
# ??
#    5 |   26 | 111110vv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
#    6 |   31 | 1111110v 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
#    7 |   36 | 11111110 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10v=
vvvvv=20

sub new {
  bless {i_cnt=3D>0, i_buf=3D>0,}, shift();
}

sub input_flush {
  my $this=3Dshift();
  warn "unfinished utf8 input sequence" if 0<$this->{i_cnt} ;
}

sub input {
  my $this=3Dshift();
  my $code=3Dord($_[0]);
  if( 0=3D=3D$this->{i_cnt} ) {
#   print "<$code,$this->{i_cnt},$this->{i_buf}>";
#   print 0xC0=3D=3D($code & 0xE0),",",0xC0,",",$code & 0xE0;;
START_SEQUENCE:
    if( 0=3D=3D($code & 0x80) ) { ::printwchar($code) }
    elsif( 0x80=3D=3D($code & 0xC0) ) { warn "utf8 continuation without h=
eader on input:$code" } # ignore illegal code
    elsif( 0xC0=3D=3D($code & 0xE0) ) {$this->{i_cnt}=3D1; $this->{i_buf}=
=3D$code & 0x1F; }
    elsif( 0xE0=3D=3D($code & 0xF0) ) {$this->{i_cnt}=3D2; $this->{i_buf}=
=3D$code & 0x0F; }
    elsif( 0xF0=3D=3D($code & 0xF8) ) {$this->{i_cnt}=3D3; $this->{i_buf}=
=3D$code & 0x07; }
    elsif( 0xF8=3D=3D($code & 0xFC) ) {$this->{i_cnt}=3D4; $this->{i_buf}=
=3D$code & 0x03; }
    elsif( 0xFC=3D=3D($code & 0xFE) ) {$this->{i_cnt}=3D5; $this->{i_buf}=
=3D$code & 0x01; }
    elsif( 0xFE=3D=3D$code || 0xFF=3D=3D$code) {warn"illegal utf-8 input =
code:$code"}
  } else {
    if( 0x80=3D=3D($code & 0xC0) ) {
      $this->{i_buf}=3D($this->{i_buf} << 6) | ($code & 0x3F);
      ::printwchar($this->{i_buf}) if 0=3D=3D--($this->{i_cnt}) ;
    } else {
      warn "no continuation of utf8 input sequence: $code";
      $this->{i_cnt}=3D0;
      goto START_SEQUENCE;
    }
  }
}

sub output {
  my $this=3Dshift();
  my $code=3Dshift();
  if ($code < 0x80) {
    print chr($code);
  } elsif ($code < 0x800) {
    print pack('CC',0xC0 | ($code>>6), 0x80 | ($code & 0x3F));
  } elsif ($code < 0x10000) {
    print pack('CCC', 0xE0|($code>>12), 0x80|($code>>6)&0x3F,
      0x80|$code&0x3F);
  } elsif ($code < 0x200000) {
    print pack('CCCC', 0xF0 | ($code>>18), 0x80 | ($code>>12) & 0x3F,
      0x80 | ($code>>6) & 0x3F, 0x80 | $code & 0x3F);
  }
}

package u::utf16;######################################################
@u::utf16::ISA=3Dqw(u::abstract);

sub new {
  my $class=3Dshift();
  my $par=3Dshift();
  my $this=3D{buf=3D>'', bo=3D>'',};
  if( 'v' eq $par || 'n' eq $par ) { $this->{bo}=3D$par }
  warn "Byteorder=3D'$this->{bo}'\n" if $::opt_d;
  bless $this, $class;
}

sub input{
  my $this=3Dshift();
  print "[",ord($_[0]),"]" if $::opt_d;
  $this->{buf} .=3D $_[0];
  return unless 2=3D=3Dlength($this->{buf});
  # TODO: process surrogate pairs
  # N=3D(H-0xD800) * 0x400 + (L-0xDC00) + 0x10000;
  my $w;
#  print unpack("H*",$this->{buf}),$this->{bo},unpack($this->{bo},$this->=
{buf}) if $::opt_d;
  if( '' eq $this->{bo}) {
    $w=3Dunpack('n',$this->{buf});
    $this->{bo}=3D0xFFFE=3D=3D$w?'v':'n';
    $this->{buf}=3D'';
    return if 0xFFFE=3D=3D$w || 0xFEFF=3D=3D$w;
  } else {
    $w=3Dunpack($this->{bo},$this->{buf});
  }
  ::printwchar($w);
  $this->{buf}=3D'';
}

sub output {
  my $this=3Dshift();
  my $code=3Dshift();
  if( 0xFEFF!=3D$code && 0=3D=3D$this->{out_cnt} && '' ne $this->{bo}) {
    print pack($this->{bo}, 0xFEFF);
  }
  $this->{bo}=3D'n' if '' eq $this->{bo};
  if( $code>0xFFFF ) {
    print pack($this->{bo} x 2, (0xD7C0+($code>>10)), (0xDC00| $code & 0x=
3FF));
  } else {
    print pack($this->{bo},$code);
  }
  $this->{out_cnt}++;
}

############# Output-only modes ########################


package u::utf75;######################################################
@u::utf75::ISA=3Dqw(u::abstract);

#bytes | bits | representation
#    1 |    7 | 0vvvvvvv
#    2 |   10 | 1010vvvv 11vvvvvv
#    3 |   16 | 1011vvvv 11vvvvvv 11vvvvvv
sub output {
  my $this=3Dshift();
  my $code=3Dshift();
  if ($code < 0x80) {
    print chr($code);
  } elsif ($code < 0x400) {
    print pack('CC',0xA0 | ($code>>6), 0xC0 | ($code & 0x3F));
  } elsif ($code < 0x10000) {
    print pack('CCC', 0xB0|($code>>12), 0xC0|($code>>6)&0x3F,
      0xC0|$code&0x3F);
  } elsif ($code < 0x110000) {
    ::printwchar(0xD7C0 + ($code>>10));
    ::printwchar(0xDC00 + ($code & 0x3FF));
  }
}

#  TODO:
#  Special characters that may require attention while rendering
#  3.9 Special Character Properties Conformance=20
#  Copyright =A9 1991=AD2000 by Unicode, Inc. The Unicode Standard=20
#  . Line boundary control=20
#  0009 HORIZONTAL TAB=20
#  000A LINE FEED=20
#  000C FORM FEED=20
#  000D CARRIAGE RETURN=20
#  0020 SPACE=20
#  00A0 NO=ADBREAK SPACE=20
#  0F0B TIBETAN MARK INTERSYLLABIC TSHEG=20
#  0F0C TIBETAN MARK DELIMITER TSHEG BSTAR=20
#  2000 EN QUAD=20
#  2002 EN SPACE=20
#  2003 EM SPACE=20
#  2004 THREE=ADPER=ADEM SPACE=20
#  2005 FOUR=ADPER=ADEM SPACE=20
#  2006 SIX=ADPER=ADEM SPACE=20
#  2007 FIGURE SPACE=20
#  2008 PUNCTUATION SPACE=20
#  2009 THIN SPACE=20
#  200A HAIR SPACE=20
#  200B ZERO WIDTH SPACE=20
#  2011 NON=ADBREAKING HYPHEN=20
#  2028 LINE SEPARATOR=20
#  2029 PARAGRAPH SEPARATOR=20
#  202F NARROW NO=ADBREAK SPACE=20
#  FEFF ZERO WIDTH NO=ADBREAK SPACE=20
#  . Hyphenation control=20
#  002D HYPHEN=ADMINUS=20
#  00AD SOFT HYPHEN=20
#  058A ARMENIAN HYPHEN=20
#  1806 MONGOLIAN TODO SOFT HYPHEN=20
#  2010 HYPHEN=20
#  2011 NON=ADBREAKING HYPHEN=20
#  2027 HYPHENATION POINT=20
#  . Fraction formatting=20
#  2044 FRACTION SLASH=20
#  . Special behavior with nonspacing marks=20
#  0020 SPACE=20
#  0069 LATIN SMALL LETTER I=20
#  006A LATIN SMALL LETTER J=20
#  00A0 NO=ADBREAK SPACE=20
#  0131 LATIN SMALL LETTER DOTLESS I=20
#  . Double nonspacing marks=20
#  0360 COMBINING DOUBLE TILDE=20
#  0361 COMBINING DOUBLE INVERTED BREVE=20
#  0362 COMBINING DOUBLE RIGHTWARDS ARROW BELOW=20
#  . Joining=20
#  200C ZERO WIDTH NON=ADJOINER=20
#  200D ZERO WIDTH JOINER=20
#  . Bidirectional ordering=20
#  200E LEFT=ADTO=ADRIGHT MARK=20
#  200F RIGHT=ADTO=ADLEFT MARK=20
#  202A LEFT=ADTO=ADRIGHT EMBEDDING=20
#  202B RIGHT=ADTO=ADLEFT EMBEDDING=20
#  202C POP DIRECTIONAL FORMATTING=20
#  202D LEFT=ADTO=ADRIGHT OVERRIDE=20
#  202E RIGHT=ADTO=ADLEFT OVERRIDE=20
#  . Alternate formatting=20
#  206A INHIBIT SYMMETRIC SWAPPING=20
#  206B ACTIVATE SYMMETRIC SWAPPING=20
#  206C INHIBIT ARABIC FORM SHAPING=20
#  206D ACTIVATE ARABIC FORM SHAPING=20
#  206E NATIONAL DIGIT SHAPES=20
#  206F NOMINAL DIGIT SHAPES=20
#  . Syriac abbreviation=20
#  070F SYRIAC ABBREVIATION MARK=20
#  . Indic dead=ADcharacter formation=20
#  094D DEVANAGARI SIGN VIRAMA=20
#  09CD BENGALI SIGN VIRAMA=20
#  0A4D GURMUKHI SIGN VIRAMA=20
#  0ACD GUJARATI SIGN VIRAMA=20
#  0B4D ORIYA SIGN VIRAMA=20
#  0BCD TAMIL SIGN VIRAMA=20
#  0C4D TELUGU SIGN VIRAMA=20
#  0CCD KANNADA SIGN VIRAMA=20
#  0D4D MALAYALAM SIGN VIRAMA=20
#  0DCA SINHALA SIGN AL=ADLAKUNA=20
#  0F84 TIBETAN SIGN HALANTA=20
#  1039 MYANMAR SIGN VIRAMA=20
#  17D2 KHMER SIGN COENG=20
#  . Mongolian variant selectors=20
#  180B MONGOLIAN FREE VARIATION SELECTOR ONE=20
#  180C MONGOLIAN FREE VARIATION SELECTOR TWO=20
#  180D MONGOLIAN FREE VARIATION SELECTOR THREE=20
#  180E MONGOLIAN VOWEL SEPARATOR=20
#  . Ideographic variation indication=20
#  303E IDEOGRAPHIC VARIATION INDICATOR=20
#  . Ideographic description=20
#  2FF0 IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT=20
#  2FF1 IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO BELOW=20
#  2FF2 IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO MIDDLE AND RIGHT=20
#  2FF3 IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO MIDDLE AND BELOW=20
#  2FF4 IDEOGRAPHIC DESCRIPTION CHARACTER FULL SUR=AD ROUND=20
#  2FF5 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM ABOVE=20
#  2FF6 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM BELOW=20
#  2FF7 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LEFT=20
#  2FF8 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER LEFT=20
#  2FF9 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER RIGHT=20
#  2FFA IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER LEFT=20
#  2FFB IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID=20
#  . Interlinear annotation=20
#  FFF9 INTERLINEAR ANNOTATION ANCHOR=20
#  FFFA INTERLINEAR ANNOTATION SEPARATOR=20
#  FFFB INTERLINEAR ANNOTATION TERMINATOR=20
#  . Object replacement=20
#  FFFC OBJECT REPLACEMENT CHARACTER=20
#  . Code conversion fallback=20
#  FFFD REPLACEMENT CHARACTER=20
#  . Byte order signature=20
#  FEFF ZERO WIDTH NO=ADBREAK SPACE=20
package u::banner;#####################################################
@u::banner::ISA=3Dqw(u::abstract);

sub new {
  my $class=3Dshift();
  my %font;
  $font{height}=3D16;
  for my $fntnm(split',',shift || $::opt_f) {
    $fntnm =3D "$0/../$fntnm" unless $fntnm=3D~/[\/\\]/;
    for(open FONT,$fntnm or die "Cannot open: $fntnm";<FONT>;) {
      $font{unpack("n",pack("H*",$1))}=3D$2 if/^([0-9a-fA-F]+):([0-9a-fA-=
F]+)/;
      $font{height}=3D$1 if /^\s*:fontheight\s*=3D\s*(\d+)/i;
    }
    close FONT;
  }
  $font{line}=3D[];
  $font{pageheight}=3D0;
  return bless \%font, $class;
}

sub output_flush {
 my $this=3Dshift();
 $this->newline() if '' ne $this->{line}[0];
}

sub newpage {
 my $this=3Dshift();
 $this->output_flush();
 print "\f\n";
}


sub output() {
  my $this=3Dshift();
  my $code=3Dshift();
# warn "{$code}";
  if( 10=3D=3D$code ) {=20
    $this->newline() if 12 !=3D $this->{lastchar}
  } if( 13=3D=3D$code || 0xFEFF=3D=3D$code ) {=20
    return;
  } if( 12=3D=3D$code ) {=20
    $this->newpage();
  } elsif( 32>$code ) {}
  else {
    $code=3D0xFFFD unless defined $this->{$code}; #
    goto CONT unless defined $this->{$code};
    $_=3Dsubstr(unpack('B*',pack('H*',$this->{$code})),0,4*length($this->=
{$code}));
    my $cw=3Dint(length($_)/$this->{height});
    my $i=3D0;
    for( /.{$cw}/g ) {
      if( $::opt_w && (length($this->{line}[$i]) + length($_))>$::opt_w) =
{
        warn("not on first") if $i;
        $this->newline();
#print"---\n";
      }
      $this->{line}[$i++] .=3D $_;
    }
CONT:
  }
  $this->{lastchar}=3D$code;
}

sub newline() {
 my $this=3Dshift();
 if( length($this->{line}[0])>0 && $::opt_h && $this->{pageheight} >=3D $=
::opt_h ) {
   $this->{pageheight}=3D0;
   print "\f\n";
 }
 for(@{$this->{line}}) {
   print ;
   print '0' x ($::opt_w-length) if $::opt_w;
   print"\n";
 }
 $this->{pageheight}++;
 $this->{line}=3D[(('')x$this->{height})];
}

package u::rtf;########################################################
@u::rtf::ISA=3Dqw(u::abstract);

sub output_init { print "{\\rtf1{\\fonttbl{\\f0 Courier New}}\\fs20\n{\\f=
0 "; }
# {\f0   \u9554*\u9552*\u9572*\u9557*=20
sub output {
  my $this=3Dshift();
  my $code=3Dshift();
  if( 10=3D=3D$code ) {=20
    print "\n\\par ";
  } elsif( 13=3D=3D$code ) {=20
    return;
  } elsif( ord('\\')=3D=3D$code || ord('{')=3D=3D$code || ord('}')=3D=3D$=
code ) {=20
    print '\\'.chr($code);
  } elsif( 12=3D=3D$code ) {=20
    print "\n\\page\n";
  } elsif( 32>$code ) {
    # ignore other control characters
  } elsif( 128>$code ) {
    print chr($code);
  } else {
    print "\\u$code*";
  }
}

sub newpage {
 my $this=3Dshift();
 $this->output_flush();
 print "\f\n";
}

sub output_flush{ print "\\par }}\n"; }

package u::pcx;########################################################
@u::pcx::ISA=3Dqw(u::banner u::abstract);

sub output_init {
  open PCX, "| perl $0/../../g/pcx.pl";
  binmode PCX;
}

sub output_flush {
 my $this=3Dshift();
 $this->newline() if '' ne $this->{line}[0];
 close PCX;
}

sub newline() {
 my $this=3Dshift();
 if( length($this->{line}[0])>0 && $::opt_h && $this->{pageheight} >=3D $=
::opt_h ) {
   $this->{pageheight}=3D0;
   print PCX "\f\n";
 }
 for(@{$this->{line}}) {
   print PCX $_;
   print PCX '0' x ($::opt_w-length) if $::opt_w;
   print PCX "\n";
 }
 $this->{pageheight}++;
 $this->{line}=3D[(('')x$this->{height})];
}

# TODO: package u::BIF
# alchemy -B type output
# colors?

# ------------------------  Translation part -------------------------
package main;

use Getopt::Std;
our (
  $opt_f, # font file name
  $opt_w, # output width
  $opt_h, # output height
  $opt_i, # input mode
  $opt_o, # output mode
  $opt_d, # debug
  $opt_n, # output file name
);
getopts('w:h:f:i:o:dn:');
$opt_f ||=3D "$0/../unifont.hex";
$opt_f =3D "$0/../$opt_f" unless $opt_f=3D~/[\/\\]/;
# $opt_h ||=3D 55;
$opt_o ||=3D 'banner';
$opt_i ||=3D 'ascii';

my %utf=3D(
  html  =3D>'u::html',
  banner=3D>'u::banner',
  img   =3D>'u::banner',
  pcx   =3D>'u::pcx',
  utf16 =3D>'u::utf16',
  utf7  =3D>'u::utf7',
  utf8  =3D>'u::utf8',
  utf75 =3D>'u::utf75',
  utf9  =3D>'u::utf75',
  ascii =3D>'u::ascii',
  cp    =3D>'u::cp',
  latin1=3D>'u::ascii',
  iso88591=3D>'u::ascii',
  rtf   =3D>'u::rtf',
);

my $inp_utf=3Dutf($opt_i);
my $out_utf=3Dutf($opt_o);

$inp_utf->input_init();
$out_utf->output_init();

binmode STDIN;

if( '' ne $opt_n ) {
  close STDOUT;
  open STDOUT, ">$opt_n" or die "Cannot open $opt_n: $!";
}
binmode STDOUT;


while(<STDIN>) { for( /[\0-\377]/g ) { $inp_utf->input($_) } }
$inp_utf->input_flush();
$out_utf->output_flush();

sub printwchar {
  die "obsolete call, use output_flush" if '' eq $_[0];
  $out_utf->output($_[0]);
}

############# Service functions ##################

sub utf {
  $_=shift();
  /^([^:]*)/;
  my $enc=$1;
  my $par=substr($',1);
  die "Unknown utf: $enc" unless defined $utf{$enc};
  my $class=$utf{$enc};
  warn "utf=new $class '$par'\n" if $opt_d;
  return new $class $par;
}

sub read_cp {
# Codepage file consists of lines of form
#=20	U+0020	SPACE
#=21	U+0021	EXCLAMATION MARK
#=22	U+0022	QUOTATION MARK
  my $cp=shift; # hash reference
  my $fn=shift; # filename
  warn "cp=$cp, fn=$fn" if $opt_d;
  $fn="cp$fn" if $fn=~/^\d*$/; # bare number: prefix with 'cp'
  $fn="$fn.txt" unless $fn=~/\./; # no extension: append '.txt'
  $fn="$0/../cp/$fn" unless $fn=~m{[/\\]}; # no directory: look in cp/ next to the script
  for( open CP, $fn or die "Cannot open $fn"; <CP>; ) {
    if( /^\s*=([[:xdigit:]]{2})\s+U\+([[:xdigit:]]{4,8})/ ) {
      my $c2=unpack('C',pack('H*',substr('00'.$1,-2,2)));
      my $c8=unpack('N',pack('H*',substr(('0' x 8).$2,-8,8)));
#     print "$c2<->$c8\n";
      $cp->{"c$c2"}||=$c8;
      $cp->{"u$c8"}||=$c2;
    }
  }
}
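
=pod

The codepage line format described in read_cp can be tried in isolation.
What follows is a minimal sketch, not code from the script: parse_cp_line
is a hypothetical helper, and it uses hex() where read_cp uses
pack/unpack. It returns the byte value and the Unicode code point for one
line, or an empty list when the line does not match.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script): parse one codepage line
# of the form "=21<TAB>U+0021<TAB>EXCLAMATION MARK".
sub parse_cp_line {
    my ($line) = @_;
    return unless $line =~ /^\s*=([[:xdigit:]]{2})\s+U\+([[:xdigit:]]{4,8})/;
    return (hex($1), hex($2));    # (byte value, Unicode code point)
}

my ($byte, $ucs) = parse_cp_line("=21\tU+0021\tEXCLAMATION MARK");
printf "byte %d <-> U+%04X\n", $byte, $ucs;
```

read_cp stores both directions in one hash, under the keys "c<byte>" and
"u<codepoint>", so the same table serves input and output.

=cut
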


--0-659259426-1190917489=:22433
Content-Type: text/plain; name="sample.pl"
Content-Description: 208028102-sample.pl
Content-Disposition: inline; filename="sample.pl"

# Prints every character with a 4-digit code in the font file as UTF-16BE (no BOM)
binmode STDOUT;
while(<>){
  next unless /^([0-9A-Fa-f]{4}):/;
  print pack('H*',$1);
}
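
=pod

The loop above relies on pack('H*', ...) turning four hex digits into two
bytes, high byte first, which is the UTF-16BE form of a BMP character.
A small sketch under that reading; utf16be_from_hex is a hypothetical
name introduced only for this illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: the two bytes sample.pl would print for one
# 4-digit hex code taken from a font line.
sub utf16be_from_hex {
    my ($hex) = @_;
    return pack('H*', $hex);
}

my $bytes = utf16be_from_hex('0416');      # CYRILLIC CAPITAL LETTER ZHE
my $char  = decode('UTF-16BE', $bytes);    # decode to check the byte order
printf "%d bytes, U+%04X\n", length($bytes), ord($char);
```

=cut
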

--0-659259426-1190917489=:22433
Content-Type: text/plain; name="usample.pl"
Content-Description: 346627872-usample.pl
Content-Disposition: inline; filename="usample.pl"

# Generate pretty columnwise sample with section headers
# 2002-05-31
# 2002-06-03 -- use Blocks.txt from unicode std
# 2003-10-24 -- do not print ignored sections
# 2004-05-05 -- optional height; do not break in the middle of the table
# 2004-05-06 -- specify "absent" character; dynamically define "normal" width;
#               ignore duplicate entries; initialize @COLUMNS
use strict;
use Getopt::Std;
# my $H=190; # TODO maximum allowed page height

our(
  $opt_w, # number of columns per page. Each column is 3 characters + 4 for
          # column heading
  $opt_h, # maximal page height
  $opt_t, # include html header/footer
  $opt_a, # absent, 0xfffd by default
);
getopts("w:h:ta:");

$opt_w||=16;
$opt_a||=0xfffd;
$opt_a=ord($opt_a) if( 1==length($opt_a) && $opt_a=~/\D/ );

my @COLUMNS;
my $nc=0; # number of filled columns;

my $pc=-1; # previous character
my $cs=-1; # Current section
my($ns,$nsname);
($ns,$nsname)=split ';',scalar <DATA> until $ns=~/^([[:xdigit:]]+)/;
$ns=$1;
print "<pre>\n" if $opt_t;
my $tc=0xffffff;
my $ignored_section=1;
my $ignore_next_section;
my $normalwidth=32;

my $l; # line on page
my $title=""; # title of current unicode section
my $lc=-1; # last character seen (used to skip duplicate entries)
while(<>){
  next unless /^([[:xdigit:]]{4,8}):([[:xdigit:]]*)/;
  my $cc=hex('0x'.$1);
  next if $cc == $lc;
  $lc=$cc;
  $normalwidth=length($2) if 32==$cc;
  my $w=length($2)>$normalwidth;
  if($cc>=hex($ns)) {
    flush_cols() if $nc;
    while(1) {
      $ignored_section='-' eq substr($nsname,0,1);
      printlp( "" );
      printlp( "" );
      $title=sprintf "%06s. %s",$ns, $nsname;
      # $ln+=3;
      if($ignored_section) {
        checklp(5);
        printlp($title);
        printlp( "" );
        printlp( " ***");
        printlp( " *** Section is ignored");
        printlp( " ***");
      }
      $pc=-1+(0xFFFF0 & hex($ns));
 REREAD:
      ($ns,$nsname)=split /\s*;\s*/,scalar <DATA>;
      last if eof(DATA);
      goto REREAD unless $ns=~/^([[:xdigit:]]+)/;
      $ns=$1;
      last unless $cc>=hex($ns) ;
      if(!$ignored_section) {
        checklp(5);
        printlp($title);
        $title="";
        printlp( "" );
        printlp( " ***");
        printlp( " *** Empty section");
        printlp( " ***");
      }
    }
#   printf "pc=%06X, cc=%06X, ns=%06X\n",$pc, $cc, hex($ns);
  }
  if( !$ignored_section) {
    if( ""  ne $title) {
      checklp(2+17);
      printlp($title);
      $title="";
      printlp( "" );
    }
    for( my $c=$pc+1; $c<$cc; ++$c ) { add_to_col($c,$opt_a,'  '); }
    add_to_col($cc,$cc,$w?' ':'  ');
    $pc=$cc;
  } 
  last unless --$tc;
}
flush_cols() if $nc;

print "</pre>\n" if $opt_t;

# print line, while tracking line number
sub printlp
{
  if( $opt_h && $l>=$opt_h ) { $l=0; print "\f\n"; }
  if( "" ne $_[0] || 0!=$l ) {
    print $_[0],"\n";
    ++$l;
  }
}

sub checklp
{
  if( $opt_h && $l+$_[0]>=$opt_h ) { $l=0; print "\f\n"; }
}


sub add_to_col {
  my ($a,$c, $f)=@_;
  @COLUMNS=('-   ',map {sprintf " %1X  ",$_} (0..15)) if 0==@COLUMNS;
  unless( $a & 0x0f ) {
    if( $opt_w<=$nc ) { flush_cols() }
    $COLUMNS[0].='   ';
    if( (0==$nc || 0==(0x30 & $a)) && ('    ' eq substr($COLUMNS[0],-4,4)) ) {
      substr($COLUMNS[0],-3,3)=sprintf "%03X",$a>>4;
    }
    ++$nc;
  }
  $COLUMNS[1+(15&$a)].=$f."&#$c;";
}

sub flush_cols {
    checklp( scalar @COLUMNS);
    for( @COLUMNS ) { printlp( $_ ) }
    @COLUMNS=();
    $nc=0;
}

__END__
# Blocks-3.2.0.txt
# Correlated with Unicode 3.2
# Start Code..End Code; Block Name
0020..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0100..017F; Latin Extended-A
0180..024F; Latin Extended-B
0250..02AF; IPA Extensions
02B0..02FF; Spacing Modifier Letters
0300..036F; Combining Diacritical Marks
0370..03FF; Greek and Coptic
0400..04FF; Cyrillic
0500..052F; Cyrillic Supplementary
0530..058F; Armenian
0590..05FF; Hebrew
0600..06FF; Arabic
0700..074F; Syriac
0780..07BF; Thaana
0900..097F; Devanagari
0980..09FF; Bengali
0A00..0A7F; Gurmukhi
0A80..0AFF; Gujarati
0B00..0B7F; Oriya
0B80..0BFF; Tamil
0C00..0C7F; Telugu
0C80..0CFF; Kannada
0D00..0D7F; Malayalam
0D80..0DFF; Sinhala
0E00..0E7F; Thai
0E80..0EFF; Lao
0F00..0FFF; Tibetan
1000..109F; Myanmar
10A0..10FF; Georgian
1100..11FF; Hangul Jamo
1200..137F; Ethiopic
13A0..13FF; Cherokee
1400..167F; Unified Canadian Aboriginal Syllabics
1680..169F; Ogham
16A0..16FF; Runic
1700..171F; Tagalog
1720..173F; Hanunoo
1740..175F; Buhid
1760..177F; Tagbanwa
1780..17FF; Khmer
1800..18AF; Mongolian
1E00..1EFF; Latin Extended Additional
1F00..1FFF; Greek Extended
2000..206F; General Punctuation
2070..209F; Superscripts and Subscripts
20A0..20CF; Currency Symbols
20D0..20FF; Combining Diacritical Marks for Symbols
2100..214F; Letterlike Symbols
2150..218F; Number Forms
2190..21FF; Arrows
2200..22FF; Mathematical Operators
2300..23FF; Miscellaneous Technical
2400..243F; Control Pictures
2440..245F; Optical Character Recognition
2460..24FF; Enclosed Alphanumerics
2500..257F; Box Drawing
2580..259F; Block Elements
25A0..25FF; Geometric Shapes
2600..26FF; Miscellaneous Symbols
2700..27BF; Dingbats
27C0..27EF; Miscellaneous Mathematical Symbols-A
27F0..27FF; Supplemental Arrows-A
2800..28FF; Braille Patterns
2900..297F; Supplemental Arrows-B
2980..29FF; Miscellaneous Mathematical Symbols-B
2A00..2AFF; Supplemental Mathematical Operators
2E80..2EFF; CJK Radicals Supplement
2F00..2FDF; Kangxi Radicals
2FF0..2FFF; Ideographic Description Characters
3000..303F; CJK Symbols and Punctuation
3040..309F; Hiragana
30A0..30FF; Katakana
3100..312F; Bopomofo
3130..318F; Hangul Compatibility Jamo
3190..319F; Kanbun
31A0..31BF; Bopomofo Extended
31F0..31FF; Katakana Phonetic Extensions
3200..32FF; Enclosed CJK Letters and Months
3300..33FF; CJK Compatibility
3400..4DBF; CJK Unified Ideographs Extension A
4E00..9FFF; -CJK Unified Ideographs
A000..A48F; Yi Syllables
A490..A4CF; Yi Radicals
AC00..D7AF; -Hangul Syllables
D800..DB7F; High Surrogates
DB80..DBFF; High Private Use Surrogates
DC00..DFFF; Low Surrogates
E000..F8FF; Private Use Area
F900..FAFF; CJK Compatibility Ideographs
FB00..FB4F; Alphabetic Presentation Forms
FB50..FDFF; Arabic Presentation Forms-A
FE00..FE0F; Variation Selectors
FE20..FE2F; Combining Half Marks
FE30..FE4F; CJK Compatibility Forms
FE50..FE6F; Small Form Variants
FE70..FEFF; Arabic Presentation Forms-B
FF00..FFEF; Halfwidth and Fullwidth Forms
FFF0..FFFF; Specials
10300..1032F; Old Italic
10330..1034F; Gothic
10400..1044F; Deseret
1D000..1D0FF; Byzantine Musical Symbols
1D100..1D1FF; Musical Symbols
1D400..1D7FF; Mathematical Alphanumeric Symbols
20000..2A6DF; CJK Unified Ideographs Extension B
2F800..2FA1F; CJK Compatibility Ideographs Supplement
E0000..E007F; Tags
F0000..FFFFF; Supplementary Private Use Area-A
100000..10FFFF; Supplementary Private Use Area-B

--0-659259426-1190917489=:22433
Content-Type: text/plain; name="bmp.pl"
Content-Description: 2765860354-bmp.pl
Content-Disposition: inline; filename="bmp.pl"

# Convert text file with 1's and 0's into uncompressed .bmp file
# Usage:
#   perl bmp.pl {filename_template | >file.bmp} <banner.txt
#
# filename_template may contain a %d format specification; for multipage
# input the page number is substituted for it. If there is no '%' sign
# anywhere in the template, '%04d' is inserted right before the extension. If
# no extension is specified, '.bmp' is used.
#
# banner.txt is a text file with 1's for black dots and 0's for white
# dots (NB. which is opposite of how paint.exe usually encodes)
# 
# 2003-10-06 11:23:30 -- copied from cropbmp.pl
# 2003-10-07 08:19:49 -- b/w with reversed colors works
# 2003-10-09 09:45:49 -- implemented multipage .ban files (pages are
#                        delimited with ^L (0x0C)) and filename template
# 2006-05-05 10:23:14 -- 4 bit image generation, palette spec
#
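
=pod

The filename template rules from the header comment can be sketched as a
standalone function. page_filename is a hypothetical helper written for
this illustration; it follows the same steps as process_lines but is not
the script's own code. Page 0 stands for single-page output, where no
page number is inserted.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a filename template for a given page.
sub page_filename {
    my ($template, $page) = @_;
    my $name = $template;
    # Default extension when the last path component has none.
    $name .= '.bmp' unless $name =~ /\.[^\\\/]*$/;
    # Multipage output without an explicit %d: put %04d before the extension.
    $name =~ s/(?=\.[^\\\/]*$)/%04d/ unless $name =~ /%./ || $page == 0;
    # Substitute the page number if the template carries a format.
    $name = sprintf($name, $page) if $name =~ /%./;
    return $name;
}

print page_filename('out', 0), "\n";
print page_filename('out', 2), "\n";
print page_filename('p%d.bmp', 3), "\n";
```

=cut
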


use strict;
use FileHandle;

our $VERSION='$Revision: 1.5 $';

use Getopt::Std;
our (
  $opt_s, # comma delimited list of groups of characters; each group represents one palette entry
  $opt_b, # force number of bits
  $opt_p, # palette file or RRGGBB hex list
  $opt_d, # debug
  $opt_f, # fill color for lines of uneven length
);

getopts("s:b:p:df:");
our %PAL_DEF=( # Default palettes for different bit depths
  1=>'000000,ffffff',
  4=>'000000,000080,008000,008080,800000,800080,808000,808080,c0c0c0,0000ff,00ff00,00ffff,ff0000,ff00ff,ffff00,ffffff',
);

# ms paint bmps are usually coded 0 black, 1 white, while banner codes
# 0 white, 1 black
$opt_b||=1;
if( 1==$opt_b ) {
  $opt_s||='1#*,0-. ';
  $opt_f||='1';
} elsif(4==$opt_b) {
  $opt_s||='1#*,%,2,3,4,5,6,7,8,9,a,b,c,d,e,f0-. ';
  $opt_f||='f';
} else {
  die "The only supported bit depths are 1 and 4";
}
$opt_p||=$PAL_DEF{$opt_b};
if( '@' eq substr($opt_p,0,1) ) {
  die "reading palette from file is not supported";
  $opt_p=substr($opt_p,1);
  warn "read palette (and symbols) from text file $opt_p" if $opt_d;
  # (?:symbolset:)?[[:xdigit:]]{,6}(?:,)?
  open FILE,$opt_p;
  close FILE;
}


my @symbols=split(',',$opt_s);

our($TR_FROM,$TR_TO);
{
  $TR_FROM=join '',@symbols;
  my $A='0123456789abcdef';
  my $i=-1;
  $TR_TO=join'',map{$i++;substr($A,$i,1) x length}@symbols;
  $TR_FROM=~s/-/\\-/g;
  $TR_TO=~s/-/\\-/g;
}
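
=pod

The block above turns the -s symbol groups into a pair of tr/// lists:
every character in group N maps to the Nth hex digit. The sketch below
shows the same mapping under stated assumptions: build_tr and translate
are hypothetical helpers, and translate uses a hash lookup instead of the
script's eval "tr/.../.../" so that no escaping is needed.

```perl
use strict;
use warnings;

# Hypothetical helper: build matching from/to lists from symbol groups.
sub build_tr {
    my @groups = split /,/, shift;
    my $digits = '0123456789abcdef';
    my ($from, $to) = ('', '');
    for my $i (0 .. $#groups) {
        $from .= $groups[$i];
        $to   .= substr($digits, $i, 1) x length $groups[$i];
    }
    return ($from, $to);
}

# Hypothetical helper: apply the mapping character by character.
sub translate {
    my ($line, $from, $to) = @_;
    my %map;
    @map{ split //, $from } = split //, $to;
    $line =~ s/(.)/exists $map{$1} ? $map{$1} : $1/ge;
    return $line;
}

my ($from, $to) = build_tr('1#*,0-. ');    # the default for 1-bit output
print translate('#..#', $from, $to), "\n";
```

With the 1-bit default, black marks ('1', '#', '*') become palette index 0
and white marks become index 1, which matches the default palette
'000000,ffffff'.

=cut
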

our %MSTYPE=( # converts MS type into pack/unpack format character
   WORD=>'v',
  DWORD=>'V',
   LONG=>'V',
);

# Create template for bitmap file and image headers
# @bm?v -- list of structure's field names in correct order
# $bm?f -- pack format string
# For the purpose of this script they can be combined in one record.
# They should be read separately since biSize determines the size of
# the infoheader structure.
our ($bmff, @bmfv)=read_C_struct(<<BITMAPFILEHEADER);
    WORD    bfType; 
    DWORD   bfSize; 
    WORD    bfReserved1; 
    WORD    bfReserved2; 
    DWORD   bfOffBits; 
    DWORD   biSize; # field from next (BITMAPINFOHEADER) structure
BITMAPFILEHEADER
our ($bmif, @bmiv)=read_C_struct(<<BMPINFOHEADER);
    LONG   biWidth; 
    LONG   biHeight; 
    WORD   biPlanes; 
    WORD   biBitCount; 
    DWORD  biCompression; 
    DWORD  biSizeImage; 
    LONG   biXPelsPerMeter; 
    LONG   biYPelsPerMeter; 
    DWORD  biClrUsed; 
    DWORD  biClrImportant; 
BMPINFOHEADER

# Combined fileheader and infoheader
# It is written 
our %bmh;

my $file=shift();
my $page=0;


binmode STDOUT;

# Read all lines into memory and make them equal width
# A switch to skip this may be worth adding later (to save memory on
# preformatted input).
@::lines=();
while(<>) {
  chomp;
  # tr/.*/01/;
  eval "tr/$TR_FROM/$TR_TO/";
  if( "\f" eq $_ ) {
    if( "" eq $file ) {
      warn "Since no filename template is specified, all pages after first are ignored";
      last;
    }
    $page+=!$page; # start multipage output
    process_lines();
    $page++;
  } else {
    $::width=length if length>$::width;
    push @::lines, $_;
  }
}
if( 0<@::lines ) { process_lines() }

# write @::lines to .bmp file;
sub process_lines
{
  if( "" ne $file ) {
    close STDOUT;
    $_=$file;
    $_.=".bmp" unless /\.[^\\\/]*$/;
    s/(?=\.[^\\\/]*$)/%04d/ unless /%./ || 0==$page;
    $_=sprintf $_,$page if /%./;
    open STDOUT, ">$_" or die "Cannot write $_: $!";
  }
  binmode STDOUT;
  my $pal;
  my $clrs;
  {
  my @t=split',',$opt_p;
  $pal=join'',map{pack('H*',substr($_.'00000000',0,8))} @t;
  $clrs=@t;
  }
  %bmh=(
    bfType=>unpack('v','BM'),
    bfReserved1=>0,
    bfReserved2=>0,
    bfOffBits=>40+14+4*$clrs, # file header+info header + palette
    biSize=>40,
  
    biPlanes=>1,
    biBitCount=>$opt_b,
    biCompression=>0,
    biXPelsPerMeter=>300/0.0254, # assuming 300 dpi
    biYPelsPerMeter=>300/0.0254,
    biClrUsed=>int(length($pal)/4),
    biClrImportant=>0,
    biWidth=>$::width,
    biHeight=>scalar @::lines,
  );
  my $scansize;
  my $frm;
  if( 1==$opt_b ) {
    $scansize=$::bmh{biWidth}/32;
    $scansize=1+int($scansize) if $scansize!=int($scansize); # in dwords
    $scansize*=4; # in bytes
    $frm="B".(8*$scansize);
  } elsif( 4==$opt_b ) {
    $scansize=$::bmh{biWidth}/8;
    $scansize=1+int($scansize) if $scansize!=int($scansize); # in dwords
    $scansize*=4; # in bytes
    $frm="H".(2*$scansize);
  } else {
    die "Unsupported bits: $opt_b";
  }
  $::bmh{biSizeImage}=$::bmh{biHeight}*$scansize; 
  $::bmh{bfSize}=$::bmh{bfOffBits}+$::bmh{biSizeImage};

  print pack($::bmff.$::bmif,@::bmh{@::bmfv,@::bmiv});
  warn "format $frm" if $opt_d;
  print $pal; # palette
  print join'', reverse map{pack$frm,$_.($opt_f x ($::width-length))}@::lines;
  # paint creates bmp with 0 means black and 1 means white. banner has
  # different convention
  #
  # clean buffer for the next bitmap
  @::lines=();
  $::width=0;
}
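
=pod

process_lines pads every scan line to a whole number of 32-bit words, as
the BMP format requires. The two branches (1 bit and 4 bits per pixel)
follow one rule, sketched here with a hypothetical helper scanline_bytes
that is not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper: bytes per scan line for a pixel width and bit
# depth, rounded up to a multiple of 4 bytes.
sub scanline_bytes {
    my ($width, $bits) = @_;
    return 4 * int(($width * $bits + 31) / 32);
}

for my $case ([1, 1], [32, 1], [33, 1], [9, 4]) {
    my ($width, $depth) = @$case;
    printf "width %2d at %d bpp -> %d bytes\n",
        $width, $depth, scanline_bytes($width, $depth);
}
```

biSizeImage is then the scan line size times the number of lines.

=cut
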

sub read_C_struct
{
  my ($format, @vars);
  for(split "\n",$_[0])
  {
    /(\w+)\s+(\w+)/;
    push @vars,$2;
    $format.=$::MSTYPE{uc($1)} || die "illegal type: $1";
  }
  return ($format, @vars);
}
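
=pod

read_C_struct maps C type names to pack() format characters so that a
header can be written from a hash slice in field order. A minimal sketch
of the same idea for the 14-byte file header alone; pack_struct, %PACK_FOR
and @FILE_HEADER are hypothetical names for this illustration, and the
size values are made up for the example.

```perl
use strict;
use warnings;

# Little-endian 16-bit and 32-bit fields, as in %MSTYPE.
my %PACK_FOR = (WORD => 'v', DWORD => 'V', LONG => 'V');

# Field list for BITMAPFILEHEADER, in on-disk order.
my @FILE_HEADER = (
    [WORD  => 'bfType'],
    [DWORD => 'bfSize'],
    [WORD  => 'bfReserved1'],
    [WORD  => 'bfReserved2'],
    [DWORD => 'bfOffBits'],
);

# Hypothetical helper: pack named values according to a field list.
# Missing fields are written as 0.
sub pack_struct {
    my ($fields, %value) = @_;
    my $template = join '', map { $PACK_FOR{ $_->[0] } } @$fields;
    return pack($template, map { $value{ $_->[1] } || 0 } @$fields);
}

my $header = pack_struct(\@FILE_HEADER,
    bfType => unpack('v', 'BM'), bfSize => 62, bfOffBits => 62);
printf "%d bytes, starts with %s\n", length($header), substr($header, 0, 2);
```

=cut
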

--0-659259426-1190917489=:22433--