From SRS0=XnMxyl72=OX=yahoo.com=andrewnikitin@taz.de Thu Sep 27 21:58:29 2007
Date: Thu, 27 Sep 2007 11:24:49 -0700 (PDT)
From: Andrew Nikitin
Subject: Re: unifont.png
To: roman_czyborra
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="0-659259426-1190917489=:22433"
Message-ID: <453595.22433.qm@web60913.mail.yahoo.com>

--0-659259426-1190917489=:22433
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Hello, Roman.

I am glad my small contribution turned out useful.

I created the sample with

perl sample.pl unifont.hex | perl banner.pl -iutf16 -w 640 | perl ..\g\bmp.pl z

I then converted the resulting z.bmp (or z0001.bmp) into png with some
conversion tool, not sure which one. (scripts are attached)

sample.pl is trivial. banner.pl can read unicode text in various forms and
output it in various forms, including "banner" format -- rows of ascii
characters '1' and '0' to denote black and white pixels. This banner format
is converted to bmp with bmp.pl, which reads the banner from its input.

As I mentioned already, sample.pl is trivial, so I am also including
usample.pl, which I am slightly more proud of. You may find it useful for
the purpose of visualising implemented unicode ranges.
It generates html, which is later converted into banner and then to bmp.

perl usample.pl unifont.hex | perl banner.pl -ihtml | perl ..\g\bmp.pl z

--- roman_czyborra wrote:

> Dear Andrew, thank you for creating the unifont.png - I have just
> discovered it and installed it as
> http://czyborra.com/unifont/unifont.png and as
> http://commons.wikimedia.org/wiki/GNU_unifont.png for
> http://en.wikipedia.org/wiki/Unifont - looks beautiful! Do you still
> have the script to make the PNG for us to share? Cheers: Roman

--0-659259426-1190917489=:22433
Content-Type: text/plain; name="banner.pl"
Content-Description: 831548463-banner.pl
Content-Disposition: inline; filename="banner.pl"
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by ghostwriter.taz.de id l8RIOCx5012217

=3Dhead1 NAME

uniconv.pl, banner.pl -- convert unicode text into different presentations

=3Dhead1 SYNOPSIS

perl banner.pl {options} -- 1 byte per character according to codepage
in a given file. Format of codepage file is like in http://czyborra.com.
if is just a number then codepage file used is "./cp/cp.txt".

html: -- ascii and iso-8859-1 characters are transmitted as is, characters
with numerical value bigger than 255 are transmitted in &#; form.
HTML entities are allowed on input; sets codepage for plain text if
specified.

=3Ditem -o output_format

Parameter is keyword, specifying output format. In addition to the ones
listed as input formats, may be one of the following:

banner -- every symbol from the input is represented as big text matrix;
see -f and -w options

pcx -- graphic in pcx format (pipes 'banner' output to external pcx
conversion perl script)

rtf -- Microsoft rich text format, Courier New 10 pt. font.
Please,=20 note, that Wordpad is capable of displaying and printing unicode characte= rs, but is not capable of saving them. =3Ditem -f font_file_names Font filename with unicode glyphs. Should be in .hex format. Default is 'unifont.hex' (see http://czyborra.com/unifont). It is possible to several files here. File names should be comma separated and no space is allowed around comma. All fonts must be of the same size = and first file has to have fontheight directive on the first line. Glyph definitions found in latter files override those found in earlier f= iles. =3Ditem -w width Specifies maximum possible width (in characters or pixel, depending on th= e output type) of the output file. =3Ditem -h height Specifies maximum possible height in pixels for 'gif' output type. When t= his heigh is exceeded new page starts. =3Dback =3Dhead1 HISTORY =3Dover 4 =3Ditem 2001-04-20 html input mode =3Ditem 2001-07-05 codepage input/output modes =3Ditem 2001-07-06 possibility to use fonts of arbitrary heigth and width. Font height is specified in .hex file as ':FONTHEIGHT=3Dddd'. Default -- 16 (for compatibility with unifont.hex). Also added shortcuts to codepage files: if only a number . =3Ditem 2001-07-10 pipe output to ../g/pcx.pl script for 'pcx' output mode =3Ditem 2001-07-13 prints U+FFFD (REPLACEMENT CHARACTER) instead of symbols not found in the= font =3Ditem 2001-07-31 restructured source into modular form (allows -o pcx:fontname or -o banner:fontname among other things);=20 :v :n modifiers for utf16 -- explicitly set byte order and put BOM in the beginning of output; ! a lot of changes, watch for bugs; =3Ditem 2001-08-01 u::html::input recognizes html entities (μ < etc.) 
=3Ditem 2001-08-21 fixed misspelled reference to base class u::abstract in u::banner; :fontheight need not start from the first column any longer; =3Ditem 2001-09-13 search for unifont.hex, codepages and pcx.pl relatively to script locatio= n, not current directory; codepage in u::html; =3Ditem 2001-11-06 cp name substitution is done in 2 stages. First it is completed with cp a= nd .txt (if consists only from digits) and then path is added (if no path originally specified) =3Ditem 2001-11-09 -h (page height) switch is actually used during banner output. "\f\n" is inserted after every $opt_h lines of output. fixed problem with passing empty codepage into html decoder =3Ditem 2002-01-07 in cp (html) input mode if character is not found in codepage table, it is processed as is (instead of fffd before) =3Ditem 2002-05-20 if path is not specified in -f option then defaults it to $0/../ =3Ditem 2002-05-24 allow many-to many relationship in codepages. First found matching pair provides substitution. =3Ditem 2002-06-11 process ^L in the input files for banner output type. watch for bugs for pcx output type. =3Ditem 2004-06-30 rtf output =3Ditem 2004-07-19 do not display zero width non-breakable space (U+FEFF) in graphic output =3Ditem 2004-07-22 prefix \,{,} in rtf output with a backslash =3Ditem 2004-08-12 allow multiple comma separated fontnames in -f option =3Dback =3Dhead1 TODO u::utf7::output u::utf75::input phonetic substitutions in u::ascii::output =3Dcut use strict; package u::abstract ; sub new { bless {}, shift(); } ############# Input modes ######################## # Input routines accept next character from the stream, store it=20 # in the buffer and when buffer contains enough information to produce # next few unicode characters, these characters are being passed to=20 # output functions (via global ::printwchar()). 
sub input_init{}; sub input { die "->input is not implemented for ",ref(shift)} # After there is no more characters in the imput ->input_flush() is calle= d # to flush all nonempty input buffers. sub input_flush{} ############# Output modes ######################## # ->output method accepts integer and uses 'print' for output. sub output_init {} sub output {die "->output is not implemented for ",ref(shift)} sub output_flush{} package u::ascii; @u::ascii::ISA=3Dqw(u::abstract); sub input{ &::printwchar(ord($_[1])) if '' ne $_[1]; } ############# Codepage encoded input/output ############# package u::cp; @u::cp::ISA=3Dqw(u::abstract); sub new { warn "u::cp::new @_" if $::opt_d; my $class=3Dshift; my $fn=3Dshift; # optional parameter -- cp filename my $cp=3D{}; ::read_cp($cp,$fn) if '' ne $fn; bless $cp, $class; } sub input { my $this=3Dshift(); my $i=3Dord($_[0]); my $u;=20 $u=3D$this->{"c$i"}; if( '' eq $u ) { $u=3D$i; } else { $u=3D0xFFFD if '' eq $u; # Unicode REPLACEMENT CHARACTER=20 }=20 ::printwchar($u); } sub output { my $this=3Dshift(); my $code=3Dshift(); my $c=3D$this->{"u$code"}; if( '' eq $c ) { if( 128>$code ) { print chr($code); } else { print '?'; } } else { print chr($c); } } package u::html;####################################################### @u::html::ISA=3Dqw(u::cp); %u::html::ENTITY =3D split /\s+/,<{i_buf}=3D''; bless $this, $class; } sub input_flush { my $this=3Dshift(); warn "unfinished html sequence: $this->{i_buf}" if length $this->{i_buf= }; } sub input { my $this=3Dshift(); my $pb=3D\($this->{i_buf}); if( '' eq $$pb ) { if( '&' eq $_[0] ){$$pb=3D'&'} else {=20 # ::printwchar(ord($_[0])) $this->SUPER::input(@_); } } else { unless (';' eq $_[0] ) { $$pb.=3D$_[0]; } else { if(0){ } elsif( '&#x' eq substr($$pb,0,3) ) { ::printwchar(hex(substr($$pb,3))); } elsif( '&#' eq substr($$pb,0,2) ) { ::printwchar(0+substr($$pb,2)); } elsif( defined $u::html::ENTITY{substr($$pb,1)} ) { ::printwchar($u::html::ENTITY{substr($$pb,1)}); } else { for(my 
$i=3D0; $iSUPER::input(ord(substr($$pb,$i,1))); } # ::printwchar(ord($_[0])); $this->SUPER::input(ord($_[0])); warn "Bad html_i buffer: $$pb"; } $$pb=3D''; } } } sub output() { my $this=3Dshift(); my $code=3Dshift(); print((256>$code)?chr($code):"&#"."$code;"); } package u::utf7 ;###################################################### @u::utf7::ISA=3Dqw(u::abstract); $u::utf7::base64=3D"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0= 123456789+/"; sub new { bless {buf=3D>0, shift=3D>-1}, shift(); } sub input { my $this=3Dshift(); my $c=3D$_[0]; if(-1=3D=3D$this->{shift} ) { if('+' eq $c ) { $this->{shift}=3D0; return; } ::printwchar( ord($c)); } elsif ( 0=3D=3D$this->{shift} && '-' eq $c ) { # if first '+' is followed by anything other that '-' or base64 symbol # this is an ill-formed sequence $this->{shift}=3D-1; ::printwchar( ord('+')); return; } else { my $v=3Dindex($u::utf7::base64,$c); if( 0<=3D$v) { # print "[",substr(unpack("B8",pack("c",$v)),2),"]"; if(16<=3D6+$this->{shift}) { $this->{buf}=3D($this->{buf} << 16-$this->{shift}) | ($v >> $this= ->{shift}-10); ::printwchar( $this->{buf} ); $this->{shift}=3D$this->{shift}-10; $this->{buf}=3D$v & ~(-1 << $this->{shift}); #print "{",unpack("H2",pack("c",$this->{buf})),"}"; } else { $this->{buf}=3D($this->{buf}<<6) | ($v & 63); $this->{shift}+=3D6; } } else { if($this->{buf}!=3D0) {print"";} $this->{shift}=3D-1; # print ":"; if('-' ne $c) { ::printwchar( ord($c) )} } } } package u::utf8;####################################################### @u::utf8::ISA=3Dqw(u::abstract); # utf8: #bytes | bits | representation # 1 | 7 | 0vvvvvvv # 2 | 11 | 110vvvvv 10vvvvvv # 3 | 16 | 1110vvvv 10vvvvvv 10vvvvvv # 4 | 21 | 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv # ?? 
# 5 | 26 | 111110vv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv # 6 | 31 | 1111110v 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv # 7 | 36 | 11111110 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10v= vvvvv=20 sub new { bless {i_cnt=3D>0, i_buf=3D>0,}, shift(); } sub input_flush { my $this=3Dshift(); warn "unfinished utf8 input sequence" if 0<$this->{i_cnt} ; } sub input { my $this=3Dshift(); my $code=3Dord($_[0]); if( 0=3D=3D$this->{i_cnt} ) { # print "<$code,$this->{i_cnt},$this->{i_buf}>"; # print 0xC0=3D=3D($code & 0xE0),",",0xC0,",",$code & 0xE0;; START_SEQUENCE: if( 0=3D=3D($code & 0x80) ) { ::printwchar($code) } elsif( 0x80=3D=3D($code & 0xC0) ) { warn "utf8 continuation without h= eader on input:$code" } # ignore illegal code elsif( 0xC0=3D=3D($code & 0xE0) ) {$this->{i_cnt}=3D1; $this->{i_buf}= =3D$code & 0x1F; } elsif( 0xE0=3D=3D($code & 0xF0) ) {$this->{i_cnt}=3D2; $this->{i_buf}= =3D$code & 0x0F; } elsif( 0xF0=3D=3D($code & 0xF8) ) {$this->{i_cnt}=3D3; $this->{i_buf}= =3D$code & 0x07; } elsif( 0xF8=3D=3D($code & 0xFC) ) {$this->{i_cnt}=3D4; $this->{i_buf}= =3D$code & 0x03; } elsif( 0xFC=3D=3D($code & 0xFE) ) {$this->{i_cnt}=3D5; $this->{i_buf}= =3D$code & 0x01; } elsif( 0xFE=3D=3D$code || 0xFF=3D=3D$code) {warn"illegal utf-8 input = code:$code"} } else { if( 0x80=3D=3D($code & 0xC0) ) { $this->{i_buf}=3D($this->{i_buf} << 6) | ($code & 0x3F); ::printwchar($this->{i_buf}) if 0=3D=3D--($this->{i_cnt}) ; } else { warn "no continuation of utf8 input sequence: $code"; $this->{i_cnt}=3D0; goto START_SEQUENCE; } } } sub output { my $this=3Dshift(); my $code=3Dshift(); if ($code < 0x80) { print chr($code); } elsif ($code < 0x800) { print pack('cc',0xC0 | ($code>>6), 0x80 | ($code & 0x3F)); } elsif ($code < 0x10000) { print pack('ccc', 0xE0|($code>>12), 0x80|($code>>6)&0x3F, 0x80|$code&= 0x3F); } elsif ($code < 0x200000) { print pack('cccc', 0xF0 | ($code>>18), 0x80 | ($code>>12) & 0x3F, 0x8= 0 | ($code>>6) & 0x3F, 0x80 | $code & 0x3F); } } package 
u::utf16;###################################################### @u::utf16::ISA=3Dqw(u::abstract); sub new { my $class=3Dshift(); my $par=3Dshift(); my $this=3D{buf=3D>'', bo=3D>'',}; if( 'v' eq $par || 'n' eq $par ) { $this->{bo}=3D$par } warn "Byteorder=3D'$this->{bo}'\n" if $::opt_d; bless $this, $class; } sub input{ my $this=3Dshift(); print "[",ord($_[0]),"]" if $::opt_d; $this->{buf} .=3D $_[0]; return unless 2=3D=3Dlength($this->{buf}); # TODO: process surrogate pairs # N=3D(H-0xD800) * 0x400 + (L-0xDC00) + 0x10000; my $w; # print unpack("H*",$this->{buf}),$this->{bo},unpack($this->{bo},$this->= {buf}) if $::opt_d; if( '' eq $this->{bo}) { $w=3Dunpack('n',$this->{buf}); $this->{bo}=3D0xFFFE=3D=3D$w?'v':'n'; $this->{buf}=3D''; return if 0xFFFE=3D=3D$w || 0xFEFF=3D=3D$w; } else { $w=3Dunpack($this->{bo},$this->{buf}); } ::printwchar($w); $this->{buf}=3D''; } sub output { my $this=3Dshift(); my $code=3Dshift(); if( 0xFEFF!=3D$code && 0=3D=3D$this->{out_cnt} && '' ne $this->{bo}) { print pack($this->{bo}, 0xFEFF); } $this->{bo}=3D'n' if '' eq $this->{bo}; if( $code>0xFFFF ) { print pack($this->{bo} x 2, (0xD7C0+($code>>10)), (0xDC00| $code & 0x= 3FF)); } else { print pack($this->{bo},$code); } $this->{out_cnt}++; } ############# Output-only modes ######################## package u::utf75;###################################################### @u::utf75::ISA=3Dqw(u::abstract); #bytes | bits | representation # 1 | 7 | 0vvvvvvv # 2 | 10 | 1010vvvv 11vvvvvv # 3 | 16 | 1011vvvv 11vvvvvv 11vvvvvv sub output { my $this=3Dshift(); my $code=3Dshift(); if ($code < 0x80) { print chr($code); } elsif ($code < 0x400) { print pack('cc',0xA0 | ($code>>6), 0xC0 | ($code & 0x3F)); } elsif ($code < 0x10000) { print pack('ccc', 0xB0|($code>>12), 0xC0|($code>>6)&0x3F, 0xC0|$code&= 0x3F); } elsif ($code < 0x110000) { ::printwchar(0xD7C0 + ($code>>10)); ::printwchar(0xDC00 + ($code & 0x3FF)); } } # TODO: # Special characters that may require attention while rendering # 3.9 Special 
Character Properties Conformance=20 # Copyright =A9 1991=AD2000 by Unicode, Inc. The Unicode Standard=20 # . Line boundary control=20 # 0009 HORIZONTAL TAB=20 # 000A LINE FEED=20 # 000C FORM FEED=20 # 000D CARRIAGE RETURN=20 # 0020 SPACE=20 # 00A0 NO=ADBREAK SPACE=20 # 0F0B TIBETAN MARK INTERSYLLABIC TSHEG=20 # 0F0C TIBETAN MARK DELIMITER TSHEG BSTAR=20 # 2000 EN QUAD=20 # 2002 EN SPACE=20 # 2003 EM SPACE=20 # 2004 THREE=ADPER=ADEM SPACE=20 # 2005 FOUR=ADPER=ADEM SPACE=20 # 2006 SIX=ADPER=ADEM SPACE=20 # 2007 FIGURE SPACE=20 # 2008 PUNCTUATION SPACE=20 # 2009 THIN SPACE=20 # 200A HAIR SPACE=20 # 200B ZERO WIDTH SPACE=20 # 2011 NON=ADBREAKING HYPHEN=20 # 2028 LINE SEPARATOR=20 # 2029 PARAGRAPH SEPARATOR=20 # 202F NARROW NO=ADBREAK SPACE=20 # FEFF ZERO WIDTH NO=ADBREAK SPACE=20 # . Hyphenation control=20 # 002D HYPHEN=ADMINUS=20 # 00AD SOFT HYPHEN=20 # 058A ARMENIAN HYPHEN=20 # 1806 MONGOLIAN TODO SOFT HYPHEN=20 # 2010 HYPHEN=20 # 2011 NON=ADBREAKING HYPHEN=20 # 2027 HYPHENATION POINT=20 # . Fraction formatting=20 # 2044 FRACTION SLASH=20 # . Special behavior with nonspacing marks=20 # 0020 SPACE=20 # 0069 LATIN SMALL LETTER I=20 # 006A LATIN SMALL LETTER J=20 # 00A0 NO=ADBREAK SPACE=20 # 0131 LATIN SMALL LETTER DOTLESS I=20 # . Double nonspacing marks=20 # 0360 COMBINING DOUBLE TILDE=20 # 0361 COMBINING DOUBLE INVERTED BREVE=20 # 0362 COMBINING DOUBLE RIGHTWARDS ARROW BELOW=20 # . Joining=20 # 200C ZERO WIDTH NON=ADJOINER=20 # 200D ZERO WIDTH JOINER=20 # . Bidirectional ordering=20 # 200E LEFT=ADTO=ADRIGHT MARK=20 # 200F RIGHT=ADTO=ADLEFT MARK=20 # 202A LEFT=ADTO=ADRIGHT EMBEDDING=20 # 202B RIGHT=ADTO=ADLEFT EMBEDDING=20 # 202C POP DIRECTIONAL FORMATTING=20 # 202D LEFT=ADTO=ADRIGHT OVERRIDE=20 # 202E RIGHT=ADTO=ADLEFT OVERRIDE=20 # . 
Alternate formatting=20 # 206A INHIBIT SYMMETRIC SWAPPING=20 # 206B ACTIVATE SYMMETRIC SWAPPING=20 # 206C INHIBIT ARABIC FORM SHAPING=20 # 206D ACTIVATE ARABIC FORM SHAPING=20 # 206E NATIONAL DIGIT SHAPES=20 # 206F NOMINAL DIGIT SHAPES=20 # . Syriac abbreviation=20 # 070F SYRIAC ABBREVIATION MARK=20 # . Indic dead=ADcharacter formation=20 # 094D DEVANAGARI SIGN VIRAMA=20 # 09CD BENGALI SIGN VIRAMA=20 # 0A4D GURMUKHI SIGN VIRAMA=20 # 0ACD GUJARATI SIGN VIRAMA=20 # 0B4D ORIYA SIGN VIRAMA=20 # 0BCD TAMIL SIGN VIRAMA=20 # 0C4D TELUGU SIGN VIRAMA=20 # 0CCD KANNADA SIGN VIRAMA=20 # 0D4D MALAYALAM SIGN VIRAMA=20 # 0DCA SINHALA SIGN AL=ADLAKUNA=20 # 0F84 TIBETAN SIGN HALANTA=20 # 1039 MYANMAR SIGN VIRAMA=20 # 17D2 KHMER SIGN COENG=20 # . Mongolian variant selectors=20 # 180B MONGOLIAN FREE VARIATION SELECTOR ONE=20 # 180C MONGOLIAN FREE VARIATION SELECTOR TWO=20 # 180D MONGOLIAN FREE VARIATION SELECTOR THREE=20 # 180E MONGOLIAN VOWEL SEPARATOR=20 # . Ideographic variation indication=20 # 303E IDEOGRAPHIC VARIATION INDICATOR=20 # . Ideographic description=20 # 2FF0 IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT=20 # 2FF1 IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO BELOW=20 # 2FF2 IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO MIDDLE AND RIGHT=20 # 2FF3 IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO MIDDLE AND BELOW=20 # 2FF4 IDEOGRAPHIC DESCRIPTION CHARACTER FULL SUR=AD ROUND=20 # 2FF5 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM ABOVE=20 # 2FF6 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM BELOW=20 # 2FF7 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LEFT=20 # 2FF8 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER LEFT=20 # 2FF9 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER RIGHT=20 # 2FFA IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER LEFT=20 # 2FFB IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID=20 # . Interlinear annotation=20 # FFF9 INTERLINEAR ANNOTATION ANCHOR=20 # FFFA INTERLINEAR ANNOTATION SEPARATOR=20 # FFFB INTERLINEAR ANNOTATION TERMINATOR=20 # . 
Object replacement=20 # FFFC OBJECT REPLACEMENT CHARACTER=20 # . Code conversion fallback=20 # FFFD REPLACEMENT CHARACTER=20 # . Byte order signature=20 # FEFF ZERO WIDTH NO=ADBREAK SPACE=20 package u::banner;##################################################### @u::banner::ISA=3Dqw(u::abstract); sub new { my $class=3Dshift(); my %font; $font{height}=3D16; for my $fntnm(split',',shift || $::opt_f) { $fntnm =3D "$0/../$fntnm" unless $fntnm=3D~/[\/\\]/; for(open FONT,$fntnm or die "Cannot open: $fntnm";;) { $font{unpack("n",pack("H*",$1))}=3D$2 if/^([0-9a-fA-F]+):([0-9a-fA-= F]+)/; $font{height}=3D$1 if /^\s*:fontheight\s*=3D\s*(\d+)/i; } close FONT; } $font{line}=3D[]; $font{pageheight}=3D0; return bless \%font, $class; } sub output_flush { my $this=3Dshift(); $this->newline() if '' ne $this->{line}[0]; } sub newpage { my $this=3Dshift(); $this->output_flush(); print "\f\n"; } sub output() { my $this=3Dshift(); my $code=3Dshift(); # warn "{$code}"; if( 10=3D=3D$code ) {=20 $this->newline() if 12 !=3D $this->{lastchar} } if( 13=3D=3D$code || 0xFEFF=3D=3D$code ) {=20 return; } if( 12=3D=3D$code ) {=20 $this->newpage(); } elsif( 32>$code ) {} else { $code=3D0xFFFD unless defined $this->{$code}; # goto CONT unless defined $this->{$code}; $_=3Dsubstr(unpack('B*',pack('H*',$this->{$code})),0,4*length($this->= {$code})); my $cw=3Dint(length($_)/$this->{height}); my $i=3D0; for( /.{$cw}/g ) { if( $::opt_w && (length($this->{line}[$i]) + length($_))>$::opt_w) = { warn("not on first") if $i; $this->newline(); #print"---\n"; } $this->{line}[$i++] .=3D $_; } CONT: } $this->{lastchar}=3D$code; } sub newline() { my $this=3Dshift(); if( length($this->{line}[0])>0 && $::opt_h && $this->{pageheight} >=3D $= ::opt_h ) { $this->{pageheight}=3D0; print "\f\n"; } for(@{$this->{line}}) { print ; print '0' x ($::opt_w-length) if $::opt_w; print"\n"; } $this->{pageheight}++; $this->{line}=3D[(('')x$this->{height})]; } package u::rtf;######################################################## 
@u::rtf::ISA=3Dqw(u::abstract); sub output_init { print "{\\rtf1{\\fonttbl{\\f0 Courier New}}\\fs20\n{\\f= 0 "; } # {\f0 \u9554*\u9552*\u9572*\u9557*=20 sub output { my $this=3Dshift(); my $code=3Dshift(); if( 10=3D=3D$code ) {=20 print "\n\\par "; } elsif( 13=3D=3D$code ) {=20 return; } elsif( ord('\\')=3D=3D$code || ord('{')=3D=3D$code || ord('}')=3D=3D$= code ) {=20 print '\\'.chr($code); } elsif( 12=3D=3D$code ) {=20 print "\n\\page\n"; } elsif( 32>$code ) { # ignore other control characters } elsif( 128>$code ) { print chr($code); } else { print "\\u$code*"; } } sub newpage { my $this=3Dshift(); $this->output_flush(); print "\f\n"; } sub output_flush{ print "\\par }}\n"; } package u::pcx;######################################################## @u::pcx::ISA=3Dqw(u::banner u::abstract); sub output_init { open PCX, "| perl $0/../../g/pcx.pl"; binmode PCX; } sub output_flush { my $this=3Dshift(); $this->newline() if '' ne $this->{line}[0]; close PCX; } sub newline() { my $this=3Dshift(); if( length($this->{line}[0])>0 && $::opt_h && $this->{pageheight} >=3D $= ::opt_h ) { $this->{pageheight}=3D0; print PCX "\f\n"; } for(@{$this->{line}}) { print PCX $_; print PCX '0' x ($::opt_w-length) if $::opt_w; print PCX "\n"; } $this->{pageheight}++; $this->{line}=3D[(('')x$this->{height})]; } # TODO: package u::BIF # alchemy -B type output # colors? 
# ------------------------ Translation part ------------------------- package main; use Getopt::Std; our ( $opt_f, # font file name $opt_w, # output width $opt_h, # outptut height $opt_i, # input mode $opt_o, # output mode $opt_d, # debug $opt_n, # output file name ); getopts('w:h:f:i:o:dn:'); $opt_f ||=3D "$0/../unifont.hex"; $opt_f =3D "$0/../$opt_f" unless $opt_f=3D~/[\/\\]/; # $opt_h ||=3D 55; $opt_o ||=3D 'banner'; $opt_i ||=3D 'ascii'; my %utf=3D( html =3D>'u::html', banner=3D>'u::banner', img =3D>'u::banner', pcx =3D>'u::pcx', utf16 =3D>'u::utf16', utf7 =3D>'u::utf7', utf8 =3D>'u::utf8', utf75 =3D>'u::utf75', utf9 =3D>'u::utf75', ascii =3D>'u::ascii', cp =3D>'u::cp', latin1=3D>'u::ascii', iso88591=3D>'u::ascii', rtf =3D>'u::rtf', ); my $inp_utf=3Dutf($opt_i); my $out_utf=3Dutf($opt_o); $inp_utf->input_init(); $out_utf->output_init(); binmode STDIN; if( '' ne $opt_n ) { close STDOUT; open STDOUT, ">$opt_n"; } binmode STDOUT; while() { for( /[\0-\377]/g ) { $inp_utf->input($_) } } $inp_utf->input_flush(); $out_utf->output_flush(); sub printwchar { die "obsolete call, use output_flush" if '' eq $_[0]; $out_utf->output($_[0]); } ############# Service functions ################## sub utf { $_=3Dshift(); /^([^:]*)/; my $enc=3D$1; my $par=3Dsubstr($',1); die "Unknown utf: $enc" unless defined $utf{$enc}; my $class=3D$utf{$enc}; warn "utf=3Dnew $class '$par'\n" if $opt_d; return new $class $par; } sub read_cp { # Codepage file consists of lines of form #=3D20 U+0020 SPACE #=3D21 U+0021 EXCLAMATION MARK #=3D22 U+0022 QUOTATION MARK my $cp=3Dshift; # hash reference my $fn=3Dshift; # filename warn "cp=3D$cp, fn=3D$fn" if $opt_d; $fn=3D"cp$fn" if $fn=3D~/^\d*$/; # shortcut to codepage $fn=3D"$fn.txt" unless $fn=3D~/\./; # shortcut to codepage $fn=3D"$0/../cp/$fn" unless $fn=3D~m{[/\\]}; # shortcut to codepage for( open CP, $fn or die "Cannot open $fn"; ; ) { if( /^\s*=3D([[:xdigit:]]{2})\s+U\+([[:xdigit:]]{4,8})/ ) { my $c2=3Dunpack('C',pack('H*',substr('00'.$1,-2,2))); 
      my $c8=3Dunpack('N',pack('H*',substr(('0' x 8).$2,-8,8)));
#      print "$c2<->$c8\n";
      $cp->{"c$c2"}||=3D$c8;
      $cp->{"u$c8"}||=3D$c2;
    }
  }
}

--0-659259426-1190917489=:22433
Content-Type: text/plain; name="sample.pl"
Content-Description: 208028102-sample.pl
Content-Disposition: inline; filename="sample.pl"

# prints every character from font file in UTF-16
binmode STDOUT;
while(<>){
  next unless /^([0-9A-Fa-f]{4}):/;
  print pack('H*',$1);
}

--0-659259426-1190917489=:22433
Content-Type: text/plain; name="usample.pl"
Content-Description: 346627872-usample.pl
Content-Disposition: inline; filename="usample.pl"

# Generate pretty columnwise sample with section headers
# 2002-05-31
# 2002-06-03 -- use Blocks.txt from unicode std
# 2003-10-24 -- do not print ignored sections
# 2004-05-05 -- optional height; do not break in the middle of the table
# 2004-05-06 -- specify "absent" character; dynamically define "normal" width;
#               ignore duplicate entries; initialize @COLUMNS
use strict;
use Getopt::Std;

# my $H=190; # TODO maximum allowed page height

our(
  $opt_w, # number of columns per page. Each column is 3 characters + 4 for
          # column heading
  $opt_h, # maximal page height
  $opt_t, # include html header/footer
  $opt_a, # absent, 0xfffd by default
);
getopts("w:h:ta:");
$opt_w||=16;
$opt_a||=0xfffd;
$opt_a=ord($opt_a) if( 1==length($opt_a) && $opt_a=~/\D/ );

my @COLUMNS;
my $nc=0; # number of filled columns;
my $pc=-1; # previous character
my $cs=-1; # Current section
my($ns,$nsname);
($ns,$nsname)=split ';',scalar <DATA> until $ns=~/^([[:xdigit:]]+)/;
$ns=$1;

print "
\n" if $opt_t;
my $tc=0xffffff;
my $ignored_section=1;
my $ignore_next_section;
my $normalwidth=32;

my $l; # line on page
my $title=""; # title of current unicode section
my $lc=-1; # last character (used to skip duplicate entries)
while(<>){
  next unless /^([[:xdigit:]]{4,8}):([[:xdigit:]]*)/;
  my $cc=hex('0x'.$1);
  next if $cc == $lc;
  $lc=$cc;
  $normalwidth=length($2) if 32==$cc;
  my $w=length($2)>$normalwidth;
  if($cc>=hex($ns)) {
    flush_cols() if $nc;
    while(1) {
      $ignored_section='-' eq substr($nsname,0,1);
      printlp( "" );
      printlp( "" );
      $title=sprintf "%06s. %s",$ns, $nsname;
      # $ln+=3;
      if($ignored_section) {
        checklp(5);
        printlp($title);
        printlp( "" );
        printlp( " ***");
        printlp( " *** Section is ignored");
        printlp( " ***");
      }
      $pc=-1+(0xFFFF0 & hex($ns));
 REREAD:
      ($ns,$nsname)=split /\s*;\s*/,scalar <DATA>;
      last if eof(DATA);
      goto REREAD unless $ns=~/^([[:xdigit:]]+)/;
      $ns=$1;
      last unless $cc>=hex($ns) ;
      if(!$ignored_section) {
        checklp(5);
        printlp($title);
        $title="";
        printlp( "" );
        printlp( " ***");
        printlp( " *** Empty section");
        printlp( " ***");
      }
    }
#   printf "pc=%06X, cc=%06X, ns=%06X\n",$pc, $cc, hex($ns);
  }
  if( !$ignored_section) {
    if( ""  ne $title) {
      checklp(2+17);
      printlp($title);
      $title="";
      printlp( "" );
    }
    for( my $c=$pc+1; $c<$cc; ++$c ) { add_to_col($c,$opt_a,'  '); }
    add_to_col($cc,$cc,$w?' ':'  ');
    $pc=$cc;
  } 
  last unless --$tc;
}
flush_cols() if $nc;

print "
\n" if $opt_t;

# print line, while tracking line number
sub printlp {
  if( $opt_h && $l>=$opt_h ) {
    $l=0;
    print "\f\n";
  }
  if( "" ne $_[0] || 0!=$l ) {
    print $_[0],"\n";
    ++$l;
  }
}

sub checklp {
  if( $opt_h && $l+$_[0]>=$opt_h ) {
    $l=0;
    print "\f\n";
  }
}

sub add_to_col {
  my ($a,$c, $f)=@_;
  @COLUMNS=('- ',map {sprintf " %1X ",$_} (0..15)) if 0==@COLUMNS;
  unless( $a & 0x0f ) {
    if( $opt_w<=$nc ) { flush_cols() }
    $COLUMNS[0].=' ';
    if( (0==$nc || 0==(0x30 & $a)) && (' ' eq substr($COLUMNS[0],-4,4)) ) {
      substr($COLUMNS[0],-3,3)=sprintf "%03X",$a>>4;
    }
    ++$nc;
  }
  $COLUMNS[1+(15&$a)].=$f."&#$c;";
}

sub flush_cols {
  checklp( scalar @COLUMNS);
  for( @COLUMNS ) { printlp( $_ ) }
  @COLUMNS=();
  $nc=0;
}

__END__
# Blocks-3.2.0.txt
# Correlated with Unicode 3.2
# Start Code..End Code; Block Name
0020..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0100..017F; Latin Extended-A
0180..024F; Latin Extended-B
0250..02AF; IPA Extensions
02B0..02FF; Spacing Modifier Letters
0300..036F; Combining Diacritical Marks
0370..03FF; Greek and Coptic
0400..04FF; Cyrillic
0500..052F; Cyrillic Supplementary
0530..058F; Armenian
0590..05FF; Hebrew
0600..06FF; Arabic
0700..074F; Syriac
0780..07BF; Thaana
0900..097F; Devanagari
0980..09FF; Bengali
0A00..0A7F; Gurmukhi
0A80..0AFF; Gujarati
0B00..0B7F; Oriya
0B80..0BFF; Tamil
0C00..0C7F; Telugu
0C80..0CFF; Kannada
0D00..0D7F; Malayalam
0D80..0DFF; Sinhala
0E00..0E7F; Thai
0E80..0EFF; Lao
0F00..0FFF; Tibetan
1000..109F; Myanmar
10A0..10FF; Georgian
1100..11FF; Hangul Jamo
1200..137F; Ethiopic
13A0..13FF; Cherokee
1400..167F; Unified Canadian Aboriginal Syllabics
1680..169F; Ogham
16A0..16FF; Runic
1700..171F; Tagalog
1720..173F; Hanunoo
1740..175F; Buhid
1760..177F; Tagbanwa
1780..17FF; Khmer
1800..18AF; Mongolian
1E00..1EFF; Latin Extended Additional
1F00..1FFF; Greek Extended
2000..206F; General Punctuation
2070..209F; Superscripts and Subscripts
20A0..20CF; Currency Symbols
20D0..20FF; Combining Diacritical Marks for Symbols
2100..214F; Letterlike Symbols
2150..218F; Number Forms
2190..21FF; Arrows
2200..22FF; Mathematical Operators
2300..23FF; Miscellaneous Technical
2400..243F; Control Pictures
2440..245F; Optical Character Recognition
2460..24FF; Enclosed Alphanumerics
2500..257F; Box Drawing
2580..259F; Block Elements
25A0..25FF; Geometric Shapes
2600..26FF; Miscellaneous Symbols
2700..27BF; Dingbats
27C0..27EF; Miscellaneous Mathematical Symbols-A
27F0..27FF; Supplemental Arrows-A
2800..28FF; Braille Patterns
2900..297F; Supplemental Arrows-B
2980..29FF; Miscellaneous Mathematical Symbols-B
2A00..2AFF; Supplemental Mathematical Operators
2E80..2EFF; CJK Radicals Supplement
2F00..2FDF; Kangxi Radicals
2FF0..2FFF; Ideographic Description Characters
3000..303F; CJK Symbols and Punctuation
3040..309F; Hiragana
30A0..30FF; Katakana
3100..312F; Bopomofo
3130..318F; Hangul Compatibility Jamo
3190..319F; Kanbun
31A0..31BF; Bopomofo Extended
31F0..31FF; Katakana Phonetic Extensions
3200..32FF; Enclosed CJK Letters and Months
3300..33FF; CJK Compatibility
3400..4DBF; CJK Unified Ideographs Extension A
4E00..9FFF; -CJK Unified Ideographs
A000..A48F; Yi Syllables
A490..A4CF; Yi Radicals
AC00..D7AF; -Hangul Syllables
D800..DB7F; High Surrogates
DB80..DBFF; High Private Use Surrogates
DC00..DFFF; Low Surrogates
E000..F8FF; Private Use Area
F900..FAFF; CJK Compatibility Ideographs
FB00..FB4F; Alphabetic Presentation Forms
FB50..FDFF; Arabic Presentation Forms-A
FE00..FE0F; Variation Selectors
FE20..FE2F; Combining Half Marks
FE30..FE4F; CJK Compatibility Forms
FE50..FE6F; Small Form Variants
FE70..FEFF; Arabic Presentation Forms-B
FF00..FFEF; Halfwidth and Fullwidth Forms
FFF0..FFFF; Specials
10300..1032F; Old Italic
10330..1034F; Gothic
10400..1044F; Deseret
1D000..1D0FF; Byzantine Musical Symbols
1D100..1D1FF; Musical Symbols
1D400..1D7FF; Mathematical Alphanumeric Symbols
20000..2A6DF; CJK Unified Ideographs Extension B
2F800..2FA1F; CJK Compatibility Ideographs Supplement
E0000..E007F; Tags
F0000..FFFFF; Supplementary Private Use Area-A
100000..10FFFF; Supplementary Private Use Area-B

--0-659259426-1190917489=:22433
Content-Type: text/plain; name="bmp.pl"
Content-Description: 2765860354-bmp.pl
Content-Disposition: inline; filename="bmp.pl"

# Convert text file with 1's and 0's into uncompressed .bmp file
# Usage:
#   perl bmp.pl {filename_template | >file.bmp}
'000000,ffffff',
  4=>'000000,000080,008000,008080,800000,800080,808000,808080,c0c0c0,0000ff,00ff00,00ffff,ff0000,ff00ff,ffff00,ffffff',
);

# ms paint bmps are usually coded 0 black, 1 white, while banner codes
# 0 white, 1 black

$opt_b||=1;
if( 1==$opt_b ) {
  $opt_s||='1#*,0-. ';
  $opt_f||='1';
} elsif(4==$opt_b) {
  $opt_s||='1#*,%,2,3,4,5,6,7,8,9,a,b,c,d,e,f0-. ';
  $opt_f||='f';
} else {
  die "The only supported bit depths are 1 and 4";
}
$opt_p||=$PAL_DEF{$opt_b};

if( '@' eq substr($opt_p,0,1) ) {
  die "reading palette from file is not supported";
  $opt_p=substr($opt_p,1);
  warn "read palette (and symbols) from text file $opt_p" if $opt_d;
  # (?:symbolset:)?[[:xdigit:]]{,6}(?:,)?
  open FILE,$opt_p;
  close FILE;
}

my @symbols=split(',',$opt_s);
our($TR_FROM,$TR_TO);
{
  $TR_FROM=join '',@symbols;
  my $A='0123456789abcdef';
  my $i=-1;
  $TR_TO=join'',map{$i++;substr($A,$i,1) x length}@symbols;
  $TR_FROM=~s/-/\\-/g;
  $TR_TO=~s/-/\\-/g;
}

our %MSTYPE=( # converts MS type into pack/unpack format character
  WORD=>'v',
  DWORD=>'V',
  LONG=>'V',
);

# Create template for bitmap file and image headers
# @bm?v -- list of structure's field names in correct order
# $bm?f -- pack format string
# For the purpose of this script they can be combined in one record.
# They should be read separately since biSize determines the size of
# the infoheader structure.
our ($bmff, @bmfv)=read_C_struct(<) {
  chomp;
  # tr/.*/01/;
  eval "tr/$TR_FROM/$TR_TO/";
  if( "\f" eq $_ ) {
    if( "" eq $file ) {
      warn "Since no filename template is specified, all pages after first are ignored";
      last;
    }
    $page+=!$page; # start multipage output
    process_lines();
    $page++;
  } else {
    $::width=length if length>$::width;
    push @::lines, $_;
  }
}
if( 0<@::lines ) { process_lines() }

# write @::lines to .bmp file;
sub process_lines {
  if( "" ne $file ) {
    close STDOUT;
    $_=$file;
    $_.=".bmp" unless /\.[^\\\/]*$/;
    s/(?=\.[^\\\/]*$)/%04d/ unless /%./ || 0==$page;
    $_=sprintf $_,$page if /%./;
    open STDOUT, ">$_";
  }
  binmode STDOUT;
  my $pal;
  my $clrs;
  {
    my @t=split',',$opt_p;
    $pal=join'',map{pack('H*',substr($_.'00000000',0,8))} @t;
    $clrs=@t;
  }
  %bmh=(
    bfType=>unpack('v','BM'),
    bfReserved1=>0,
    bfReserved2=>0,
    bfOffBits=>40+14+4*$clrs, # file header+info header + palette
    biSize=>40,
    biPlanes=>1,
    biBitCount=>$opt_b,
    biCompression=>0,
    biXPelsPerMeter=>300/0.0254, # assuming 300 dpi
    biYPelsPerMeter=>300/0.0254,
    biClrUsed=>int(length($pal)/4),
    biClrImportant=>0,
    biWidth=>$::width,
    biHeight=>scalar @::lines,
  );
  my $scansize;
  my $frm;
  if( 1==$opt_b ) {
    $scansize=$::bmh{biWidth}/32;
    $scansize=1+int($scansize) if $scansize!=int($scansize); # in dwords
    $scansize*=4; # in bytes
    $frm="B".(8*$scansize);
  } elsif( 4==$opt_b ) {
    $scansize=$::bmh{biWidth}/8;
    $scansize=1+int($scansize) if $scansize!=int($scansize); # in dwords
    $scansize*=4; # in bytes
    $frm="H".(2*$scansize);
  } else {
    die "Unsupported bits: $opt_b";
  }
  $::bmh{biSizeImage}=$::bmh{biHeight}*$scansize;
  $::bmh{bfSize}=$::bmh{bfOffBits}+$::bmh{biSizeImage};
  print pack($::bmff.$::bmif,@::bmh{@::bmfv,@::bmiv});
  warn "format $frm";
  print $pal; # palette
  print join'', reverse map{pack$frm,$_.($opt_f x ($::width-length))}@::lines;
  # paint creates bmp with 0 means black and 1 means white. banner has
  # different convention
  #
  # clean buffer for the next bitmap
  @::lines=();
  $::width=0;
}

sub read_C_struct {
  my ($format, @vars);
  for(split "\n",$_[0]) {
    /(\w+)\s+(\w+)/;
    push @vars,$2;
    $format.=$::MSTYPE{uc($1)} || die "illegal type: $1";
  }
  return ($format, @vars);
}
--0-659259426-1190917489=:22433--
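
The scanline arithmetic in bmp.pl can be tried in isolation. What follows is only a minimal sketch, assuming the rows have already been translated to '0'/'1' palette indices (0 black, 1 white). The helper names scan_bytes and pack_row_1bit are hypothetical and do not appear in the attached scripts. It shows the two points the script depends on: every row is padded to a whole number of DWORDs, and rows are written bottom-up.

```perl
use strict;
use warnings;

# Bytes per scanline: BMP rows are padded to a multiple of 4 bytes (one DWORD).
# (hypothetical helper, not part of bmp.pl)
sub scan_bytes {
    my ($width, $bits) = @_;
    return 4 * int(($width * $bits + 31) / 32);
}

# Pack one row of '0'/'1' characters into a padded 1-bit scanline.
# $fill pads short rows out to $width; pack's "B" format zero-fills
# the remaining bits up to the DWORD boundary.
sub pack_row_1bit {
    my ($row, $width, $fill) = @_;
    my $bytes = scan_bytes($width, 1);
    $row .= $fill x ($width - length $row);
    return pack("B" . (8 * $bytes), $row);
}

my @lines = ('1011', '01');   # top row first, as read from the banner
my $width = 4;

# BMP stores the bottom row first, hence the reverse.
my $pixels = join '', reverse map { pack_row_1bit($_, $width, '1') } @lines;

printf "%d bytes per row, %d bytes of pixel data\n",
    scan_bytes($width, 1), length $pixels;
```

The same rounding covers the 4-bit case: scan_bytes(9, 4) rounds nine nibbles up to two DWORDs, which agrees with the width/8 computation in process_lines.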