Charset to UTF. Good Old Old Days Is there any other language but American ?? EBCDIC ASCII.


1 Charset to UTF

2 Good Old Old Days: Is there any other language but American? EBCDIC, ASCII

3 Good Old Days: ASCII: 0-127, Latin. 8-bit extensions: 128-255, used for French, Italian, German etc., or Greek, or Hebrew, or Russian etc.
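To make the upper-half problem concrete, here is a minimal sketch that decodes the same byte under several 8-bit charsets. The byte 0xE0 and the four charset names are picked for illustration only; they are not from the slides.

```perl
use strict;
use warnings;
use Encode qw(decode);

# One byte from the upper half (128-255) of an 8-bit charset.
my $byte = "\xE0";

# Decode the same byte under different charsets and record the result.
my %seen;
for my $charset (qw(iso-8859-1 iso-8859-5 iso-8859-7 iso-8859-8)) {
    my $char = decode($charset, $byte);
    $seen{$charset} = sprintf "U+%04X", ord($char);
    print "$charset: byte E0 -> $seen{$charset}\n";
}
```

Each charset reports a different Unicode code point for the same byte, so a file is unreadable unless you know which charset it was written in.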

4 Multibyte: Japanese (SJIS, EUC), Chinese (Big5, GB), Korean

5 Babel’s Tower http://www.i18nguy.com/unicode/codepages.html#czyborra

6 Many Languages: Hebrew, Japanese, Arabic in the same doc/line/screen

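7 Unicode: All languages. Each char: 2 bytes in the original 16-bit design, 65,536 possible codes (Unicode has since grown past 16 bits). Problem: the result is not an ordinary byte string but wide chars.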
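A small sketch of the wide-char problem. UTF-16BE is used here as a stand-in for the fixed 2-byte form on the slide, and the two-character sample text is an assumption made for the example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "A" followed by HEBREW LETTER ALEF, as a Perl character string.
my $text = "A\x{05D0}";

# Two bytes per character, big-endian.
my $wide = encode("UTF-16BE", $text);
my $hex  = join " ", map { sprintf "%02X", ord($_) } split //, $wide;

print "characters: ", length($text), "\n";
print "bytes:      ", length($wide), "\n";
print "hex:        $hex\n";
```

The plain ASCII letter now carries a zero byte in front of it, which is why code that treats text as NUL-terminated 8-bit strings cannot handle this form directly.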

8 UTF-8: One to one with Unicode. Each char becomes 1-3 regular (8-bit) chars for the 16-bit range, 4 beyond it. Well-defined algorithm.
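The algorithm can be sketched by hand for the 1-3 byte cases and compared against Perl's Encode module. The function name utf8_bytes is invented for this sketch, and it deliberately ignores code points above U+FFFF and the surrogate range.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hand-rolled UTF-8 encoder for code points up to U+FFFF.
# Returns the encoded bytes as a list of integers.
sub utf8_bytes {
    my ($cp) = @_;
    return ($cp) if $cp < 0x80;                        # 0xxxxxxx
    return (0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F))    # 110xxxxx 10xxxxxx
        if $cp < 0x800;
    return (0xE0 | ($cp >> 12),                        # 1110xxxx
            0x80 | (($cp >> 6) & 0x3F),                # 10xxxxxx
            0x80 | ($cp & 0x3F));                      # 10xxxxxx
}

# Compare the hand-rolled result with Encode for three sample code points.
for my $cp (0x41, 0x05D0, 0x20AC) {
    my $mine = join " ", map { sprintf "%02X", $_ } utf8_bytes($cp);
    my $ref  = join " ", map { sprintf "%02X", ord($_) }
               split //, encode("UTF-8", chr($cp));
    printf "U+%04X -> %s (Encode agrees: %s)\n",
        $cp, $mine, ($mine eq $ref ? "yes" : "no");
}
```

ASCII characters come out unchanged as a single byte, which is what lets UTF-8 text pass through software that only expects regular chars.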

9 Hebrew to Unicode
Unicode  Hebrew  Name
05D0     60      HEBREW LETTER ALEF
05D1     61      HEBREW LETTER BET
05D2     62      HEBREW LETTER GIMEL
05D3     63      HEBREW LETTER DALET
05D4     64      HEBREW LETTER HE
05D5     65      HEBREW LETTER VAV
05D6     66      HEBREW LETTER ZAYIN
05D7     67      HEBREW LETTER HET
05D8     68      HEBREW LETTER TET
05D9     69      HEBREW LETTER YOD
05DA     6A      HEBREW LETTER FINAL KAF
05DB     6B      HEBREW LETTER KAF
05DC     6C      HEBREW LETTER LAMED
05DD     6D      HEBREW LETTER FINAL MEM
05DE     6E      HEBREW LETTER MEM
and likewise for each charset

10 Need for Conversion: Existing data. New data: editors work in specific charsets, not in UTF/Unicode.

11 Brute Force: foreach org_char, convert to UTF
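A minimal sketch of the brute-force loop, using only the 15 rows of the Hebrew table from slide 9 (codes 60-6E mapped to 05D0-05DE). The function name brute_force_to_utf8 and the pass-through rule for codes missing from the table are assumptions made for this example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Mapping table: source charset code -> Unicode code point.
# Only the rows shown on the "Hebrew to Unicode" slide are included.
my %to_unicode;
$to_unicode{0x60 + $_} = 0x05D0 + $_ for 0 .. 14;

sub brute_force_to_utf8 {
    my ($bytes) = @_;
    my $chars = '';
    for my $byte (split //, $bytes) {
        my $code = ord($byte);
        # Assumption: codes missing from the table pass through unchanged.
        $chars .= chr($to_unicode{$code} // $code);
    }
    return encode("UTF-8", $chars);
}

# Codes 60 and 61, a space, then code 62.
my $utf8 = brute_force_to_utf8("\x60\x61 \x62");
print join(" ", map { sprintf "%02X", ord($_) } split //, $utf8), "\n";
```

A real converter would need one such table per source charset, which is the work the Encode module already does.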

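12 Perl way 1
    use strict;
    use warnings;
    use Encode;

    my ($if, $of) = @ARGV;

    # Read as ISO-8859-8, write as UTF-8; the I/O layers do the conversion.
    open my $in,  "<:encoding(iso-8859-8)", $if or die "Cannot read $if: $!";
    open my $out, ">:encoding(utf8)",       $of or die "Cannot write $of: $!";

    while (<$in>) { print $out $_; }

    close $in;
    close $out;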

13 Perl way 2
    perl -MEncode -e '($if, $of)=@ARGV; open my $in, "<:encoding(iso-8859-8)", $if or die $!; open my $out, ">:encoding(utf8)", $of or die $!; while(<$in>){ print $out $_; }' infile outfile
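For data that is already in memory rather than in a file, the same conversion can be sketched with Encode's decode and encode functions. The four sample bytes (a Hebrew word in ISO-8859-8) are made up for this example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Sample input: four Hebrew letters as ISO-8859-8 bytes (illustrative data).
my $legacy = "\xF9\xEC\xE5\xED";

my $chars = decode("iso-8859-8", $legacy);   # bytes -> Perl characters
my $utf8  = encode("UTF-8", $chars);         # characters -> UTF-8 bytes

printf "%d bytes in, %d characters, %d bytes out\n",
    length($legacy), length($chars), length($utf8);
```

Note that Perl distinguishes the lax encoding name utf8 from the strict UTF-8; the strict name is used here.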

