Unicode Mark Davis Unicode Consortium President IBM Chief SW Globalization Architect 2003-09-24.


1 Unicode Mark Davis Unicode Consortium President IBM Chief SW Globalization Architect

2 Universal Character Encoding …
- Unique number for every character
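A minimal sketch of the "unique number" idea, using Python's built-in `ord` and `chr`. The sample characters and the helper name `code_point_label` are illustrative choices, not part of the slides.

```python
# Each character maps to exactly one code point (its "unique number").
# The sample list is an assumption chosen for illustration.
samples = ["A", "Ž", "Ш", "Δ", "ش", "क"]


def code_point_label(ch: str) -> str:
    """Format a single character's code point in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"


labels = {ch: code_point_label(ch) for ch in samples}

for ch, label in labels.items():
    print(ch, label)

# The mapping is reversible: the number identifies the character.
assert all(chr(ord(ch)) == ch for ch in samples)
```

Because the number alone identifies the character, no side information such as a code page or a shift state is needed to interpret it.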

3 Unifies all Languages
- 96 thousand characters, so far
- All characters accessible at the same time, in the same document:
  A, Ž, Ш, Δ, ش, …

4 Lingua Franca for Computers
- Developed & supported by industry leaders:
  - Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys, …
- Required by modern standards:
  - XML, HTML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, Perl, etc.
- Implemented in:
  - All modern operating systems, browsers, and other products

5 International Domain Names
- Approved - Unicode-Based
- Examples:
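As an illustration of how a Unicode-based domain name can travel over ASCII-only DNS, the sketch below uses the `idna` codec from Python's standard library, which implements the 2003 IDNA rules. The name `bücher.example` is a hypothetical label chosen for this sketch and is not one of the examples from the slide.

```python
# Hypothetical internationalized name, used purely for illustration.
unicode_name = "bücher.example"

# Convert the Unicode form to its ASCII-compatible ("xn--") form.
ascii_form = unicode_name.encode("idna")

# Convert back to confirm that the Unicode form is recoverable.
round_trip = ascii_form.decode("idna")

print(ascii_form)
print(round_trip)
```

The user sees and types the Unicode form; only the ASCII-compatible form needs to pass through existing DNS infrastructure.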

6 Standard Resources
- Online Standard
- Technical Reports
- FAQs
- General Information
- Discussion Forums, Conferences

7 Programming Resources
- System APIs:
  - Windows, Java, Unix, Oracle, DB2, Sybase, Mac, Linux, …
- Languages:
  - Java, JavaScript, C#, Perl 5.6.0, C, C++, SQL, …
- Cross-platform libraries:
  - ICU, Rosette, …

8 Stability
- Developers / other standards need absolute stability
- Characters are never moved or deleted
- Ordering of characters is by collation, not binary order. See UTS #10: Unicode Collation Algorithm
- Characters may be deprecated (discouraged).
- Characters never change names
- Annotations are used to clarify usage
- See Unicode Policies
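The point about ordering can be sketched in a few lines of Python. Sorting by raw code point gives an order that users would not expect, which is why a fixed binary layout does not need to be "alphabetical". The key function `rough_sort_key` below is a deliberately crude stand-in (strip accents, ignore case) and is not the Unicode Collation Algorithm of UTS #10; a real application would use a collation library such as ICU. The word list is an assumption for illustration.

```python
import unicodedata

words = ["zebra", "Zulu", "apple", "Éclair", "eclipse"]

# Binary (code point) order: uppercase sorts before lowercase,
# and accented letters sort after all of basic Latin.
binary_order = sorted(words)


def rough_sort_key(word: str) -> str:
    """Crude approximation of a collation key: drop accents and case.

    This is NOT UTS #10; it only shows that ordering is computed
    separately from the code point values.
    """
    decomposed = unicodedata.normalize("NFD", word)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


approx_collated = sorted(words, key=rough_sort_key)

# Character names are stable identifiers, usable as permanent keys.
name_of_a = unicodedata.name("A")

print(binary_order)
print(approx_collated)
print(name_of_a)
```

Since ordering is derived by an algorithm rather than read off the code points, characters can stay where they are forever without constraining how any language sorts.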

9 Indic Support in Unicode
- ISCII the basis for characters and allocation
- Consortium actively engaged with Indian Government, which is a member
- Welcomes addition of missing characters (e.g. Vedic), clarifications or corrections of usage

10 Structural Similarities with ISCII
- Within script, layout and contents nearly identical
- Independent + dependent vowels
- Halant model for representing conjuncts
  - conjuncts / half-forms not directly encoded
  - represented by sequences instead
- Phonetic sequence – order in syllables
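The halant model and phonetic ordering can be seen directly in the code points. The sketch below uses Devanagari as the example script (an assumption; the slide does not name one) and Python's `unicodedata` module to list what is actually stored.

```python
import unicodedata

# The conjunct "क्ष" is not a single encoded character. It is stored as
# consonant + halant (virama) + consonant, and the font forms the conjunct.
conjunct = "\u0915\u094D\u0937"
conjunct_names = [unicodedata.name(ch) for ch in conjunct]

# Phonetic order: in "कि" the vowel sign I is drawn to the LEFT of the
# consonant, but it is stored AFTER it, following pronunciation order.
syllable_ki = "\u0915\u093F"
syllable_names = [unicodedata.name(ch) for ch in syllable_ki]

for name in conjunct_names + syllable_names:
    print(name)
```

Rendering the conjunct glyph or reordering the vowel sign is the job of the layout engine and font, not of the stored text.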

11 Structural Differences with ISCII
- Unicode is stateless:
  - No shifting to get different scripts
  - Each character has a unique number
- Unicode is uniform:
  - No extension bytes necessary
  - All characters coded in the same space
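A short sketch of what "same space, no shifting" means in practice. Each Indic script occupies its own 128-code-point block, and because the blocks follow a parallel layout, the letter KA sits at the same offset in each of the nine blocks from Devanagari through Malayalam. The constant names are illustrative; the loop relies only on Python's `unicodedata` module.

```python
import unicodedata

DEVANAGARI_KA = 0x0915  # DEVANAGARI LETTER KA
BLOCK_SIZE = 0x80       # each of these script blocks spans 128 code points

# Step through nine consecutive blocks, Devanagari through Malayalam.
ka_by_script = {}
for block_index in range(9):
    ch = chr(DEVANAGARI_KA + block_index * BLOCK_SIZE)
    name = unicodedata.name(ch)
    script = name.split()[0]  # first word of the name is the script
    ka_by_script[script] = ch

# All nine letters can sit side by side in one string; each code point
# already says which script it belongs to, so no shift state is needed.
mixed = "".join(ka_by_script.values())

for script, ch in ka_by_script.items():
    print(f"U+{ord(ch):04X}", script, ch)
```

In a shift-based encoding the same byte value would mean a different letter depending on the script currently selected; here the script is recoverable from each character on its own.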

12 Additional Characters
- Indian Government is developing proposals for:
  - Additions of missing characters:
    - Vedic
    - Individual characters for certain scripts
  - Annotations and Descriptions

13 Global Applications now support languages of India
- Companies supporting Indic with Unicode
- OpenType fonts
- Font support for Indic
- Microsoft Windows
- Java (IBM contributed ICU Indic Layout)
- Linux
- …

14 Benefits for India
- All documents, anywhere in the world, can have Indic text
- Allows seamless multilingual documents in India
  - including scriptures and minority languages
- Opens up software export market, beyond English
- Connects India to the world

15 How India Can Contribute
- Effective Communication with the Unicode Consortium
- Provide Resources for Development
- Descriptions of Usage
- Descriptions of Character Shaping
- Transliteration Tables from Script to Script
- Collation Information
- OpenType fonts
- …

16 What Developers Can Do
- Interwork with existing ISCII systems
- Move to Unicode for future developments
- Java, Windows, Linux, …

17 The Future
- The world is moving rapidly to Unicode
- Unicode makes India open to the world
- The world comes to you, and
- You go to the world
- You can help

18 Q & A

19 Backup Slides

20 Multiple Forms
- UTF-8: maximal compatibility with 8-bit systems
- UTF-16: good storage, interoperability with Windows/Java
- UTF-32: simplest processing
- Fast, lossless conversion
- See Forms of Unicode
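The three forms can be compared with Python's built-in codecs. The sample string is an assumption chosen to include one ASCII letter, one Devanagari letter, and one supplementary-plane character (U+10400), so that each form's variable or fixed width is visible. Little-endian variants are used only to avoid a byte order mark in the length counts.

```python
# One ASCII letter, one Devanagari letter, one supplementary character.
text = "A\u0915\U00010400"

utf8 = text.encode("utf-8")       # 1 + 3 + 4 bytes
utf16 = text.encode("utf-16-le")  # 2 + 2 + 4 bytes (surrogate pair)
utf32 = text.encode("utf-32-le")  # 4 + 4 + 4 bytes (fixed width)

print(len(utf8), len(utf16), len(utf32))

# Conversion between forms is lossless: every route returns the same text.
from_utf8 = utf8.decode("utf-8")
from_utf16 = utf16.decode("utf-16-le")
from_utf32 = utf32.decode("utf-32-le")

# Transcoding directly from one form to another gives the same bytes
# as encoding the original text.
transcoded = utf8.decode("utf-8").encode("utf-16-le")
```

All three forms encode the same code points, so the choice among them is a storage and interoperability decision rather than a question of which characters can be represented.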

