UNICODE & Indic Scripts


1 UNICODE & Indic Scripts
Dr. Mukul K Sinha Expert Software Consultants Ltd., New Delhi

2 Indian Languages & Scripts
Indian Languages – 22 constitutionally recognized
Scripts – 11 (+3: Mithilakshar / Ol Chiki / Meitei Mayek)
- Devanagari – 8 (Sanskrit, Hindi, Marathi, Nepali, Maithili, Santali, Bodo, Dogri)
- Bangla – 3 (Bangla, Assamese, Manipuri)
- Perso-Arabic – 3 (Urdu, Kashmiri, Sindhi)
- Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Roman
Literate population 65%, English 5%, mostly multilingual
Digital Divide = Language Divide?!
Need for an Indic language computing environment!

3 Scripts: % of Population
Devanagari, Bangla, Telugu, Tamil, Arabic/Persian, Gujarati, Kannada, Malayalam – 3.7, Oriya, Gurmukhi – 2.9, …
Govt. of India – 1997 Survey

4 Scripts : Print & Internet
Year 2000: Latin, CJK and Indic scripts compared by % population, % print and % web
Language & Script: Europe (11), Indic (18+4)

5 Indic Scripts & ISCII
Dept. of Electronics – Indian Script Standard Committee, 1986 – 88 – ISCII – 1998
ISCII
- 8-bit character code (lower set is ASCII)
- Escape sequence for script identity
- Brahmi-based, covering 9 Indic scripts (transliteration is automatic)
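The common Brahmi-based layout is what makes transliteration between scripts mechanical. Unicode's Indic blocks were laid out following ISCII, so the idea can be sketched on Unicode code points rather than ISCII bytes: a letter sits at the same offset within each script's 128-code-point block. The helper name `shift_script` and the constants are illustrative, not part of any standard API, and the result is only approximate because not every position is assigned in every script.

```python
import unicodedata

# Start of each script's 128-code-point block in Unicode.
DEVANAGARI_BASE = 0x0900
BENGALI_BASE = 0x0980
BLOCK_SIZE = 0x80


def shift_script(text, src_base, dst_base):
    """Move each character to the same offset in the target script's block."""
    out = []
    for ch in text:
        cp = ord(ch)
        if src_base <= cp < src_base + BLOCK_SIZE:
            candidate = chr(cp - src_base + dst_base)
            # Keep the original character if the target position is unassigned.
            out.append(candidate if unicodedata.category(candidate) != "Cn" else ch)
        else:
            out.append(ch)  # Characters outside the source block pass through.
    return "".join(out)


# Devanagari KA, MA, LA ("kamal"), written with escapes to avoid font issues.
word = "\u0915\u092e\u0932"
print(shift_script(word, DEVANAGARI_BASE, BENGALI_BASE))
```

The same call with the bases swapped maps the Bengali letters back to Devanagari.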

6 Unicode Consortium - History
Consortium / Non-profit / Regd. USA
Open Standard / ISO & W3C
Members
- Corporate / Institutional – Voting
- Assoc. / Individual – Non-voting
To meet ‘implementation needs’ that will not be invalidated at any time in future

7 Unicode
For characters (scripts / ideographs / symbols / …), NOT glyphs.
Platform independent.
Content inter-operable / inter-changeable.
Availability of tools for text processing.
Global presence.
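The character-versus-glyph distinction can be seen by listing what is actually stored for a word. The sketch below uses Python's standard `unicodedata` module; the helper name `code_points` is illustrative. The Hindi word for "Hindi" is stored as six code points, while the shapes drawn on screen are chosen by the font and rendering engine.

```python
import unicodedata


def code_points(text):
    """Return (code point, character name) pairs for each stored character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# HA, vowel sign I, NA, virama, DA, vowel sign II
hindi = "\u0939\u093f\u0928\u094d\u0926\u0940"
for cp, name in code_points(hindi):
    print(cp, name)
```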

8 Unicode Version
Version 1.0 / 1.0.1 / 1.1 – 1991 / 92 / 93
Version 2.0 / 2.1 – 1996 / 98 – Internet, Web
Version 3.0 / 3.1 / 3.2 – 2000 / 01 / 02 – India Govt.
Version 4.0 / 4.1 – 2003 / 2005 – 96,382 (70,207)
Version

9 Unicode : Technical Specification
Code Area – 0 to 10FFFF (hexadecimal)
Basic Multilingual Plane (Plane 0) – 0000 to FFFF, 16-bit code (2 bytes)
- E000 – F8FF (Private Use Area)
Plane 1 .. Plane … Plane 15 (PUA-A) + Plane 16 (PUA-B)
PUA (national standard + vendor-specific codes) (Japanese – NEC – Fujitsu)
Indic Script – one code page (block) for each script
For Information Exchange – UTF-8 (8-bit byte string); a character can take 1 to 4 bytes
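The plane layout and the variable UTF-8 length can be checked with a few lines of Python. This is a minimal sketch; the function names `plane`, `is_private_use` and `utf8_length` are illustrative. The plane number is the code point divided by 0x10000, and the private-use test covers the BMP range plus Planes 15 and 16 (excluding the last two code points of each plane, which are noncharacters).

```python
def plane(cp):
    """Return the plane number (0..16) of a code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code area")
    return cp >> 16


def is_private_use(cp):
    """True for the BMP Private Use Area and for PUA-A / PUA-B."""
    if 0xE000 <= cp <= 0xF8FF:
        return True
    return plane(cp) in (15, 16) and (cp & 0xFFFF) <= 0xFFFD


def utf8_length(ch):
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


# Latin A, e-acute, Devanagari KA, and a Plane 1 code point (U+1108D).
for ch in ["A", "\u00e9", "\u0915", "\U0001108D"]:
    print(f"U+{ord(ch):04X} plane={plane(ord(ch))} utf8_bytes={utf8_length(ch)}")
```

Devanagari and the other BMP Indic scripts need three bytes per character in UTF-8, which matters when estimating storage for Indic text.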

10 Unicode & Indic Scripts
Conjuncts – glyphs for rendering (not for Unicode)
- Multiple ways to express
- ZWNJ (U+200C) / ZWJ (U+200D)
Collation – Unicode order and language order differ
- Devanagari – Hindi / Marathi
- Latin – (Danish / Norwegian – Ä / Ö) (Hungarian / German)
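The "multiple ways to express" a conjunct can be made concrete with the Devanagari sequence KA + virama + SSA. The sketch below builds three encodings of the same letters: the plain sequence, which asks the renderer for the conjunct; the sequence with ZWNJ after the virama, which asks for a visible virama; and the sequence with ZWJ after the virama, which asks for the half form of KA. The names `FORMS`, `describe` and `strip_joiners` are illustrative, and what actually appears on screen depends on the font and shaping engine.

```python
import unicodedata

KA, VIRAMA, SSA = "\u0915", "\u094d", "\u0937"
ZWNJ, ZWJ = "\u200c", "\u200d"

FORMS = {
    "conjunct": KA + VIRAMA + SSA,
    "explicit virama": KA + VIRAMA + ZWNJ + SSA,
    "half form": KA + VIRAMA + ZWJ + SSA,
}


def describe(text):
    """List the stored code points with their character names."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def strip_joiners(text):
    """Drop ZWNJ / ZWJ so the three forms compare equal as letter sequences."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")


for label, text in FORMS.items():
    print(label, describe(text))
```

Because the joiners are ordinary code points in the stored text, searching and comparison code may want to ignore them, for example with `strip_joiners` above.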

11 Unicode Stability Policy
Encoding Stability
- Code NOT to be moved / removed
- Later version – superset of earlier version
Name Stability
- Character name NOT to be changed
Identity Stability
- Identifying characteristic – unchanged
- Glyph / Case Mapping / …
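Name stability has a visible consequence: a character name stays fixed even when it contains a mistake. A well-known case is U+FE18, whose official name spells "BRACKET" as "BRAKCET". The sketch below looks the name up with Python's standard `unicodedata` module; the helper name `official_name` is illustrative.

```python
import unicodedata


def official_name(cp):
    """Return the formal Unicode character name for a code point."""
    return unicodedata.name(chr(cp))


# The misspelling is part of the permanent name and is not corrected.
print(official_name(0xFE18))
```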

12 Unicode : Stages for Acceptance
Initial & Explore Stage: Chakma / Newari (BMP), Kaithi / Ahom / Indus Valley (Plane 1)
Proposal in Early Committee Recommendation: Manipuri (BMP), Brahmi (Plane 1)
Approved Proposal with ISO: Lepcha / Ol Chiki / Saurashtra (BMP)
Finalized Encoding (script in pre-publication): code pages for Limbu / Kharoshthi / Syloti Nagri

13 Tasks
Convergence – State Govts. / Language Communities
Active participation in UNICODE
Recognized languages
Additional initiatives for other Indic region scripts
E-Governance applications
Unicode-compliant Indic scripts

