Tafseer Ahmed Department of Computer Science University of Karachi Urdu on Linux International Support.


1 Tafseer Ahmed, Department of Computer Science, University of Karachi. Urdu on Linux: International Support

2 Note: All the issues and support mechanisms discussed for Urdu also apply to other Pakistani languages such as Sindhi, Pashto, Punjabi and Balochi.

3 Character Encoding; Font; Text Display Engine

4 Character, Script, Glyph and Font

5 Character: A character is an abstract entity, such as "LATIN CHARACTER CAPITAL A" or "ARABIC CHARACTER HA". Every character has exactly one position (code point) in a character representation scheme such as Unicode.
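As a small illustration of the character and code point relationship, the sketch below uses Python's standard unicodedata module to print the code point and official Unicode name of two characters. Python is an assumption here (the slides name no programming language), and U+062D is picked only as one example of an Arabic letter.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

print(describe("A"))        # a Latin letter
print(describe("\u062D"))   # an Arabic letter, chosen for illustration
```

The official Unicode names differ slightly from the informal names on the slide, but the idea is the same: one abstract character, one code point.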

6 Glyph: The visual representation of a character on screen or paper is called a glyph. A character can have more than one glyph.
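One way to see that a single character can have several glyphs is to look at the legacy Arabic Presentation Forms-B block in Unicode, which encodes the isolated, final, initial and medial shapes of each letter as separate compatibility code points. The sketch below is only an illustration under that assumption; with modern fonts the shapes are normally selected by the font and the display engine rather than stored in the text.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the abstract character

def positional_forms(ch: str) -> dict:
    """Map a form name (isolated/final/initial/medial) to the presentation-form
    code point whose compatibility decomposition is the single character `ch`."""
    target = f"{ord(ch):04X}"
    forms = {}
    for cp in range(0xFE70, 0xFF00):  # Arabic Presentation Forms-B
        parts = unicodedata.decomposition(chr(cp)).split()
        # A matching entry looks like "<initial> 0628".
        if len(parts) == 2 and parts[1] == target:
            forms[parts[0].strip("<>")] = cp
    return forms

for form, cp in positional_forms(BEH).items():
    print(f"{form:9s} U+{cp:04X} {chr(cp)}")
```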

7 Script: A script is the writing system used to write a language. For example, English and French are written in the Roman script, and Urdu and Farsi are written in the Arabic script.

8 Writing Styles of Urdu: Naskh; Nastaleeq

9 Character Encoding

10 Data, and hence text, is stored in a computer as binary numbers. A character encoding scheme such as ASCII or EBCDIC maps (English) characters to binary numbers for storage and processing. The characters of any language can be given a character encoding. This is the basis of code pages: each language has a code page that holds the encoding of that language's characters.
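The mapping idea can be sketched with Python's built-in codecs. The example shows the letter A under ASCII and under one EBCDIC code page (cp037), and then one byte value read through two different code pages. The choice of cp037, latin-1 and cp1256 is an assumption made for illustration; the slides do not name specific code pages.

```python
def encoded_hex(ch: str, codecs: list) -> dict:
    """Return the hex byte value of `ch` under each named encoding."""
    return {name: ch.encode(name).hex() for name in codecs}

# The same character, two different encoding schemes.
print(encoded_hex("A", ["ascii", "cp037"]))

# The same byte, two different code pages: the meaning of 0xC7 depends
# entirely on which code page the reader assumes.
raw = b"\xc7"
as_latin = raw.decode("latin-1")   # Western European code page
as_arabic = raw.decode("cp1256")   # Windows Arabic code page
print(repr(as_latin), repr(as_arabic))
```

This ambiguity between code pages is the practical problem that a single shared repertoire such as Unicode is meant to remove.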

11 Character Encoding of Urdu: Proprietary standards (the biggest problem in Urdu software development); Urdu Zabta Takhti (the national standard code page for Urdu); Unicode (the international standard for multilingual characters)

12 Unicode: Unicode is a repository of the characters of almost all the languages of the world. Its 16-bit range alone (U+0000 to U+FFFF) provides more than 65,000 code points, and the full code space extends to U+10FFFF. All software vendors are now supporting or switching to Unicode.
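To make the notion of code points concrete, the following sketch lists the Unicode code points of the Urdu word پاکستان and checks that each one fits in the 16-bit range U+0000 to U+FFFF. It is an illustration only; the word is written with escapes so the example does not depend on the editor's text direction handling.

```python
# پاکستان spelled out as individual code points (logical order).
word = "\u067e\u0627\u06a9\u0633\u062a\u0627\u0646"

code_points = [f"U+{ord(c):04X}" for c in word]
fits_in_16_bits = all(ord(c) <= 0xFFFF for c in word)
# In UTF-16 each of these characters occupies exactly one 16-bit unit.
utf16_units = len(word.encode("utf-16-be")) // 2

print(code_points)
print(fits_in_16_bits, utf16_units)
```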

13 Unicode™ / ISO 10646: 16-bit international character encoding. [Figure: map of the code space from 0x0000 to 0xFFFF, with regions labelled ASCII, Latin, Greek, Arabic, Hebrew, Indian, Thai, Punctuation, Symbols, Kana, Hangul, Ideographs (Hanzi, Kanji, Hanja), Compatibility, Private use and Future use. Sample code values shown: 0041 (A), 9662, FF96, 4F85, 0000 (null).]

14 Font for Text Display

15 OpenType Font: OpenType is a new cross-platform font file format developed jointly by Adobe and Microsoft. It is an extension of the TrueType font format. An OpenType font may contain more than 65,000 glyphs. One character may correspond to several glyphs.

16 A rich mapping between characters and glyphs, which supports ligatures, positional forms, alternates, and other substitutions. Information to support features for two-dimensional positioning and glyph attachment. Explicit script and language information, so a text-processing application can adjust its behavior accordingly.

17 Tables in an OTF Font: cmap (Character to Glyph Mapping); GDEF (Glyph Definition Data); GPOS (Glyph Positioning Data); GSUB (Glyph Substitution Data); BASE (Baseline Data); JSTF (Justification Data)
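The table names above appear as four-byte tags in the table directory at the start of a font file. The sketch below reads that directory using only the standard struct module, assuming the usual sfnt layout (a 12-byte header followed by one 16-byte record per table). The demo input is a hand-built fake header, not a real font, and list_tables is a hypothetical helper name.

```python
import struct

def list_tables(font_bytes: bytes) -> list:
    """Return the table tags found in an sfnt (TrueType/OpenType) table directory."""
    # Header: 4-byte version tag, then the number of tables (big-endian uint16).
    _version, num_tables = struct.unpack(">4sH", font_bytes[:6])
    tags = []
    for i in range(num_tables):
        start = 12 + 16 * i  # records follow the 12-byte header
        tag, _checksum, _offset, _length = struct.unpack(
            ">4sIII", font_bytes[start:start + 16]
        )
        tags.append(tag.decode("ascii"))
    return tags

# Synthetic directory with two empty records, for illustration only.
fake_font = (
    struct.pack(">4sHHHH", b"OTTO", 2, 0, 0, 0)
    + struct.pack(">4sIII", b"GSUB", 0, 0, 0)
    + struct.pack(">4sIII", b"cmap", 0, 0, 0)
)
print(list_tables(fake_font))
```

For a real font one would pass the bytes of the file instead, for example the result of reading it in binary mode; which tables appear depends on the font.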

18 GSUB, an Example of an OTF Table: GSUB holds information for substituting glyphs to render the scripts and language systems supported in a font. Types of substitution: • A Single Substitution replaces a single glyph with another single glyph.

19 • An Alternate Substitution identifies functionally equivalent but different-looking forms of a glyph. • A Multiple Substitution replaces a single glyph with more than one glyph. This is used to specify actions such as ligature decomposition. • A Ligature Substitution replaces several glyph indices with a single glyph index.

20 • A Contextual Substitution describes glyph substitutions in context, that is, a substitution of one or more glyphs within a certain pattern of glyphs. Each substitution describes one or more input glyph sequences and one or more substitutions to be performed on that sequence.
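The substitution types can be sketched as operations on a list of glyph names. This is a toy model under stated assumptions: real GSUB tables work on glyph indices and use lookups and coverage tables, and every glyph name and mapping below is hypothetical.

```python
# Hypothetical lookup data, keyed by glyph name for readability.
SINGLE = {"heh": "heh.final"}                 # one glyph -> one glyph
MULTIPLE = {"lam_alef": ["lam", "alef"]}      # one glyph -> several glyphs
LIGATURE = {("lam", "alef"): "lam_alef"}      # several glyphs -> one glyph

def apply_single(glyphs, table):
    """Replace each glyph that has an entry with its single substitute."""
    return [table.get(g, g) for g in glyphs]

def apply_multiple(glyphs, table):
    """Expand each glyph that has an entry into its glyph sequence."""
    out = []
    for g in glyphs:
        out.extend(table.get(g, [g]))
    return out

def apply_ligature(glyphs, table):
    """Replace each matching glyph sequence with its ligature glyph."""
    out, i = [], 0
    while i < len(glyphs):
        for seq, lig in table.items():
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(lig)
                i += len(seq)
                break
        else:  # no sequence matched at position i
            out.append(glyphs[i])
            i += 1
    return out

print(apply_ligature(["seen", "lam", "alef"], LIGATURE))
print(apply_multiple(["lam_alef"], MULTIPLE))
print(apply_single(["heh"], SINGLE))
```

A contextual substitution would add a condition on the neighbouring glyphs before any of these replacements is applied.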

21 Text Display Engine

22 The Alphabet Soup: GNOME is a desktop environment for the user, as well as a powerful application framework for the software developer. GTK+ is a multi-platform toolkit for creating graphical user interfaces, offering a complete set of widgets. GTK+ is based on three libraries: GLib, Pango and ATK. GNOME uses GTK+ for its graphical user interface. GNOME and GTK+ are open source software and part of the GNU Project.

23 Pango: The word "Pango" combines the Greek "Pan" (U+03A0 U+03B1 U+03BD), meaning "all", and the Japanese "Go" (U+8A9E), meaning "language". The Pango project is an open-source framework for the layout and rendering of internationalized text. Pango uses Unicode (UTF-8 encoded strings) for all of its text, and will eventually support output in all the world's major languages.
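Since Pango works on UTF-8 encoded strings, it may help to see what the code points quoted for the name look like as UTF-8 bytes. The sketch uses plain Python string encoding, not the Pango API.

```python
pan = "\u03a0\u03b1\u03bd"   # Greek "Pan"
go = "\u8a9e"                # Japanese "Go"

pan_bytes = pan.encode("utf-8")   # each Greek letter takes two bytes
go_bytes = go.encode("utf-8")     # the Japanese character takes three bytes

print(pan, pan_bytes.hex())
print(go, go_bytes.hex())
```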

24 Pango Fonts: Pango supports the following font types: bitmap fonts under the X Window System; Type 1 fonts (Adobe standard); TrueType fonts (Apple and Microsoft standard); OpenType fonts (Adobe and Microsoft standard)

25 The Layout and Rendering Pipeline. Input: abc PAY ALIF KAF SEEN TAY ALIF NOON. Itemization: The input string is broken into portions rendered with a consistent font, with a consistent language tag, and with a specific bidirectional embedding level. Result: {abc} {PAY ALIF KAF SEEN TAY ALIF NOON}. Reordering: The items are reordered from logical order into visual order according to their bidirectional embedding levels. Result: {abc} {NOON ALIF TAY SEEN KAF ALIF PAY}.

26 The Layout and Rendering Pipeline (contd.). Glyph Selection (Shaping): The characters in each item are turned into glyphs. Justification: The glyph strings created in the previous step are adjusted to fit the line-justification policies that are in place. Rendering: The justified glyph strings are rendered in their final order onto the output device. Output: abc پاکستان
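The first two pipeline stages can be sketched in a few lines. This is a deliberately simplified model, not Pango's implementation and not the full Unicode Bidirectional Algorithm: items are split on direction only (font and language tags are ignored), neutral characters such as spaces simply join the preceding item, and the paragraph direction is assumed to be left-to-right.

```python
import unicodedata

def direction(ch):
    """Classify a character as 'rtl', 'ltr', or None (neutral in this toy model)."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):
        return "rtl"
    if bidi == "L":
        return "ltr"
    return None

def itemize(text):
    """Break text into (direction, substring) items of consistent direction."""
    items = []
    for ch in text:
        d = direction(ch) or (items[-1][0] if items else "ltr")
        if items and items[-1][0] == d:
            items[-1][1] += ch
        else:
            items.append([d, ch])
    return [(d, s) for d, s in items]

def reorder(items):
    """Put each item into visual order by reversing right-to-left items."""
    return [s[::-1] if d == "rtl" else s for d, s in items]

# PAY ALIF KAF SEEN TAY ALIF NOON, in logical (typing) order.
word = "\u067e\u0627\u06a9\u0633\u062a\u0627\u0646"
items = itemize("abc " + word)
visual = reorder(items)
print(items)
print([unicodedata.name(c) for c in visual[1]])
```

Shaping, justification and rendering then operate on the reordered items; those stages depend on the font and output device and are not modelled here.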

27 Sample Screenshots. [Figures: the GTK+ color selector localized to Farsi; GTK+ labels rendering various languages.]

28 Web Resources: www.unicode.org; www.adobe.com/type/opentype/; www.microsoft.com/typography/developers/opentype/; communities.msn.com/MicrosoftVOLTuserscommunity/; www.gtk.org; www.pango.org; i18n.kde.org; tremu.gov.pk/tremu/workingroups/url.htm

