Worldwide typography (and how to apply JIS-X-4051-1995 to Unicode)
Michel Suignard, Microsoft Corporation
Objectives
- Worldwide single binary
- Multilingual
- DTP-level typography for all writing systems:
  – Line breaking
  – Font selection
  – Word breaking
  – Line justification
Challenges
- Asian typography is not as well known as Western typography
- Conflicting requirements:
  – Vertical versus horizontal layout
  – Latin word wrap off
  – Ideographic word wrap on
- Size of the Unicode repertoire (35K characters and growing)
JIS-X-4051
- First edition published in March 1993
  – Does not address the Unicode repertoire
  – Limited description of character classification
- Second edition published in October 1995
  – Based on JIS-X-0221 (ISO 10646-1)
  – More detailed character classification (20 classes)
  – Covers line breaking, line composition rules, ruby positioning, horizontal-in-vertical text, …
Issues with JIS-X-4051
- Still covers only a subset of Unicode
- Character class contents overlap, relying on contextual information not available to general-purpose software
- Single behavior class
- Half-width/full-width characters not covered (left user-defined)
- Not aligned with most font designs (narrow versus wide symbols)
- Lacks some useful features (such as line break analysis across white space)
Character classification
- The Unicode space is decomposed into partitions (sets of character ranges)
- Each partition shares a common behavior across all covered typographic rules
- Partitions are mapped to classes specific to each rule (e.g. line breaking, font selection, etc.)
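A minimal sketch of the partition idea, assuming a sorted table of code point ranges and one small lookup table per typographic rule. The partition names, the handful of ranges, and the functions `partition_of` and `classify` are hypothetical; the line-breaking class labels loosely borrow UAX #14 naming purely for illustration. The point is that a character is located once, by binary search, and each rule then only needs a partition-to-class table rather than its own per-character data.

```python
from bisect import bisect_right

# Hypothetical partition table: (first code point, last code point, partition).
# Ranges must be sorted and non-overlapping; names are illustrative only.
PARTITIONS = [
    (0x0020, 0x0020, "space"),
    (0x0021, 0x0021, "exclamation"),
    (0x0028, 0x0028, "open_paren"),
    (0x0029, 0x0029, "close_paren"),
    (0x0030, 0x0039, "digit"),
    (0x0041, 0x005A, "latin_letter"),
    (0x0061, 0x007A, "latin_letter"),
    (0x3041, 0x3096, "hiragana"),
    (0x4E00, 0x9FFF, "cjk_ideograph"),
]
_STARTS = [first for first, _, _ in PARTITIONS]

# One small table per typographic rule: partition -> rule-specific class.
LINE_BREAK_CLASS = {
    "space": "SP", "exclamation": "EX", "open_paren": "OP",
    "close_paren": "CL", "digit": "NU", "latin_letter": "AL",
    "hiragana": "ID", "cjk_ideograph": "ID",
}
FONT_CLASS = {
    "space": "neutral", "exclamation": "neutral", "open_paren": "neutral",
    "close_paren": "neutral", "digit": "latin", "latin_letter": "latin",
    "hiragana": "east_asian", "cjk_ideograph": "east_asian",
}


def partition_of(ch: str) -> str:
    """Binary-search the range table for the partition containing ch."""
    cp = ord(ch)
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0 and PARTITIONS[i][0] <= cp <= PARTITIONS[i][1]:
        return PARTITIONS[i][2]
    return "other"


def classify(ch: str, rule_table: dict) -> str:
    """Look up the class a given rule assigns to ch via its partition."""
    return rule_table.get(partition_of(ch), "unknown")


print(classify("あ", LINE_BREAK_CLASS), classify("あ", FONT_CLASS))  # ID east_asian
```

Adding a new rule (say, ruby overhang) means adding one more partition-to-class table; the range table itself does not change.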
Typical usage
[Diagram: "After" behavior class and "Before" behavior class]
Line breaking
- Kinsoku rules (characters that may not start or end a line)
- Stricter rules for small kana
- Keep numeric expressions together, including prefix and postfix symbols
- Allow French typography rules (no break between the last word and :;?!, even if separated by a space character)
- Disable Latin word wrap
- Keep ideographic characters together
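The rules above can be expressed as a decision on the class of the character before a potential break and the class of the character after it. The following is a minimal sketch under that assumption; the class names, the small character sets, and the functions `char_class`, `can_break` and `break_opportunities` are hypothetical, and the real JIS X 4051 classes are considerably finer-grained. The options `latin_word_wrap` and `keep_ideographs_together` model the last two bullets.

```python
# Hypothetical, simplified character sets; JIS X 4051 classes are finer-grained.
OPEN = set("([{「『（")
CLOSE = set(")]}」』）、。")
SMALL_KANA = set("ぁぃぅぇぉっゃゅょゎァィゥェォッャュョヮ")
FRENCH_PUNCT = set(":;?!")
PREFIX = set("$£¥€")
POSTFIX = set("%°¢")


def char_class(ch: str) -> str:
    """Map a character to a (made-up) line-breaking class."""
    if ch == " ":
        return "SP"
    for cls, members in (("OP", OPEN), ("CL", CLOSE), ("SK", SMALL_KANA),
                         ("FR", FRENCH_PUNCT), ("PR", PREFIX), ("PO", POSTFIX)):
        if ch in members:
            return cls
    if ch.isdigit():
        return "NU"
    if "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff":
        return "ID"  # kana and CJK ideographs
    return "AL"


def can_break(before: str, after: str, *, strict_small_kana=True,
              latin_word_wrap=True, keep_ideographs_together=False) -> bool:
    """May a line break between a character of class `before` and one of class `after`?"""
    if after in ("CL", "PO", "FR"):   # kinsoku: these never start a line
        return False                  # (FR also blocks "mot ?" across the space)
    if after == "SK" and strict_small_kana:
        return False
    if before in ("OP", "PR"):        # these never end a line
        return False
    if after == "SP":                 # break after spaces, not before them
        return False
    if before == "SP":
        return True
    if before == "NU" and after == "NU":
        return False                  # keep numeric expressions together
    if before in ("AL", "NU") and after in ("AL", "NU"):
        return not latin_word_wrap    # word wrap keeps Latin words whole
    if before == "ID" and after == "ID":
        return not keep_ideographs_together
    return True


def break_opportunities(text: str, **options) -> list:
    """Indices i such that a line may break just before text[i]."""
    classes = [char_class(ch) for ch in text]
    return [i for i in range(1, len(text))
            if can_break(classes[i - 1], classes[i], **options)]


print(break_opportunities("日本（東京）です。"))  # [1, 2, 4, 6, 7]
```

Because every rule is a test on a pair of classes, the same logic could be compiled into a before/after lookup table, which keeps the per-character work to two table lookups.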
Width modification and auto-spacing
- Width modification (contextual kerning): ( (text) ) becomes ((text))
- Auto-spacing: add space between ideographic text and adjacent Western or numeric text
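A minimal sketch of both effects, assuming full-width brackets occupy one em with their glyph in one half and blank space in the other, and that Western glyphs are a uniform half em wide (a deliberate simplification; real Western glyphs are proportional). The bracket sets, the helper names, and the choice of U+2005 FOUR-PER-EM SPACE as the auto-spacing amount are assumptions for illustration, not values taken from the slide.

```python
# Hypothetical sets of full-width brackets; real classes are larger.
OPEN_FW = set("（「『【")
CLOSE_FW = set("）」』】")


def is_ideographic(ch: str) -> bool:
    """Kana and CJK unified ideographs only (a simplification)."""
    return "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff"


def is_western(ch: str) -> bool:
    """ASCII letters and digits only (a simplification)."""
    return ch.isascii() and ch.isalnum()


def advances(text: str) -> list:
    """Width modification: advance of each character in em units."""
    widths = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in OPEN_FW:
            # An opening bracket's blank half is on its left;
            # drop it when another opening bracket precedes.
            widths.append(0.5 if prev in OPEN_FW else 1.0)
        elif ch in CLOSE_FW:
            # A closing bracket's blank half is on its right;
            # drop it when another bracket follows.
            widths.append(0.5 if nxt in CLOSE_FW or nxt in OPEN_FW else 1.0)
        elif is_ideographic(ch):
            widths.append(1.0)
        else:
            widths.append(0.5)  # assumed half-width Western glyph
    return widths


def auto_space(text: str, space: str = "\u2005") -> str:
    """Auto-spacing: insert `space` wherever ideographic text meets
    Western letters or digits (in either order)."""
    out = []
    for i, ch in enumerate(text):
        if i > 0:
            a = text[i - 1]
            if (is_ideographic(a) and is_western(ch)) or (is_western(a) and is_ideographic(ch)):
                out.append(space)
        out.append(ch)
    return "".join(out)


print(sum(advances("（（あ））")))  # 4.0, versus 5.0 without width modification
```

A layout engine would more likely add the auto-spacing as extra advance rather than inserting a space character; inserting one simply keeps the sketch easy to inspect.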
Font selection scenario
A new font is applied to a large multilingual selection of text, for example:
Is that movie a Japanese movie? Yes, it is.
Assume we want to change the font of the English text only, while still selecting the whole text. When we apply the Haettenschweiler font to it, it is desirable that only the Latin text be affected. The situation is similar when we want to apply an Asian face (like HG) to the Japanese text.
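One way to get this behavior, sketched here under stated assumptions, is to give each character format separate font slots (Latin and East Asian) and let applying a font touch only the slot it is meant for. `CharFormat`, `font_slot`, `apply_font` and `rendering_font` are hypothetical names; the caller passes `target_slots` explicitly, whereas a real implementation might derive it from the font's character coverage. The default fonts in the usage lines are placeholders.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CharFormat:
    """Hypothetical per-character format holding one font per font slot."""
    latin_font: str
    east_asian_font: str


def font_slot(ch: str) -> str:
    """Kana, CJK ideographs and full-width forms use the East Asian slot."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF or 0xFF00 <= cp <= 0xFFEF:
        return "east_asian"
    return "latin"


def apply_font(formats, font_name, target_slots):
    """Set font_name only in the targeted slots; other slots keep their font."""
    changes = {}
    if "latin" in target_slots:
        changes["latin_font"] = font_name
    if "east_asian" in target_slots:
        changes["east_asian_font"] = font_name
    return [replace(fmt, **changes) for fmt in formats]


def rendering_font(ch: str, fmt: CharFormat) -> str:
    """The font actually used to draw ch."""
    return fmt.east_asian_font if font_slot(ch) == "east_asian" else fmt.latin_font


# The whole mixed run is selected, but only the Latin slot is changed.
text = "movie 映画"
formats = [CharFormat("Times New Roman", "MS Mincho")] * len(text)
formats = apply_font(formats, "Haettenschweiler", {"latin"})
print([rendering_font(c, f) for c, f in zip(text, formats)])
```

Applying an Asian face to the Japanese text is the mirror case: call `apply_font` with `{"east_asian"}` and the Latin slot is left alone.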
Font selection based on character code point and context
- Needed because there are no global Unicode fonts (fonts usually cover a group of writing systems)
- Language is an important context selector for determining the appropriate font (CJK context, ASCII symbols, narrow versus wide Greek and Cyrillic characters)
- Some writing systems require several glyphs per character and are better handled by specialized fonts (Arabic, Hindi)
- Many punctuation characters are shared among writing systems whose typefaces cannot be shared (e.g. the period, shared by Latin and Armenian)
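A minimal sketch of code point plus context resolution, assuming four font slots and a two-letter language tag as the context. The slot names, the abbreviated ranges, and the rule that shared punctuation inherits the slot of the preceding character are illustrative assumptions; `base_slot` and `resolve_slots` are hypothetical names.

```python
import unicodedata

CJK_LANGUAGES = {"ja", "zh", "ko"}


def base_slot(ch: str) -> str:
    """Font slot implied by the code point alone (hypothetical ranges and names)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF:
        return "east_asian"
    if 0x0600 <= cp <= 0x06FF or 0x0900 <= cp <= 0x097F:
        return "complex"    # Arabic, Devanagari: fonts with shaping support
    if 0x0530 <= cp <= 0x058F:
        return "armenian"
    if 0x0370 <= cp <= 0x04FF:
        return "ambiguous"  # Greek and Cyrillic: narrow or wide glyphs
    if ch.isspace() or unicodedata.category(ch).startswith("P"):
        return "neutral"    # shared punctuation and spaces
    return "western"


def resolve_slots(text: str, lang: str) -> list:
    """Resolve each character's font slot using the language as context."""
    cjk = lang in CJK_LANGUAGES
    default = "east_asian" if cjk else "western"
    slots = []
    for ch in text:
        slot = base_slot(ch)
        if slot == "ambiguous":
            # CJK fonts carry wide Greek/Cyrillic; Western fonts carry narrow ones.
            slot = "east_asian" if cjk else "western"
        elif slot == "neutral":
            # Shared punctuation takes the slot of the preceding character.
            slot = slots[-1] if slots else default
        slots.append(slot)
    return slots


print(resolve_slots("αβ", "ja"), resolve_slots("αβ", "en"))
```

With this rule, a period after Armenian text is drawn from the Armenian font and a period after Latin text from the Western font, even though both are the same code point.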
Ruby overhanging
- Ruby is the commonly used name for pronunciation characters attached to base characters.
- The ruby sequence may be allowed to overhang the characters preceding or following the base characters, as long as this does not introduce confusion.
- The classification determines how a character may be overhung:
  – No overhanging (e.g. CJK ideographs)
  – Allowed only before (e.g. opening quotes)
  – Allowed only after (e.g. closing quotes)
  – Allowed in both cases (e.g. hiragana)
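A minimal sketch of how the four overhang classes could drive layout, interpreting "allowed only before" as "may be overhung when the character precedes the ruby base" and "allowed only after" as "may be overhung when it follows the base". Ideographs get no overhang because ruby spilling onto a neighboring ideograph could be read as belonging to it. The `Overhang` flag, the character sets, `ruby_overhang`, and the `max_per_side` limit are all hypothetical.

```python
from enum import Flag


class Overhang(Flag):
    """Hypothetical overhang classes: where a character may be overhung by ruby."""
    NONE = 0
    BEFORE = 1   # may be overhung when it precedes the ruby base
    AFTER = 2    # may be overhung when it follows the ruby base
    BOTH = 3     # BEFORE | AFTER


OPENING = set("「『“（")
CLOSING = set("」』”）")


def overhang_class(ch: str) -> Overhang:
    if "\u3041" <= ch <= "\u309f":
        return Overhang.BOTH      # hiragana
    if ch in OPENING:
        return Overhang.BEFORE    # opening quotes and brackets
    if ch in CLOSING:
        return Overhang.AFTER     # closing quotes and brackets
    return Overhang.NONE          # e.g. CJK ideographs


def ruby_overhang(prev_ch, next_ch, excess, max_per_side):
    """Split `excess` (ruby width minus base width, in em) into overhang on the
    preceding and following characters, each side capped at max_per_side.
    Any excess not returned as overhang would have to be absorbed by
    spacing out the base text instead."""
    left_ok = prev_ch is not None and Overhang.BEFORE in overhang_class(prev_ch)
    right_ok = next_ch is not None and Overhang.AFTER in overhang_class(next_ch)
    sides = int(left_ok) + int(right_ok)
    if sides == 0 or excess <= 0:
        return 0.0, 0.0
    share = min(excess / sides, max_per_side)
    return (share if left_ok else 0.0, share if right_ok else 0.0)


print(ruby_overhang("「", "漢", 1.0, 0.5))  # (0.5, 0.0)
```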
Conclusion / Findings
- A detailed analysis of the Unicode repertoire along lines of common behavior is a powerful tool for constructing sophisticated typographical effects.
- Typographic complexity should be expressed as much as possible in tables and properties, not in code.
- Many behaviors are correlated, so a limited number of Unicode partitions can serve many behavior descriptions.