Worldwide typography (and how to apply JIS-X-4051-1995 to Unicode) Michel Suignard Microsoft Corporation.



Objectives
- Worldwide single binary
- Multilingual
- DTP level on all writing systems
  - Line breaking
  - Font selection
  - Word breaking
  - Line justification

Challenges
- Asian typography is not as well known as Western typography
- Conflicting requirements
  - Vertical versus horizontal layout
  - Latin word wrap off
  - Ideographic word wrap on
- Size of the Unicode repertoire (35K characters and growing)

JIS-X-4051
- First published in March 1993
  - Does not address the Unicode repertoire
  - Limited description of character classification
- 2nd edition in October 1995
  - Based on JIS-X-0221 (ISO 10646)
  - More detailed character classification (20 classes)
  - Covers line breaking, line composition rules, ruby positioning, horizontal-in-vertical text, ...

Issues with JIS-X-4051
- Still covers only a subset of Unicode
- Character class contents overlap (relying on contextual information not available to general-purpose software)
- Single behavior class
- Half-width/full-width characters not covered (user-defined)
- Not aligned with most font designs (narrow versus wide symbols)
- Lacks some useful features (such as line break analysis across white space)

Character classification
- The Unicode space is decomposed into partitions (sets of character ranges)
- Each partition shares a common behavior across all covered typographic rules
- Partitions are mapped to classes specific to each rule (e.g. line breaking, font selection, etc.)
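The partition idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the code point ranges are real Unicode ranges, but the partition names, the class names, and the helper functions `partition_of` and `classify` are hypothetical and do not come from the presentation.

```python
from bisect import bisect_right

# Hypothetical partitions: (first code point, last code point, partition name).
# Sorted by first code point and non-overlapping.
PARTITIONS = [
    (0x0041, 0x005A, "latin_upper"),
    (0x0061, 0x007A, "latin_lower"),
    (0x3041, 0x3096, "hiragana"),
    (0x4E00, 0x9FFF, "cjk_ideograph"),
]
STARTS = [first for first, _, _ in PARTITIONS]

# One small table per typographic rule maps a partition to a rule-specific
# class. The class names are illustrative assumptions.
LINE_BREAK_CLASS = {
    "latin_upper": "alpha",
    "latin_lower": "alpha",
    "hiragana": "ideographic",
    "cjk_ideograph": "ideographic",
}
FONT_CLASS = {
    "latin_upper": "western",
    "latin_lower": "western",
    "hiragana": "east_asian",
    "cjk_ideograph": "east_asian",
}


def partition_of(ch):
    """Return the partition name for a character, or 'unknown'."""
    cp = ord(ch)
    i = bisect_right(STARTS, cp) - 1  # last partition starting at or before cp
    if i >= 0 and cp <= PARTITIONS[i][1]:
        return PARTITIONS[i][2]
    return "unknown"


def classify(ch, rule_table):
    """Map a character to the class used by one typographic rule."""
    return rule_table.get(partition_of(ch), "default")
```

Each character is looked up once in the partition table, and every rule then needs only a small partition-to-class mapping, for example `classify("漢", LINE_BREAK_CLASS)` or `classify("a", FONT_CLASS)`.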

Typical usage
[Figure: diagram labeled "Before behavior class" and "After behavior class"]

Line breaking
- Kinsoku rules, to avoid this: [two example images missing]
  - Stricter rules for small kana [example missing]
- Keep numeric expressions together, including prefix and postfix symbols
- Allow French typography rules (no break between the last word and : ; ? !, even if separated by a space character)
- Disable Latin word wrap
- Keep ideographic characters together

Line breaking classes
Partitions are mapped into 15 classes:
1. Opening characters
2. Closing characters
3. No start ideographic
4. Exclamation/interrogation
5. Inseparable
6. Prefix
7. Postfix
8. Ideographic
9. Numeral sequence
10. Alpha space
11. Alpha characters/symbols
12. Glue characters
13. Slash
14. Quotation characters
15. Numeric separators

Line breaking behavior table
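A behavior table of this kind is typically a pair table indexed by the class of the character before a position and the class of the character after it. The Python sketch below shows the lookup for five of the fifteen classes. The class names follow the list above, but the character assignments in `line_break_class` and every entry of `BREAK_ALLOWED` are illustrative assumptions, not the table from the presentation. Small kana, which the slides say need stricter rules, are treated here like any other ideographic character.

```python
# Five of the fifteen line breaking classes, used as table indices.
OPENING, CLOSING, IDEOGRAPHIC, ALPHA, ALPHA_SPACE = range(5)


def line_break_class(ch):
    """Assign a line breaking class (hypothetical, heavily simplified)."""
    if ch in "(「":
        return OPENING
    if ch in ")」、。":
        return CLOSING
    if ch == " ":
        return ALPHA_SPACE
    if "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff":
        return IDEOGRAPHIC
    return ALPHA


# BREAK_ALLOWED[before][after]: may a line end between these two classes?
BREAK_ALLOWED = [
    # after: OPEN  CLOSE  IDEO   ALPHA  SPACE
    [False, False, False, False, False],  # before = OPENING
    [True,  False, True,  True,  False],  # before = CLOSING
    [True,  False, True,  True,  False],  # before = IDEOGRAPHIC
    [True,  False, True,  False, False],  # before = ALPHA
    [True,  False, True,  True,  False],  # before = ALPHA_SPACE
]


def break_opportunities(text):
    """Return the indices i where a line may break between text[i-1] and text[i]."""
    classes = [line_break_class(ch) for ch in text]
    return [
        i
        for i in range(1, len(text))
        if BREAK_ALLOWED[classes[i - 1]][classes[i]]
    ]
```

With these entries, no line starts with a closing character or ends with an opening one, Latin words stay whole, and ideographic characters may break between any two of them. Changing an option such as "keep ideographic characters together" would mean editing one table cell rather than the code.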

Width modification and auto-spacing
- Width modification (contextual kerning): ( (text) ) becomes ((text))
- Auto-spacing (add space between ideographic text and Western or numeric text) [example images missing]

Font selection scenario
A new font is applied to a large multilingual selection of text:
  Is that movie a Japanese movie? Yes, it is.
Assume we want to change the font of the English text while still selecting the whole text. When we apply the Haettenschweiler font to the selection, it is desirable to affect only the Latin text:
  Is that movie a Japanese movie? Yes, it is.
The situation is similar when we want to apply an Asian typeface (like HG) to the Japanese text:
  Is that movie a Japanese movie? Yes, it is.

Font selection based on character code point and context
- Needed because there are no global Unicode fonts (a font usually covers a group of writing systems)
- Language is an important context selector for determining the appropriate font (CJK context, ASCII symbols, narrow versus wide Greek and Cyrillic characters)
- Some writing systems require several glyphs per character and are better handled by specialized fonts (Arabic, Hindi)
- Many punctuation characters are shared among writing systems whose typefaces are not shareable (e.g. the period, shared between Latin and Armenian)
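The scenario can be sketched as a function that applies a font only to the runs it covers. Everything here is a simplified assumption for illustration: the `script_of` ranges, the `FONT_COVERAGE` table, and the rule that neutral characters (spaces, digits, shared punctuation) inherit the script of the preceding character are not taken from the presentation.

```python
from itertools import groupby


def script_of(ch):
    """Classify a character by code point (simplified ranges)."""
    if "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff":
        return "cjk"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "neutral"  # spaces, digits, shared punctuation


# Hypothetical coverage table: which scripts each font should be applied to.
FONT_COVERAGE = {"Haettenschweiler": {"latin"}, "HG": {"cjk"}}


def resolve_scripts(text):
    """Give neutral characters the script of their context."""
    scripts = [script_of(ch) for ch in text]
    last = None
    for i, s in enumerate(scripts):  # inherit from the preceding character
        if s == "neutral":
            scripts[i] = last
        else:
            last = s
    following = None
    for i in range(len(scripts) - 1, -1, -1):  # leading neutrals look ahead
        if scripts[i] is None:
            scripts[i] = following
        else:
            following = scripts[i]
    return scripts


def apply_font(text, old_font, new_font):
    """Return (run, font) pairs; new_font is used only where it covers the script."""
    covered = FONT_COVERAGE[new_font]
    pairs = zip(text, resolve_scripts(text))
    return [
        ("".join(ch for ch, _ in group), new_font if use_new else old_font)
        for use_new, group in groupby(pairs, key=lambda p: p[1] in covered)
    ]
```

Applying `"Haettenschweiler"` to a mixed selection leaves the Japanese runs in their previous font, and applying `"HG"` does the opposite, which matches the behavior the scenario asks for.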

Ruby overhanging
- "Ruby" is the commonly used name for pronunciation characters associated with base characters.
- The ruby sequence may be allowed to overhang the characters preceding or following the base characters, as long as it doesn't introduce confusion.
- The classification determines in which manner characters can be overhung:
  - No overhanging (e.g. CJK ideographs)
  - Allowed only before (e.g. open quotes)
  - Allowed only after (e.g. close quotes)
  - Allowed in both cases (e.g. Hiragana)
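The four overhang behaviors can be expressed as a small classification plus one lookup. This sketch assumes one reading of the slide: "before" means a character may be overhung when it stands before the ruby base, and "after" when it follows the base. The character sets, the conservative default, and the names `Overhang`, `overhang_class`, and `may_overhang` are hypothetical.

```python
from enum import Enum


class Overhang(Enum):
    NONE = "none"      # never overhung (e.g. CJK ideographs)
    BEFORE = "before"  # may be overhung only when it precedes the ruby base
    AFTER = "after"    # may be overhung only when it follows the ruby base
    BOTH = "both"      # may be overhung on either side (e.g. Hiragana)


def overhang_class(ch):
    """Assign an overhang class to a neighboring character (simplified)."""
    if "\u4e00" <= ch <= "\u9fff":
        return Overhang.NONE
    if ch in "「『(":
        return Overhang.BEFORE
    if ch in "」』)":
        return Overhang.AFTER
    if "\u3041" <= ch <= "\u3096":
        return Overhang.BOTH
    return Overhang.NONE  # conservative default for everything else


def may_overhang(neighbor, side):
    """side is 'before' if neighbor precedes the ruby base, 'after' if it follows."""
    cls = overhang_class(neighbor)
    return cls is Overhang.BOTH or cls.value == side
```

A layout routine would call `may_overhang` once for the character on each side of the base text before letting a wide ruby sequence extend past the base.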

Conclusion / Findings
- A detailed analysis of the Unicode repertoire along common behavior is a powerful tool to construct sophisticated typographical effects.
- Typographic complexity should be expressed as much as possible in tables and properties, not in code.
- Many behaviors are correlated, allowing the usage of a limited number of Unicode partitions for many behavior descriptions.