Internationalization, Localization & Unicode. Karunesh Arora, Vijay Gugnani, C-DAC Noida (www.cdacnoida.in).

Presentation transcript:

1 Internationalization Localization & Unicode Karunesh Arora Vijay Gugnani C-DAC Noida

“Everyone has the right... to seek, receive and impart information and ideas through any media regardless of frontiers” -- Universal Declaration of Human Rights

3 Internationalization: Internationalization, often referred to as i18n, is the practice of designing and developing an application, product or document so that it is easily localizable for target audiences that vary in culture, region, or language.

4 Why Internationalization? To remove barriers to local and international access; to adapt to local, regional, linguistic or cultural needs; to provide global reach; ROI and revenue generation.

5 Internationalization vs. Localization: Localization is the actual adaptation to meet the language, cultural, and other requirements of a specific target audience. While internationalization gives us the technology and tools to target a given audience, it is the act of localization that makes the product accessible to that audience.

6 What goes with localization? Localization is much more than translation. Specifically, localization refers to adaptation to another language, which involves appropriate: –Language translation –Locale transformation and cultural aspects

7 Language Translation: Languages and Countries. Most languages are used in many countries, not just those where they are dominant or “official”. People migrate and take their languages with them. Over enough time, most languages evolve differently in different locations.

8 Language Translation: Scripts and Languages. A “script” may be defined as a collection of related characters. –It is common for several languages to share most, but not all, characters from a given script. –Scripts are often given the same name as one of the languages that use them, e.g. the Arabic script is used for the Arabic, Farsi, Urdu,… languages. –Scripts are also given a common name for a group of languages, e.g. the Devanagari script for Hindi, Marathi, Nepali, Konkani etc.

9 Language Translation: Some points to consider. Identify ‘translatable’ and ‘non-translatable’ strings. Watch for gender and number agreement and for the ordering of segments in a sentence, e.g. Page number -> e.g. Number of pages -> Many languages can take at least 30% more space, e.g. Tool – उपकरण (HI) & ग्राहक - customer (EN). –The design should accommodate this, or else the UI may have to be redesigned. –Narrow columns often cannot accommodate long target-language equivalents.

10 Language Translation: Some points to consider (contd.). Avoid ambiguous phrases, e.g. ‘Display options’: –Options of the display (read as Noun Noun) –Show the options, all of them (read as Verb Noun). Proverbs and metaphors may not have equivalents in the target language. Keep Web pages and paragraphs short. Avoid text in graphics. Use simple grammatical structures. Use everyday language. Provide clues.

11 Language Translation: Some points to consider (contd.). Follow source language conventions. Avoid acronyms. Abbreviations may have to be expanded when translated. Check spelling and grammar. The more compact the source writing, the longer the translation. Brief translators about the purpose and target audience. All items in a menu or set of check boxes should have the same grammatical structure.

12 Locale Set of parameters that define the user’s language, country and cultural preferences

13 Different aspects of locale: names & titles; calendars, numeric, date and time formats, addresses, currencies, paper size, weights & measures; input mechanism, language selection, oral pronunciation.

14 Titles and Names In India, it is required to specify etc.) –These titles do not necessarily translate. Family name is not always last (in the southern and western parts of the country). Sorting can be based on last name or first. Salutations in letters (e.g. Dear) are different in different locales e.g.

15 Titles and Names Source: Delhi Press Prakashan

16 Calendars The Gregorian calendar should not always be assumed. –Proper localization of some software requires the use (at least as an option) of calendars distinct to a culture, e.g. the Vikram Samvat, Saka or Hijri calendars in India, and calendars of various religions in which year 0 was not 2006 years ago. –Fiscal-year based calendars vary widely: some have 13 months (364/28) or 53 weeks.
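The 13-month and 53-week figures follow from simple arithmetic, which the sketch below checks in Python. It uses ISO week numbering from the standard library only as a stand-in for a week-based fiscal calendar; that substitution is an assumption, since real fiscal calendars are defined by each organisation.

```python
from datetime import date

# 13 periods of 28 days (4 weeks each) cover 364 days, i.e. 52 whole weeks.
days_in_13_periods = 13 * 28
weeks_in_13_periods = days_in_13_periods // 7

# A common year has 365 days, so one day is left over every year;
# week-based calendars absorb the leftovers with an occasional 53rd week.
leftover_common_year = 365 - days_in_13_periods

# ISO week numbering (illustration only): 28 December always falls in the
# last ISO week of its year, so its week number is the year's week count.
weeks_2004 = date(2004, 12, 28).isocalendar()[1]
weeks_2006 = date(2006, 12, 28).isocalendar()[1]

print(days_in_13_periods, weeks_in_13_periods, leftover_common_year)
print(weeks_2004, weeks_2006)
```

Software that assumes 12 months or 52 weeks per year would mislabel periods in either kind of calendar.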

17 Date formats Date separators depend on locale: ‘/’, ‘-’, ‘.’. ‘am’ and ‘pm’ are not used universally (many cultures use the 24-hour clock). –ISO standard dates are unambiguous: yyyy-mm-dd hh:mm:ss. A non-ISO date means different things in different locales. If not using ISO, display dates in the locale of the user, preferably in a ‘long’ form with the month spelled out (in the correct language).
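A minimal Python sketch of the ambiguity described above, using only the standard `datetime` module. The sample timestamp and the short date string are made up for illustration.

```python
from datetime import datetime

# A sample timestamp (illustrative value).
stamp = datetime(2006, 4, 3, 14, 30, 0)

# ISO style: year-month-day and a 24-hour clock, readable in any locale.
iso_text = stamp.strftime("%Y-%m-%d %H:%M:%S")

# The same short date string read under two different conventions.
ambiguous = "03/04/2006"
day_first = datetime.strptime(ambiguous, "%d/%m/%Y")    # day/month/year
month_first = datetime.strptime(ambiguous, "%m/%d/%Y")  # month/day/year

print(iso_text)
print(day_first.date(), month_first.date())
```

The two parses differ by a month, which is why the slide recommends either the ISO form or a long form with the month spelled out.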

18 Formatting Numbers Number formatting depends on the locale, not on the language of the application. Group separation: –Number of digits in a group: in English and ISO it is 3, while for Indic languages it is different, e.g. 1,23,456, i.e. ##,##,##,###. –Group separator: in English ‘,’, but ISO uses a space, and some locales use ‘.’ or none. Decimal separator: ‘.’, ‘. ’, ‘,’. Negative symbol: ‘-’, ‘~’, ‘(…)’.
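The Indic grouping pattern ##,##,##,### can be sketched in a few lines of Python. The function name `group_indian` is hypothetical and the code handles integers only; production software would normally take grouping rules from locale data rather than hard-coding them.

```python
def group_indian(n: int) -> str:
    """Group digits as 3 on the right, then 2 at a time (e.g. 1,23,456)."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel off two digits at a time from the right of the remaining head.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    return sign + ",".join(groups + [tail])


western = f"{123456:,}"         # groups of three throughout
indian = group_indian(123456)   # three, then groups of two
print(western, indian)
```

The same value therefore has two different valid renderings, and which one is right depends on the user's locale.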

19 Currency Use the currency symbol of the data, i.e. INR does not automatically become £ or $ when the locale changes. The format depends on the user’s locale, not the currency. –Differences in formats: symbol position (before or after the amount); blanks separating the symbol from the amount.
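The split between "currency belongs to the data" and "format belongs to the locale" can be sketched as follows. The `LOCALE_STYLE` table and the `format_money` function are hypothetical, hand-written for illustration; a real application would read these conventions from locale data.

```python
from decimal import Decimal

# Hypothetical conventions for two locales (illustration only).
LOCALE_STYLE = {
    "en_US": {"swap_separators": False, "code_first": True},
    "de_DE": {"swap_separators": True, "code_first": False},
}


def format_money(amount: Decimal, currency_code: str, locale_id: str) -> str:
    """Format an amount for a locale without ever changing its currency."""
    style = LOCALE_STYLE[locale_id]
    number = f"{amount:,.2f}"  # e.g. 1,234.50
    if style["swap_separators"]:
        # Exchange ',' and '.' for locales that group with '.'.
        number = number.translate(str.maketrans(",.", ".,"))
    if style["code_first"]:
        return f"{currency_code} {number}"
    return f"{number} {currency_code}"


price = Decimal("1234.5")  # the data: an amount in INR
us_view = format_money(price, "INR", "en_US")
de_view = format_money(price, "INR", "de_DE")
print(us_view, "|", de_view)
```

Both views still say INR: only the separators and the position of the currency code change with the viewer's locale.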

20 Currency contd… Different ways of expressing Rs: Rs.1000 or Rs. 1000/- or Rs.1,000/- or Rs; INR 1000; 1000 Rupees; 1000 रुपये. Strong currencies like the Indian rupee need decimal precision (e.g. 2 digits after the decimal point for paisa).

21 Language selection Avoid using national flags to choose the preferred language: multiple countries use the same language. In what order should the languages be listed? In which language should the language names be displayed? –In the language itself, or with a translation in the default language of the operating system.

22 Pronunciation Important for speech-based systems. –Higher recognition accuracy can be obtained by tailoring voice input to regional dialects. –Voice output in the wrong dialect can make an application sound ‘foreign’. –Applications that support regional dialects have better impact.

23 Culture Culture is a complex collection of experiences which condition daily life; it includes history, social structure, geographical effects, religion, traditional customs and everyday usage.

24 Cultural issues: icons, symbols and images; colors, myths, beliefs and feelings; humour; geographical & environmental effects; customs & traditions; Social Security Numbers.

25 Icons & Symbols Icons that are a play on words do not translate, e.g. a dust bin for dumping files, a rocket for launching an application, scissors for cutting in an edit operation, “B”, “I”, “U”. Some concepts have been found extremely hard to represent as an icon, e.g. sorting (‘A->Z’ is not universal). Images of people or body parts such as hands: –Considered inappropriate in some cultures –What skin color do you use? –Images of people need to be localized for each country

26 Colors & Humour The color white may represent purity and green prosperity in the Indian context, but the same may not hold in another culture. Humour generally does not translate. People are sensitive to different things in different cultures. Jokes and cartoons can be offensive.

27 Customs & Traditions In Indian culture, people show respect to their elders and renowned personalities by addressing them in the plural, e.g. ‘Dr. Manmohan Singh is the prime minister of India.’ is written डॉ. मनमोहन सिंह भारत के प्रधानमंत्री हैं। (with the plural verb form हैं). Similarly, in social relationships, there are several words for one relation, e.g. for ‘uncle’: चाचा, ताऊ, मौसा.

28 Unicode? Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. Source:

29 Universal Character Encoding … Unique number for every character

30 Unifies all Languages 96 thousand characters, so far. All characters are accessible at the same time, in the same document: क, க, ಔ,…
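The "unique number for every character" can be inspected directly in Python. This small sketch uses the standard `unicodedata` module on three letters written as escapes (Devanagari KA, Tamil KA and Kannada AU, matching the letters shown above) and places them together in one string.

```python
import unicodedata

# Devanagari KA, Tamil KA, Kannada AU, written as escapes for clarity.
samples = ["\u0915", "\u0B95", "\u0C94"]

# Every character has exactly one code point, independent of platform.
code_points = [f"U+{ord(ch):04X}" for ch in samples]
names = [unicodedata.name(ch) for ch in samples]

# All three scripts can sit side by side in one ordinary string.
mixed_document = ", ".join(samples)

for cp, name in zip(code_points, names):
    print(cp, name)
print(mixed_document)
```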

31 Widespread Support Developed & supported by industry leaders: –Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys, … Supported in standards: –XML, HTML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, Perl, etc. Implemented in: –All modern operating systems, browsers, and other products

32 IDN – भाषा.in

33 Information about Unicode –Online Standard –Technical Reports –FAQs –General Information –Discussion Forums, Conferences

34 Resources Availability System APIs: –Windows, Java, Unix, Oracle, DB2, Sybase, Mac, Linux, … Languages –Java, JavaScript, C#, Perl 5.6.0, C, C++, SQL, … Cross-platform libraries: –ICU, Rosette, …

35 Indic Support in Unicode ISCII is the basis for the characters and their allocation. DIT is a member of the Consortium. Reports have been submitted on missing characters, clarifications or corrections of usage.

36 ISCII : Similarities Within a script, layout and contents are nearly identical. Independent + dependent vowels. Halant model for representing conjuncts: –conjuncts / half-forms are not directly encoded –they are represented by sequences instead. Phonetic sequence – order in syllables.
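Two of these points can be observed from Python's `unicodedata` module. The sketch below is illustrative only: it looks up the letter KA in four Indic scripts to show the parallel block layout, then lists the code points of one Devanagari conjunct to show the halant (virama) sequence.

```python
import unicodedata

# Parallel layout: KA sits at the same offset in each 128-code-point block.
scripts = ["DEVANAGARI", "BENGALI", "TAMIL", "TELUGU"]
ka_letters = {s: unicodedata.lookup(f"{s} LETTER KA") for s in scripts}
block_offsets = {s: ord(ch) % 0x80 for s, ch in ka_letters.items()}

# Halant model: the conjunct "ksha" is stored as KA + VIRAMA + SSA,
# not as a single precomposed character.
conjunct = "\u0915\u094D\u0937"
conjunct_names = [unicodedata.name(ch) for ch in conjunct]

print(block_offsets)
print(conjunct_names)
```

Because the conjunct is a sequence, its visual shape is produced by the font and rendering engine rather than by the encoding.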

37 ISCII : Differences Unicode is stateless: –No shifting to get different scripts –Each character has a unique number Unicode is uniform: –No extension bytes necessary –All characters coded in the same space

38 Advantages Accessible information across the globe. Seamless multilingual documents. Opens up the software export market, beyond English. Connects India to the world.

39 The Future The world is moving rapidly to Unicode. Unicode makes India open to the world: –The world comes to you, and –You go to the world

40 Multiple Forms UTF-8: maximal compatibility with 8-bit systems. UTF-16: good storage, interoperability with Windows/Java. UTF-32: simplest processing. Fast, lossless conversion between the forms.
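A short Python sketch of the three encoding forms, using the built-in codecs. The sample string is an arbitrary choice: one ASCII letter, one Devanagari letter and one character outside the Basic Multilingual Plane.

```python
# "A", DEVANAGARI LETTER KA, MUSICAL SYMBOL G CLEF (a supplementary character)
text = "A\u0915\U0001D11E"

# Byte length of the same three characters in each encoding form.
# The "-le" codecs are used so that no byte-order mark is added.
sizes = {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

# Conversion is lossless: decoding returns exactly the original string.
round_trips = all(text.encode(enc).decode(enc) == text for enc in sizes)

print(sizes, round_trips)
```

UTF-8 and UTF-16 use a variable number of bytes per character, while UTF-32 always uses four, which is what makes it the simplest to process.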

41 W3C Internationalization Activity

42 Presentation / Styling issues – Styling of first character If a styling feature is to be applied to the starting character, should it apply to a single character, a conjunct character, a syllable or a grapheme cluster? e.g. स्थिति (Position) प्रस्थान (Departure) स्वर (Vowel) कोश (Dictionary) हिंदी (Hindi) हिन्दी (Hindi) क्षेत्रीय (Regional) Some Issues under discussion in IL
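The difference between "first character" and "first syllable" can be made concrete with a deliberately simplified splitter. The function `split_clusters` below is a hypothetical sketch, not the Unicode text segmentation algorithm: it attaches combining marks to the preceding letter and joins a consonant to a preceding virama, which roughly approximates orthographic syllables in Devanagari.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)


def split_clusters(text: str) -> list:
    """Approximate orthographic syllables (simplified, Devanagari only)."""
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        after_virama = bool(clusters) and clusters[-1].endswith(VIRAMA)
        if clusters and (is_mark or after_virama):
            clusters[-1] += ch   # mark, or consonant joined by a virama
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


word = "\u0938\u094D\u0925\u093F\u0924\u093F"  # the word glossed "Position"
first_code_point = word[0]                     # just the bare first letter
first_cluster = split_clusters(word)[0]        # conjunct plus its vowel sign

print(len(first_code_point), len(first_cluster), len(split_clusters(word)))
```

Styling only `word[0]` would split the conjunct, while styling `first_cluster` covers the whole visual unit, which is the choice the slide asks about.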

43 Presentation / Styling issues – Styling of first character

44 Presentation / Styling issues – In cursive text, as in Arabic and Urdu, the styling is applied to the whole word, e.g. Saabiq -> Former (Urdu). Source: Rashtriya Sahara

45 Presentation / Styling issues – Vertical arrangement of characters If a string is written in vertical mode, writing each character on a new line may not be suitable.

46 Presentation / Styling issues – Horizontal spacing e.g.

47 Presentation / Styling issues – Bullets and numbers Numbering schemes in Indian languages should also be supported.

48 Presentation / Styling issues – Collation Collation is a means to search and order data in a way that makes sense to users in their particular culture. Myths: one collation is good enough; being Unicode-enabled means sorting is already covered.
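Both myths can be illustrated with a small Python sketch. The word list is made up, and `strip_accents` is a crude stand-in for a real tailored collation (libraries such as ICU provide proper locale-aware collation); it is used here only to show that two plausible orderings of the same data disagree.

```python
import unicodedata

words = ["zon", "\u00e4pple", "apa"]  # the middle word starts with a-umlaut


def strip_accents(word: str) -> str:
    """Sort key that ignores accents (a simplification, not real collation)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Plain code point order: a-umlaut (U+00E4) sorts after "z".
code_point_order = sorted(words)

# Accent-blind order: a-umlaut sorts together with "a".
accent_blind_order = sorted(words, key=strip_accents)

print(code_point_order)
print(accent_blind_order)
```

Neither result is wrong in itself; which one users expect depends on their language, so Unicode support alone does not settle sorting.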

49 Presentation / Styling issues Some Issues in Indian Languages

50 Presentation issues –Underlining of the characters, e.g. अन्य भाषाओं में भी अनुवाद (translation into other languages as well)

51 Searching issues –Searching is a problem across languages that share a script, where some words are spelled identically but differ in meaning.

52 Issues on presentation on other devices Addressing input mechanisms and predictive input for vernacular languages. Handling display issues on hand-held devices with smaller screens when text is translated. Standardizing encodings in communication to account for the cost of bandwidth (ISCII / Unicode / Compressed Unicode), connectivity and on-the-fly conversion of encodings.

53 References and acknowledgements Articles by Richard Ishida, Felix Sasaki, W3C. Presentation by Mark Davis. 22Internationalization.ppt

54 Thank you