Internationalization, Localization & Unicode. Karunesh Arora, Vijay Gugnani, C-DAC Noida (www.cdacnoida.in).

1 Internationalization, Localization & Unicode. Karunesh Arora, Vijay Gugnani, C-DAC Noida

2 “Everyone has the right... to seek, receive and impart information and ideas through any media regardless of frontiers” -- Universal Declaration of Human Rights

3 Internationalization: Internationalization, often referred to as i18n, is the practice of designing and developing an application, product or document so that it can be easily localized for target audiences that vary in culture, region, or language.

4 Why Internationalization? To remove barriers to local and international access. To allow adaptation to local, regional, linguistic or cultural needs. To provide global reach. ROI and revenue generation.

5 Internationalization vs. Localization: Localization is the actual adaptation to meet the language, cultural, and other requirements of a specific target audience. While internationalization gives us the technology and tools to target a given audience, it is the act of localization that makes the product accessible to that audience.

6 What goes with localization? Localization is much more than translation. Specifically, localization refers to adaptation to another language, which involves appropriate: –Language translation –Locale transformation and cultural aspects

7 Language Translation: Languages and Countries. Most languages are used in many countries, not just those where they are dominant or “official”. People migrate and take their languages with them. Over enough time, most languages evolve differently in different locations.

8 Language Translation: Scripts and Languages. A “script” may be defined as a collection of related characters. –It is common for several languages to share most, but not all, characters from a given script. –Scripts are often given the same name as one of the languages that use them, e.g. the Arabic script is used for the Arabic, Farsi, Urdu, … languages. –Scripts may also carry a common name across a group of languages, e.g. the Devanagari script for Hindi, Marathi, Nepali, Konkani, etc.

9 Language Translation: Some points to consider. Identify ‘translatable’ and ‘non-translatable’ strings. Watch gender and number agreement and the ordering of segments in a sentence, e.g. Page number -> e.g. Number of pages -> Many languages can take at least 30% more space, e.g. Tool – उपकरण (HI) & ग्राहक - customer (EN). –The design should accommodate this, or else the UI may have to be redesigned. –Narrow columns often cannot accommodate long target-language equivalents.

10 Language Translation: Some points to consider (contd.). Avoid ambiguous phrases, e.g. ‘Display options’: –Options of the display (noun + noun) –Show the options, all of them (verb + noun). Proverbs and metaphors may not have equivalents in the target language. Keep Web pages and paragraphs short. Avoid text in graphics. Use simple grammatical structures. Use everyday language. Provide clues.

11 Language Translation: Some points to consider (contd.). Follow source language conventions. Avoid acronyms. Abbreviations may have to be expanded when translated. Check spelling and grammar. The more compact the source writing, the longer the translation. Brief translators about the purpose and target audience. All items in a menu or set of check boxes should have the same grammatical structure.

12 Locale: A set of parameters that defines the user’s language, country and cultural preferences.

13 Different aspects of locale: names & titles; calendars; numeric, date and time formats; addresses; currencies; paper size; weights & measures; input mechanism; language selection; oral pronunciation.

14 Titles and Names: In India, it is required to specify etc.) –these titles do not necessarily translate. The family name is not always last (in the South & West parts of the country). Sorting can be based on last name or first name. Salutations in letters (e.g. Dear) differ between locales e.g.

15 Titles and Names. Source: Delhi Press Prakashan

16 Calendars: The Gregorian calendar should not always be assumed. –Proper localization of some software requires the use (at least as an option) of calendars distinct to a culture, e.g. the Vikram Samvat, Saka or Hijri calendars in India, or calendars of various religions where year 0 was not 2006 years ago. –Fiscal-year based calendars vary widely; some have 13 months (364/28) or 53 weeks.

17 Date formats: Date separators depend on locale: ‘/’, ‘-’, ‘.’. ‘am’ and ‘pm’ are not used universally (many cultures use the 24-hour clock). –ISO standard dates are unambiguous: yyyy-mm-dd hh:mm:ss. –A non-ISO date such as 01-03-02 means different things in different locales. –If not using ISO, display dates in the locale of the user. –Preferably use a ‘long’ form with the month spelled out (in the correct language).
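The ambiguity of a short date such as 01-03-02 can be shown with a minimal Python sketch. The three field orders below are assumptions chosen for illustration; they are not an exhaustive list of locale conventions.

```python
from datetime import datetime

ambiguous = "01-03-02"

# Three plausible field orders (illustrative assumptions, not a locale database).
readings = {
    "day-month-year": "%d-%m-%y",
    "month-day-year": "%m-%d-%y",
    "year-month-day": "%y-%m-%d",
}

# Parse the same string under each reading and keep the resulting dates.
parsed = {
    name: datetime.strptime(ambiguous, fmt).date()
    for name, fmt in readings.items()
}

for name, value in parsed.items():
    # isoformat() gives the unambiguous yyyy-mm-dd form.
    print(f"{name}: {value.isoformat()}")
```

Each reading yields a different calendar day, which is why the ISO form (or a long form with the month spelled out) is preferable for display and exchange.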

18 Formatting Numbers: Formatting depends on the locale, not on the language of the application. Group separation: –Number of digits in a group: in English and ISO it is 3, while for Indic languages it is different: 1,23,456, i.e. ##,##,##,###. –Group separator: ‘,’ in English, but ISO uses a space, and some locales use ‘.’ or none. Decimal separator: ‘.’, ‘. ’, ‘,’. Negative symbol: ‘-’, ‘~’, ‘(…)’.
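As a rough illustration of the ##,##,##,### pattern, the following Python sketch groups the last three digits and then pairs of digits to the left. The function name `group_indian` is hypothetical, and the sketch handles integers only.

```python
def group_indian(n: int) -> str:
    """Group digits as ##,##,##,### (last three digits, then pairs)."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    pairs = []
    # Peel off two digits at a time from the right of the remaining head.
    while head:
        pairs.insert(0, head[-2:])
        head = head[:-2]
    return sign + ",".join(pairs + [tail])


print(group_indian(123456))   # Indic-style grouping
print(f"{123456:,}")          # groups of three, for comparison
```

The same value therefore renders as 1,23,456 under the Indic convention and as 123,456 under the three-digit convention, so the grouping rule has to come from the user's locale rather than being hard-coded.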

19 Currency: Use the currency symbol of the data, –i.e. INR doesn’t automatically translate to £ or $ when the locale changes. The format depends on the user’s locale, not the currency. –Differences in formats: symbol position (before or after the amount); blanks separating the symbol from the amount.
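The separation between the currency of the data and the format of the locale can be sketched as follows. The convention names and the `format_money` helper are hypothetical and exist only for this illustration; a real application would take the patterns from a locale library.

```python
from decimal import Decimal

# Hypothetical display conventions; real patterns come from locale data.
CONVENTIONS = {
    "symbol-first": "{symbol}{amount}",
    "symbol-first-spaced": "{symbol} {amount}",
    "symbol-last-spaced": "{amount} {symbol}",
}


def format_money(amount: Decimal, currency: str, convention: str) -> str:
    """Render an amount; the currency code stays fixed, only the layout varies."""
    text = str(amount.quantize(Decimal("0.01")))  # two decimal places
    return CONVENTIONS[convention].format(symbol=currency, amount=text)


price = Decimal("1000")
for convention in CONVENTIONS:
    print(format_money(price, "INR", convention))
```

Note that the currency code passed in is always INR: switching the display convention changes only the position of the symbol and the spacing, never the currency itself.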

20 Currency (contd.): Different ways of expressing Rs. 1000: Rs.1000 or Rs. 1000/- or Rs.1,000/- or Rs. 1000.00; INR 1000; 1000 Rupees; 1000 रुपये. Strong currencies like the Indian rupee need decimal precision (e.g. 2 digits after the decimal point for paisa).

21 Language selection: Avoid using national flags to choose the preferred language. –Multiple countries use the same language. In what order should the languages be listed? In which language should the language names be displayed? –In the language itself, or with a translation in the default language of the operating system.

22 Pronunciation: Important for speech-based systems. –Higher recognition accuracy can be obtained by tailoring voice input to regional dialects. –Voice output in the wrong dialect can make an application sound ‘foreign’. –Applications that support regional dialects have a better impact.

23 Culture: Culture is a complex collection of experiences that condition daily life; it includes history, social structure, geographical effects, religion, traditional customs and everyday usage.

24 Cultural issues: icons, symbols and images; colors, myths, beliefs and feelings; humour; geographical & environmental effects; customs & traditions; Social Security Numbers.

25 Icons & Symbols: Icons that are a play on words do not translate, –e.g. a dust bin for dumping files; a rocket for launching an application; scissors for cutting in an edit operation; “B”, “I”, “U”. Some concepts have been found extremely hard to represent as an icon, –e.g. sorting (‘A->Z’ is not universal). Images of people or body parts such as hands: –Considered inappropriate in some cultures. –What skin color do you use? –Images of people need to be localized for each country.

26 Colors & Humour: The color white may represent purity and green prosperity in the Indian context, but this may not hold in another culture. Humour generally does not translate. People are sensitive to different things in different cultures. Jokes and cartoons can be offensive.

27 Customs & Traditions: In Indian culture, people show respect to their elders and renowned personalities by addressing them in the plural, e.g. “Dr. Manmohan Singh is the prime minister of India”: डॉ. मनमोहन सिंह भारत के प्रधानमंत्री हैं। Similarly, in social relationships, there are several words for one relation, e.g. for ‘uncle’: चाचा, ताऊ, मौसा.

28 Unicode? Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. Source: http://unicode.org

29 Universal Character Encoding … Unique number for every character
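The "unique number" is the character's code point. A minimal Python sketch, using one Latin, one Devanagari and one Tamil letter as sample characters, prints the code point and the standard character name for each:

```python
import unicodedata

# Sample characters from three scripts, held in one ordinary string.
samples = "A\u0915\u0b95"  # "A", Devanagari KA, Tamil KA

code_points = {}
for ch in samples:
    code_points[ch] = ord(ch)  # the character's unique number
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

The numbers do not depend on the platform or program that stores the text, which is what allows characters from several scripts to sit in the same string.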

30 Unifies all Languages: 96 thousand characters, so far. All characters are accessible at the same time, in the same document: क, க, ಔ, …

31 Widespread Support: Developed & supported by industry leaders: –Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys, … Supported in standards: –XML, HTML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, Perl, etc. Implemented in: –All modern operating systems, browsers, and other products.

32 IDN (Internationalized Domain Names) –http://भाषा.in
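An internationalized domain name travels through the DNS in an ASCII-compatible form. The sketch below uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules; that choice is an assumption made for a dependency-free example, and current registries may apply newer rules.

```python
# The domain from the slide, written with escapes: भाषा.in
domain = "\u092d\u093e\u0937\u093e.in"

# Convert each label to its ASCII-compatible ("xn--") encoding.
ascii_form = domain.encode("idna")

# Decoding the ASCII form recovers the original Unicode domain.
round_trip = ascii_form.decode("idna")

print(ascii_form)
print(round_trip == domain)
```

Only the non-ASCII label is rewritten; the plain `in` label passes through unchanged.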

33 Information about Unicode: www.unicode.org –Online Standard –Technical Reports –FAQs –General Information –Discussion Forums, Conferences

34 Resources Availability: System APIs: –Windows, Java, Unix, Oracle, DB2, Sybase, Mac, Linux, … Languages: –Java, JavaScript, C#, Perl 5.6.0, C, C++, SQL, … Cross-platform libraries: –ICU, Rosette, …

35 Indic Support in Unicode: ISCII is the basis for the characters and their allocation. DIT is a member of the Consortium. Reports have been submitted on missing characters, clarifications or corrections of usage.

36 ISCII: Similarities. Within a script, layout and contents are nearly identical. Independent + dependent vowels. Halant model for representing conjuncts: –conjuncts / half-forms are not directly encoded –they are represented by sequences instead. Phonetic sequence: order in syllables.
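The halant model can be seen by listing the code points of a conjunct. In this minimal Python sketch the conjunct क्ष is an example chosen for illustration; in Unicode the halant is named VIRAMA.

```python
import unicodedata

# The conjunct क्ष is stored as consonant + halant (virama) + consonant.
conjunct = "\u0915\u094d\u0937"

names = [unicodedata.name(ch) for ch in conjunct]
for ch, name in zip(conjunct, names):
    print(f"U+{ord(ch):04X}  {name}")
```

There is no single code point for the conjunct; the rendering engine forms the combined glyph from the three-character sequence.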

37 ISCII: Differences. Unicode is stateless: –No shifting to get different scripts –Each character has a unique number. Unicode is uniform: –No extension bytes necessary –All characters coded in the same space.

38 Advantages: Accessible information across the globe. Seamless multilingual documents. Opens up the software export market, beyond English. Connects India to the world.

39 The Future: The world is moving rapidly to Unicode. Unicode makes India open to the world: –The world comes to you, and –You go to the world.

40 Multiple Forms: UTF-8: maximal compatibility with 8-bit systems. UTF-16: good storage, interoperability with Windows/Java. UTF-32: simplest processing. Fast, lossless conversion between the forms.
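A short Python sketch makes the trade-off between the three forms concrete by encoding the same two-character string (one ASCII letter and one Devanagari letter, chosen for illustration) and checking that each form decodes back without loss:

```python
text = "A\u0915"  # Latin "A" followed by Devanagari KA

encodings = ("utf-8", "utf-16-le", "utf-32-le")

# Number of bytes the same text occupies in each encoding form.
sizes = {enc: len(text.encode(enc)) for enc in encodings}

# Conversion is lossless: every form decodes back to the same string.
lossless = all(text.encode(enc).decode(enc) == text for enc in encodings)

print(sizes)
print(lossless)
```

UTF-8 keeps the ASCII letter at one byte but needs three for the Devanagari letter, UTF-16 uses two bytes for each, and UTF-32 uses a fixed four bytes per character.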

41 W3C Internationalization Activity

42 Some issues under discussion in Indian languages: Presentation / styling issues – Styling of first character. If a styling feature is to be applied to the starting character, should it apply to a single character, a conjunct character, a syllable or a grapheme cluster? e.g. स्थिति (Position), प्रस्थान (Departure), स्वर (Vowel), कोश (Dictionary), हिंदी (Hindi), हिन्दी (Hindi), क्षेत्रीय (Regional)
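One way to see why "first character" is ambiguous is to compare the first code point with the first syllable-like cluster. The Python sketch below uses a deliberately simplified rule (keep extending while the next character is a combining mark or follows a halant); it is an assumption made for illustration and is not the full Unicode text segmentation algorithm. The helper name `first_cluster` is hypothetical.

```python
import unicodedata

VIRAMA = "\u094d"  # Devanagari halant


def first_cluster(text: str) -> str:
    """Return the leading syllable-like cluster under a simplified rule."""
    if not text:
        return ""
    end = 1
    while end < len(text):
        is_mark = unicodedata.category(text[end]).startswith("M")
        after_virama = text[end - 1] == VIRAMA
        if is_mark or after_virama:
            end += 1  # still inside the same cluster
        else:
            break
    return text[:end]


word = "\u0938\u094d\u0925\u093f\u0924\u093f"  # स्थिति
print(word[0])              # first code point only
print(first_cluster(word))  # conjunct plus its vowel sign
```

Styling only `word[0]` would split the conjunct, whereas styling the cluster keeps the visible unit स्थि intact.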

43 Some issues under discussion in Indian languages: Presentation / styling issues – Styling of first character

44 Some issues under discussion in Indian languages: Presentation / styling issues – In cursive scripts such as Arabic and Urdu, the styling is applied to the whole word, e.g. the Urdu word Saabiq (Former). Source: Rashtriya Sahara

45 Some issues under discussion in Indian languages: Presentation / styling issues – Vertical arrangement of characters. If a string is written in vertical mode, writing each character on a new line may not be suitable. http://www.w3.org/International/notes/firstletter.html

46 Some issues under discussion in Indian languages: Presentation / styling issues – Horizontal spacing e.g.

47 Some issues under discussion in Indian languages: Presentation / styling issues – Bullets and numbers. Numbering schemes should also be supported in Indian languages.

48 Some issues under discussion in Indian languages: Presentation / styling issues – Collation. A means for users to search and order data in a way that makes sense in their particular culture. Myths: “One collation is good enough”; “Unicode enabled, so sorting is already covered”.
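The second myth can be illustrated with a small Python sketch. Sorting by raw code point is what a Unicode-enabled program gets by default, and even for plain English words it differs from what readers expect. The case-folding key shown is only a stand-in for illustration; it is not a real collation, which needs language-specific rules from a library such as ICU.

```python
words = ["apple", "Zebra", "banana"]

# Default sort compares code points, so every uppercase letter sorts first.
by_code_point = sorted(words)

# A case-insensitive key is closer to dictionary order for this small sample.
case_insensitive = sorted(words, key=str.casefold)

print(by_code_point)
print(case_insensitive)
```

The two results disagree on where "Zebra" belongs, and the gap only widens for scripts whose dictionary order does not follow code point order.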

49 Some issues in Indian languages: Presentation / styling issues

50 Some issues under discussion in Indian languages: Presentation issues –Underlining of the characters, e.g. अन्य भाषाओं में भी अनुवाद (“translation into other languages as well”)

51 Some issues: Searching issues –Searching is problematic across languages that share the same script, where some words are spelled the same but differ in meaning.

52 Issues on presentation on other devices: Addressing input mechanisms and predictive input for vernacular languages. Handling display issues on handheld devices with smaller screens, in cases of translation. Standardizing encoding in communication to take care of the cost of bandwidth (ISCII / Unicode / Compressed Unicode), connectivity and on-the-fly conversion of encodings.

53 References and acknowledgements: http://www.w3.org/international; articles by Richard Ishida and Felix Sasaki, W3C; http://macchiato.com/slides/UnicodeAndIndia.ppt, presentation by Mark Davis; www.site.uottawa.ca/ftppub/courses/Winter/csi5122/coursenotes/5122Internationalization.ppt

54 Thank you

