Sophia Antipolis, September 2006 Multilinguality, localization and internationalization Miruna Bădescu Finsiel Romania.


1 Sophia Antipolis, September 2006 Multilinguality, localization and internationalization Miruna Bădescu Finsiel Romania

2 Unicode, encodings and character sets

3 How it all started…
- Until recently, most computers used character sets with a maximum of 256 characters (ANSI):
  - The first 128 (ASCII):
    - numbers
    - letters a-z and A-Z
    - punctuation marks
  - The second set of 128 varies:
    - In the English-speaking world it contains:
      - more punctuation marks
      - currency symbols (e.g. £)
      - accented letters (á, é, ñ, ç, ô)
    - In places like Egypt, Greece and Russia it contains characters taken from the corresponding alphabet: Arabic, Greek, Cyrillic

4 Code, encoding
- Character code – a sequence of bits that a computer uses to represent a character
- Encoding – the rule describing how a sequence of bytes is transformed into characters

5 Problem
- These encoding systems also conflict with one another – two encodings:
  - can use the same number for two different characters
  - can use different numbers for the same character
- Data can become incomprehensible when transferred from one place to another
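Both kinds of conflict can be reproduced with the legacy codecs that ship with Python. This is only an illustrative sketch; the byte value, the sample word and the choice of code pages are assumptions made for the example, not taken from the slides.

```python
# One byte value read through three legacy code pages gives three characters.
raw = b"\xe9"
decoded = {name: raw.decode(name) for name in ("latin-1", "cp1251", "iso8859_7")}

# One character written through three legacy code pages gives three byte values.
encoded = {name: "é".encode(name) for name in ("latin-1", "cp437", "mac_roman")}

# Text written as Cyrillic (cp1251) but read as Western European (latin-1)
# turns into unreadable accented Latin letters.
original = "Привет"
garbled = original.encode("cp1251").decode("latin-1")

for name, ch in decoded.items():
    print(f"byte 0xE9 read as {name}: {ch}")
for name, data in encoded.items():
    print(f"'é' written as {name}: 0x{data.hex().upper()}")
print(f"{original} read with the wrong code page: {garbled}")
```

Nothing is lost in the last step: reading the same bytes with the right code page restores the text. The difficulty is that the bytes themselves do not say which code page was used.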

6 Solution
- Moving to a system that assigns a unique number to each character in each language of the world
- The Unicode standard provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language
- Unicode (as defined by the Unicode Consortium) has become a universal standard, kept in step with ISO/IEC 10646, which describes the 'Universal Multiple-Octet Coded Character Set' (UCS)
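The "unique number" is the character's code point. A small sketch, using Python's standard `unicodedata` module, shows that the number and the official character name do not depend on any code page; the helper name `describe` and the sample characters are assumptions for the example.

```python
import unicodedata


def describe(ch):
    """Return the code point (in U+XXXX notation) and the Unicode name of ch."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


# Latin, Latin with diacritics, Greek, Cyrillic and Arabic letters.
for ch in ("A", "ă", "é", "Ω", "я", "ب"):
    code_point, name = describe(ch)
    print(code_point, ch, name)
```

The mapping also works in the other direction: `chr(0x0103)` returns "ă" on every platform.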

7 Unicode
- The Unicode repertoire can be encoded in more than one way: UTF-8, UTF-16, UTF-32
- UTF-8 encodes:
  - ASCII characters in 1 byte
  - other characters in 2 to 4 bytes (the original design allowed up to 6 bytes; RFC 3629 limits UTF-8 to 4)
- Incorporating it into client-server or multi-tiered applications and websites offers significant cost savings over the use of legacy character sets
- Enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering
- Allows data to be transported through many different systems without corruption
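The size difference between the three encoding forms can be checked directly. This is a minimal sketch; the helper name `encoded_sizes` and the sample characters are assumptions, and the `-le` codec variants are used only so that no byte-order mark is counted.

```python
def encoded_sizes(ch):
    """Number of bytes needed for one character in each encoding form."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


samples = {
    "A": "ASCII letter",
    "ă": "Latin letter with diacritic",
    "€": "currency symbol",
    "\U0001F600": "character outside the Basic Multilingual Plane",
}
for ch, label in samples.items():
    print(f"{label}: {encoded_sizes(ch)}")
```

ASCII text therefore stays byte-for-byte identical in UTF-8, which is one reason it suits web pages that mix markup with text in many languages.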

8 Internationalization and localization

9 I18n
- Internationalization (I18n): modification of an application so that it can handle multiple languages, countries, etc.:
  - Displaying content (web pages, files) in the end user's language
  - Displaying messages around the site in the user's language (e.g. "Home", "Search", error messages)
  - Accepting input in the end user's language
  - Printing out the correct characters
  - Handling dates, numbers and sorting words using the rules of that language
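The last point, handling dates and numbers by language, can be sketched with a small hand-written table. The table contents, the function names and the three language codes are assumptions made for illustration (the "en" row follows the US convention, and the French thousands separator is simplified to a plain space); a production system would normally take these rules from a locale database instead.

```python
from datetime import date

# Hypothetical per-language conventions: thousands separator,
# decimal separator and date pattern.
CONVENTIONS = {
    "en": {"thousands": ",", "decimal": ".", "date": "%m/%d/%Y"},
    "fr": {"thousands": " ", "decimal": ",", "date": "%d/%m/%Y"},
    "de": {"thousands": ".", "decimal": ",", "date": "%d.%m.%Y"},
}


def format_number(value, lang):
    """Format a number with two decimals using the separators of lang."""
    rules = CONVENTIONS[lang]
    # "_" is used as a neutral grouping mark, then swapped for the real one.
    text = f"{value:_.2f}"
    return text.replace(".", rules["decimal"]).replace("_", rules["thousands"])


def format_date(day, lang):
    """Format a date using the day/month order of lang."""
    return day.strftime(CONVENTIONS[lang]["date"])


for lang in CONVENTIONS:
    print(lang, format_number(1234567.89, lang), format_date(date(2006, 9, 15), lang))
```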

10 L10n
- Localization (L10n) involves taking a product and making it linguistically and culturally appropriate to the target locale (country/region and language)
- Ways to change the language on a Web site:
  - User selection
  - Detecting the browser settings
  - Automatically, based on the user's profile
- Translation issues:
  - Identifying untranslated or outdated translations of terms and phrases
  - Different roles for translators and content managers
  - Offering an interface for the content translation
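Detecting the browser settings usually means reading the `Accept-Language` request header, in which the browser lists language tags with optional quality weights. The sketch below is a simplified parser, not a complete implementation of the HTTP rules; the function names and the list of supported site languages are assumptions.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    preferences = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a tag without a q parameter has full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # an unreadable weight is treated as "not wanted"
        preferences.append((tag.strip().lower(), quality))
    # sorted() is stable, so tags with equal weight keep the header order.
    ranked = sorted(preferences, key=lambda pref: pref[1], reverse=True)
    return [tag for tag, quality in ranked if quality > 0]


def choose_language(header, supported, default="en"):
    """Pick the first requested language that the site offers."""
    for tag in parse_accept_language(header):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # "fr-ch" falls back to "fr"
        if primary in supported:
            return primary
    return default


site_languages = ("en", "fr", "ar")
print(choose_language("fr-CH, fr;q=0.9, en;q=0.8", site_languages))
```

An explicit user selection or a stored profile preference would normally take priority over this header-based guess.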

11 Example of an XLIFF translation file coming from the translation service
- XLIFF: XML Localization Interchange File Format
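The example file shown on the slide is not part of this transcript. As a stand-in, here is a minimal, made-up XLIFF document (written against the XLIFF 1.2 namespace, which is an assumption; the original file may have used another version) together with a short sketch that reads it using Python's standard XML parser. All ids and strings are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XLIFF content; not the file from the original slide.
XLIFF_SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="portal-messages" source-language="en" target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="home">
        <source>Home</source>
        <target>Accueil</target>
      </trans-unit>
      <trans-unit id="search">
        <source>Search</source>
        <target>Recherche</target>
      </trans-unit>
      <trans-unit id="not-found">
        <source>Page not found</source>
      </trans-unit>
    </body>
  </file>
</xliff>
"""

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def read_translations(xliff_text):
    """Map each trans-unit id to (source text, target text or None)."""
    root = ET.fromstring(xliff_text.encode("utf-8"))
    result = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        target = unit.find("x:target", NS)
        result[unit.get("id")] = (source.text, target.text if target is not None else None)
    return result


translations = read_translations(XLIFF_SAMPLE)
untranslated = [unit_id for unit_id, (_, target) in translations.items() if target is None]
print(translations)
print("still to translate:", untranslated)
```

Listing the units that have no target is one simple way to find untranslated messages after a file comes back from the translators.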

12 Sorting in different languages

13 Sorting in the same language
- Strings must be sorted according to that language's sorting rules
- Complex characters, ignorable characters and exceptional words have to be considered
- Normally done in two steps:
  - primary sorting
    - uppercase and lowercase characters are equivalent
    - diacritical marks are ignored
    - ignorable characters are not considered
  - secondary sorting
    - uppercase and lowercase are distinguished
    - characters with diacritical marks are ranked individually
    - ignorable characters influence the sorting
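The two steps can be approximated with a compound sort key. This is a simplified sketch and not a full collation algorithm: the set of ignorable characters is an assumption, and the secondary step simply falls back to code point order of the decomposed string, whereas real language rules are more detailed.

```python
import unicodedata

# Assumption for the example: characters skipped in the primary step.
IGNORABLE = {"-", "'", " "}


def primary_key(word):
    """Case-insensitive key without diacritics and ignorable characters."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = (
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch not in IGNORABLE
    )
    return "".join(kept).casefold()


def secondary_key(word):
    """Tie-breaker that still sees case, diacritics and ignorable characters."""
    return unicodedata.normalize("NFD", word)


def sort_words(words):
    """Sort by the primary key first, then break ties with the secondary key."""
    return sorted(words, key=lambda w: (primary_key(w), secondary_key(w)))


print(sort_words(["résumé", "zebra", "Resume", "apple", "resume"]))
```

Words that differ only in case or accents end up next to each other, which a plain sort by code point does not guarantee.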

14 Sorting in different languages
- Approaches:
  1. All strings in the same language should be sorted according to that language's rules
     - Sorting is also governed by an order among languages or among groups of languages
     - e.g. English, German, French = Roman group
  2. Sort using the sorting rules associated with the language chosen by the end user, or with the site language
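The second approach can be sketched by choosing a sort key from the selected language. The two rule sets below are deliberately tiny and are assumptions for the example: Swedish places å, ä, ö after z, while one common German convention sorts ä, ö, ü together with a, o, u. The function and table names are hypothetical.

```python
import unicodedata

SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def swedish_key(word):
    """Rank each letter by its position in the Swedish alphabet."""
    text = unicodedata.normalize("NFC", word).casefold()
    # Characters outside the alphabet are placed after all known letters.
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in text]


def german_key(word):
    """Treat umlauts like their base vowels."""
    return unicodedata.normalize("NFC", word).casefold().translate(GERMAN_FOLD)


SORT_KEYS = {"sv": swedish_key, "de": german_key}


def sort_for(words, lang):
    """Sort words with the rules of the language chosen by the user or site."""
    return sorted(words, key=SORT_KEYS[lang])


words = ["zebra", "ärlig", "apple"]
print("sv:", sort_for(words, "sv"))
print("de:", sort_for(words, "de"))
```

The same list comes out in a different order for each language, so a multilingual site has to decide which language's rules apply to a mixed list.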

15 SEMIDE portal and toolkit - multilinguality issues

16 Multilingual portal – EN, FR, AR, …

17 Features
- All pages are encoded in UTF-8
  - all characters of the world are supported
- Default language set at startup: English

18 What aspects are multilingual?
- Graphical user interface
  - translation from the administrative area
  - one-by-one, .po, XLIFF
- Content
  - individual translation for each item on edit
- Glossaries and thesauri
  - translation from the Zope Management Interface
- Syndication (RDF channels)
  - depends on the selected language
- Searches
  - multiple selection by the user

19 Language negotiation
- When an item is not translated into the language selected by the end user, the system looks for a translation in:
  1. the language from the user's browser settings
  2. the default language
- …and displays the item's id if none of these work
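The fallback order described on this slide can be written as a short lookup. This is a sketch under assumptions: translations are held in a plain dictionary keyed by language code, and the function name and sample data are hypothetical, not the portal's actual code.

```python
def negotiate(item_id, translations, selected, browser_language, default="en"):
    """Return the best available translation of an item, or its id."""
    # Try the selected language, then the browser language, then the default.
    for lang in (selected, browser_language, default):
        text = translations.get(lang)
        if text:
            return text
    # No usable translation: fall back to the item's id.
    return item_id


news_item = {"en": "Water management workshop", "fr": "Atelier sur la gestion de l'eau"}

print(negotiate("news-42", news_item, selected="ar", browser_language="fr"))
print(negotiate("news-42", news_item, selected="ar", browser_language="de"))
print(negotiate("news-43", {}, selected="ar", browser_language="fr"))
```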

