Published by Shannon Bryan. Modified over 8 years ago.
1
Sophia Antipolis, September 2006
Multilinguality, localization and internationalization
Miruna Bădescu, Finsiel Romania
2
Unicode, encodings and character sets
3
How it all started…
Until recently, most computers used character sets with at most 256 characters (often loosely called "ANSI"):
- The first 128 (ASCII):
  - control characters
  - digits
  - letters a-z and A-Z
  - punctuation marks
- The second set of 128 varies by region:
  - In the English-speaking world and Western Europe it contains:
    - more punctuation marks
    - currency symbols (e.g. £)
    - accented letters (á, é, ñ, ç, ô)
  - In places like Egypt, Greece and Russia it contains characters taken from the corresponding alphabet: Arabic, Greek, Cyrillic
4
Code, encoding
- Character code: a sequence of bits that a computer uses to represent a character
- Encoding: the rule describing how a sequence of bytes is transformed into characters
5
Problem
These encoding systems conflict with one another. Two encodings:
- can use the same number for two different characters
- can use different numbers for the same character
Data can become incomprehensible when transferred from one system to another
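Both kinds of conflict can be reproduced with a few legacy 8-bit encodings. This is a minimal sketch: the byte value and the Cyrillic letter are arbitrary illustrations, not examples taken from the slides.

```python
# One byte value read under three legacy 8-bit encodings
# gives three different characters.
raw = b"\xe9"
decoded = {name: raw.decode(name) for name in ("latin-1", "cp1251", "iso8859_7")}

# One character written under three Cyrillic encodings
# gives three different byte values.
encoded = {name: "й".encode(name) for name in ("cp1251", "iso8859_5", "koi8_r")}

for name, char in decoded.items():
    print(f"0xE9 read as {name}: {char}")
for name, data in encoded.items():
    print(f"'й' written as {name}: 0x{data.hex().upper()}")
```

A file written with one of these encodings and read with another still decodes without error, which is why this kind of corruption often goes unnoticed until someone reads the text.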
6
Solution
- Move to a system that assigns a unique number to each character in each language of the world
- The Unicode standard provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language
- Unicode (as defined by the Unicode Consortium) has become a universal standard, kept in step with ISO/IEC 10646, which describes the 'Universal Multiple-Octet Coded Character Set' (UCS)
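The "unique number" is the character's code point. As a small illustration, Python's built-in `ord` and the standard `unicodedata` module can show it for a few arbitrarily chosen characters; the helper name `describe` is made up for this sketch.

```python
import unicodedata

def describe(char):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"

# Sample characters chosen for illustration only.
for char in "Aéя€":
    print(char, describe(char))
```

The same code point is reported on every platform and in every program, whatever encoding is later used to store the text.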
7
Unicode
- The Unicode repertoire can be encoded in more than one way: UTF-8, UTF-16, UTF-32
- UTF-8 encodes:
  - ASCII characters on 1 byte
  - other characters on 2 to 4 bytes (the original design allowed up to 6 bytes; RFC 3629 limits it to 4)
- Incorporating Unicode into client-server or multi-tiered applications and websites can offer significant cost savings over the use of legacy character sets
- Enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering
- Allows data to be transported through many different systems without corruption
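The size differences between the three encoding forms can be checked directly. The sketch below uses the little-endian variants so that no byte order mark is counted; the function name `encoded_sizes` and the sample characters are illustrative choices.

```python
def encoded_sizes(char):
    """Number of bytes one character occupies in each encoding form."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {name: len(char.encode(name)) for name in encodings}

# ASCII letter, accented letter, euro sign, and a character
# outside the Basic Multilingual Plane (U+1F600).
for char in ("A", "é", "€", "\U0001F600"):
    print(f"U+{ord(char):04X}", encoded_sizes(char))
```

UTF-8 grows from 1 to 4 bytes as the code point gets larger, UTF-16 uses 2 or 4 bytes, and UTF-32 always uses 4.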
8
Internationalization and localization
9
I18n
Internationalization (i18n): modification of an application so that it can handle multiple languages, countries, etc.:
- Display content (web pages, files) in the end user's language
- Display messages around the site in the user's language (e.g. "Home", "Search", error messages)
- Accept input of characters in the end user's language
- Print out the correct characters
- Handle dates, numbers and the sorting of words using the rules of that language
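Displaying site messages in the user's language usually comes down to looking up a message identifier in a per-language catalog. The sketch below assumes a hypothetical in-memory catalog; the names `MESSAGES`, `DEFAULT_LANGUAGE` and `translate` are invented for illustration and do not describe any particular toolkit.

```python
# Hypothetical message catalog: language code -> message id -> text.
MESSAGES = {
    "en": {"home": "Home", "search": "Search"},
    "fr": {"home": "Accueil", "search": "Recherche"},
}
DEFAULT_LANGUAGE = "en"  # assumed fallback language

def translate(message_id, language):
    """Return the message in `language`, else in the default language,
    else the message id itself so that the gap stays visible."""
    catalog = MESSAGES.get(language, {})
    if message_id in catalog:
        return catalog[message_id]
    return MESSAGES[DEFAULT_LANGUAGE].get(message_id, message_id)

print(translate("home", "fr"))
print(translate("search", "de"))
```

Keeping the text out of the page templates is what lets a new language be added by supplying one more catalog rather than by editing code.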
10
L10n
Localization (l10n) involves taking a product and making it linguistically and culturally appropriate to the target locale (country/region and language)
- Ways to change the language on a Web site:
  - User selection
  - Detecting the browser settings
  - Automatically, based on the user's profile
- Translation issues:
  - Identifying untranslated or outdated translations of terms and phrases
  - Different roles for translators and content managers
  - Offering an interface for the content translation
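Detecting the browser settings normally means reading the HTTP `Accept-Language` request header. The following is a simplified sketch of that step: the function names and the set of supported languages are assumptions, and a production parser would need stricter validation of the header.

```python
def parse_accept_language(header):
    """Return language tags ordered by preference (highest q first)."""
    choices = []
    for position, part in enumerate(header.split(",")):
        fields = part.strip().split(";")
        tag = fields[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        choices.append((-quality, position, tag))
    # q=0 means "not acceptable", so those tags are dropped.
    return [tag for neg_q, _, tag in sorted(choices) if neg_q < 0]

def pick_language(header, supported, default="en"):
    """Pick the first preferred language that the site supports."""
    for tag in parse_accept_language(header):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # "fr-fr" also matches "fr"
        if primary in supported:
            return primary
    return default

print(pick_language("fr-FR,fr;q=0.9,en;q=0.8", {"en", "fr", "ar"}))
```

A user selection or a stored profile setting would simply take precedence over the value returned here.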
11
Example of an XLIFF translation file coming from the translation service
XLIFF: XML Localization Interchange File Format
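The file shown on the original slide is not reproduced in this transcript. As a stand-in, the sketch below reads a small made-up XLIFF document with Python's standard `xml.etree.ElementTree`. It assumes the XLIFF 1.2 namespace; the sample content, the `original` attribute value and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample, not the file from the slide.
SAMPLE_XLIFF = """<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="portal_messages" source-language="en"
        target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="home">
        <source>Home</source>
        <target>Accueil</target>
      </trans-unit>
      <trans-unit id="search">
        <source>Search</source>
      </trans-unit>
    </body>
  </file>
</xliff>
"""

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}  # assumed XLIFF version

def read_translations(xliff_text):
    """Map each trans-unit id to its target text (None if untranslated)."""
    root = ET.fromstring(xliff_text.encode("utf-8"))
    translations = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        target = unit.find("x:target", NS)
        translations[unit.get("id")] = target.text if target is not None else None
    return translations

print(read_translations(SAMPLE_XLIFF))
```

Units that come back as `None` are the ones still waiting for a translation.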
12
Sorting in different languages
13
Sorting in the same language
- Strings must be sorted according to that language's sorting rules
- Complex characters, ignorable characters and exceptional words have to be considered
- Normally done in two steps:
  - primary sorting
    - uppercase and lowercase characters are equivalent
    - diacritical marks are ignored
    - ignorable characters are not considered
  - secondary sorting
    - uppercase and lowercase characters are distinguished
    - characters with diacritical marks are ranked individually
    - ignorable characters influence the sorting
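The two steps can be approximated with a two-part sort key. This is a deliberately simplified sketch, not the full Unicode Collation Algorithm: the set of ignorable characters is an assumption, and the secondary step here is a plain code point comparison, so uppercase sorts before lowercase.

```python
import unicodedata

IGNORABLE = {"-", "'"}  # assumed ignorable characters, for illustration

def primary_key(text):
    """Step 1: drop diacritics and ignorable characters, ignore case."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = [
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch not in IGNORABLE
    ]
    return "".join(kept).casefold()

def two_step_key(text):
    """Step 2 breaks ties using the full string: case, diacritics
    and ignorable characters all count again."""
    return (primary_key(text), unicodedata.normalize("NFC", text))

def sort_words(words):
    return sorted(words, key=two_step_key)

print(sort_words(["zebra", "Émile", "coop", "eagle", "co-op", "apple"]))
```

With a naive `sorted(words)`, "Émile" would land after "zebra" because its first letter has a higher code point than any unaccented letter.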
14
Sorting in different languages
Approaches
1. All strings in the same language are sorted according to that language's rules
   - Sorting is also governed by an order among languages or among groups of languages, e.g. English, German, French = Roman group
2. Sort using the sorting rules associated with the language chosen by the end user, or with the site language
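The second approach amounts to selecting a sort key by language code. The sketch below contrasts two well-known conventions: in German dictionary order "ä" sorts with "a", while in Swedish "å", "ä" and "ö" are separate letters placed after "z". The key functions and the `SORT_KEYS` table are hypothetical and cover only these two languages.

```python
import unicodedata

SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def strip_marks(text):
    """Remove diacritical marks, e.g. 'ä' becomes 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def german_key(word):
    # Umlauts are treated like their base letters.
    return (strip_marks(word.casefold()), word)

def swedish_key(word):
    # Each letter is ranked by its position in the Swedish alphabet;
    # unknown characters get -1 and therefore sort first.
    folded = unicodedata.normalize("NFC", word.casefold())
    return ([SWEDISH_ALPHABET.find(ch) for ch in folded], word)

# Hypothetical registry: language code -> sort key function.
SORT_KEYS = {"de": german_key, "sv": swedish_key}

def sort_for_language(words, language):
    return sorted(words, key=SORT_KEYS[language])

words = ["zebra", "äpple", "apple"]
print(sort_for_language(words, "de"))
print(sort_for_language(words, "sv"))
```

The same list therefore comes out in a different order depending on the language chosen by the end user.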
15
SEMIDE portal and toolkit - multilinguality issues
16
Multilingual portal – EN, FR, AR, …
17
Features
- All pages are encoded in UTF-8
  - all characters of the world are supported
- Default language set at startup: English
18
What aspects are multilingual?
- Graphical user interface
  - translation from the administrative area: one-by-one, .po, .XLIFF
- Content
  - individual translation for each item on edit
- Glossaries and thesauri
  - translation from Zope's Management Interface
- Syndication (RDF channels)
  - depends on the selected language
- Searches
  - multiple selection by the user
19
Language negotiation
When an item is not translated into the language selected by the end user, the system searches for a translation in:
1. the language from the user's browser settings
2. the default language
…and displays the item's id if none of these work
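The fallback chain above can be sketched as a short lookup function. This is only an illustration of the described order, not the portal's actual code: the function name, the dictionary layout and the choice to accept a list of browser languages are assumptions.

```python
def negotiate(item_id, translations, selected, browser_languages, default="en"):
    """Return the item's text following the fallback order:
    selected language, browser language(s), default language, item id."""
    candidates = [selected, *browser_languages, default]
    for language in candidates:
        text = translations.get(language)
        if text:  # skip missing or empty translations
            return text
    return item_id

# Hypothetical item translated into English and French only.
translations = {"en": "Water resources", "fr": "Ressources en eau"}

print(negotiate("water-res", translations, "ar", ["fr"]))
print(negotiate("water-res", translations, "ar", ["de"]))
print(negotiate("water-res", {}, "ar", ["de"]))
```

Falling back to the item's id at the end means the page still renders something identifiable when no translation exists at all.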