CLDR: The Common Locale Data Repository Locales for the World Lisa Moore George Rhoten Mark Davis Steven Loomis.


1 CLDR: The Common Locale Data Repository
Locales for the World
Lisa Moore, George Rhoten, Mark Davis, Steven Loomis

2 Agenda
(Dublin, Ireland, October, 2006; LRC – XI The Localisation Factory)
- Why CLDR?
- CLDR data
- Tools and vetting
- Today and the future

3 Agenda
- Why CLDR?
- CLDR data
- Tools and vetting
- Today and the future

4 Locales – does anything stay the same?
"Theatre Center News: The date of the last version of this document was 2003 年 3 月 20 日. A copy can be obtained for $50,0 or 1.234,57 грн. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Behdad Esfahbod, Ahmed Talaat, Eric Mader, Asmus Freytag, Avery Bishop, and Doug Felt."

5 Locales – the many differences
- Locales specify user preferences
- Linguistic and cultural differences
  - Languages, scripts, writing systems, ordering, directionality, formatting, numbers, sizes
- Even in the same locale, interoperability issues across platforms
- Global economics has increased the need for greater globalization support in computer systems
- Everyone expects more!

6 Add the Universal Character Encoding
- Unicode: Unique character codes for all languages
- …

7 The Need for Common Locale Data
- Computing environments often contain a variety of operating systems and software.
- Historically, locale-sensitive data research has been done by individuals and/or companies.
- Because of political changes, it is easy for locale data to become out of date.
- It is difficult to get complete agreement on correctness.

8 Common Locale Data Project
- Began as Common XML Locale Repository (CXLR) developed by OpenI18N in 2003
- CLDR project began in 2004
- Hosted by Unicode Consortium
  - http://www.unicode.org/cldr/
- Goals:
  - Common, necessary software locale data for all world languages
  - Collect and maintain locale data
  - XML format for effective interchange
  - Freely available

9 CLDR in use (partial list)
- Libraries and Environments
  - ICU – International Components for Unicode
  - JDK – Java Development Kit
- Operating Systems
  - Solaris
  - AIX
  - MacOS X
- Applications
  - OpenOffice.org
  - Acrobat
  - ModernBill

10 Agenda
- Why CLDR?
- CLDR data
- Tools and vetting
- The future

11 What is a Locale?
- A locale is an identifier referring to linguistic and cultural preferences
  - en_US, en_GB, ja_JP
- These preferences can change over time due to cultural and political reasons
  - Introduction of new currencies, like the Euro
  - Standard sorting of Spanish changes
- Many of these preferences have varying degrees of standardization
  - 12 and 24 hour format in the United States
- This is a very broad topic
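
To make the identifier examples on this slide concrete, here is a minimal Python sketch that splits identifiers such as en_US into a language part and a territory part. The names `LocaleId` and `parse_locale` are hypothetical, and the sketch assumes only the two-part form shown on the slide; real locale identifiers can also carry script and variant parts, which are ignored here.

```python
from typing import NamedTuple, Optional


class LocaleId(NamedTuple):
    """Hypothetical container for the two parts shown on the slide."""
    language: str
    territory: Optional[str]


def parse_locale(identifier: str) -> LocaleId:
    """Split 'en_US' (or 'en-US') into language and territory.

    Assumption: at most language + territory; anything after the
    second part is ignored in this sketch.
    """
    parts = identifier.replace("-", "_").split("_")
    language = parts[0].lower()
    territory = parts[1].upper() if len(parts) > 1 and parts[1] else None
    return LocaleId(language, territory)


for tag in ("en_US", "en_GB", "ja_JP", "ga"):
    print(tag, "->", parse_locale(tag))
```

A language-only identifier such as ga yields a territory of `None`, which is convenient when most data lives at the language level.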

12 Types of Locale Data
- Dates/time/calendar formats
- Number/currency formats
- Measurement system
- Collation specification
  - Sorting
  - Searching
  - Matching
- Translated names for language, territory, script, timezones, currencies, …
- Script and characters used by a language

13 Locale Data Markup Language
- Locale data described using XML
- CLDR data uses LDML
- Structure of CLDR controlled by Locale Data Markup Language (LDML) specification
  - http://unicode.org/reports/tr35

14 LDML Data Categories
<ldml>
  <identity>
  <localeDisplayNames>
  <layout>
  <characters>
  <delimiters>
  <measurement>
  <dates>
  <numbers>
  <posix>
  <collations>
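
As a rough illustration of how these categories appear in a file, the Python sketch below parses a tiny LDML-like document with the standard library and lists its top-level elements. The element names are the ones on this slide; the sample document itself (`SAMPLE_LDML`) and the helper name `top_level_categories` are invented for illustration, and real CLDR files contain far more content.

```python
import xml.etree.ElementTree as ET
from typing import List

# Invented, heavily trimmed sample; not copied from any real CLDR file.
SAMPLE_LDML = """\
<ldml>
  <identity><language type="ga"/></identity>
  <localeDisplayNames/>
  <dates/>
  <numbers/>
</ldml>
"""


def top_level_categories(ldml_text: str) -> List[str]:
    """Return the names of the direct children of the <ldml> root."""
    root = ET.fromstring(ldml_text)
    if root.tag != "ldml":
        raise ValueError("expected an <ldml> root element")
    return [child.tag for child in root]


print(top_level_categories(SAMPLE_LDML))
```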

15 Names <localeDisplayNames>
- Provides translated display names for languages, territories, scripts, variants and keywords used in CLDR.
- Most of this information is at the language level, since it typically does not vary by territory, only language.
- An example: ICU Locale Explorer
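
Because most names are stored at the language level, a lookup for a territory-specific locale typically has to fall back to its language. The Python sketch below shows one simple way to do that, assuming a plain truncation chain (ga_IE, then ga, then root). The dictionary `DISPLAY_NAMES` is a hypothetical in-memory stand-in for the XML files; its two Irish names are taken from the ga.xml example in this deck, and the codes aa and ab are the ISO 639-1 codes for those languages. The actual inheritance rules are defined in the LDML specification and may differ in detail.

```python
from typing import Dict, List, Optional


def fallback_chain(locale: str) -> List[str]:
    """'ga_IE' -> ['ga_IE', 'ga', 'root'] by dropping trailing parts."""
    parts = locale.split("_")
    chain = ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]
    chain.append("root")
    return chain


# Hypothetical stand-in for per-locale data files.
DISPLAY_NAMES: Dict[str, Dict[str, str]] = {
    "ga_IE": {},  # nothing territory-specific in this sketch
    "ga": {"aa": "Afar", "ab": "Abcáisis"},
    "root": {},
}


def language_name(locale: str, code: str) -> Optional[str]:
    """Look up a language display name, walking the fallback chain."""
    for candidate in fallback_chain(locale):
        name = DISPLAY_NAMES.get(candidate, {}).get(code)
        if name is not None:
            return name
    return None


print(fallback_chain("ga_IE"))
print(language_name("ga_IE", "ab"))
```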

16 Names Examples
From ga.xml (Irish):
<localeDisplayNames>
  <languages>: Afar, Abcáisis, …
  <scripts>: Araibis, …
  <territories>: Andóra, Aontas na nÉimíríochtaí Arabacha, …

17 Characters <characters>
- Allows for creation of exemplar character sets. An exemplar set specifies the set of characters that must be present in order to properly render the language.
- Auxiliary exemplar set defines additional characters that may appear in foreign words or phrases.
- Lower case only

18 Date Formats <dates>
- Defines representation of calendars using various calendaring systems (Gregorian, Buddhist, Islamic, Japanese, etc.)
- Defines formatting for dates, times, eras and time zones
  - wide, abbreviated, or narrow
  - Date and time formats use patterns of letters to define proper formatting
- Week information
- Relative day/time translations (for example, yesterday, tomorrow, etc.)
- An example: ICU Locale Explorer
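
To show what "patterns of letters" means in practice, here is a deliberately small Python sketch of a pattern-driven date formatter. It assumes only three numeric fields (y for year, M for month, d for day, with the letter count giving the minimum width and yy meaning a two-digit year), which is my reading of the LDML convention; the function name `format_date` is hypothetical. It does not handle month names, quoted literals, or any other field, so literal text containing those letters would be misread.

```python
import re
from datetime import date

# Runs of the three pattern letters this sketch understands.
_FIELD = re.compile(r"y+|M+|d+")


def format_date(value: date, pattern: str) -> str:
    """Format a date using a tiny subset of letter patterns."""

    def render(match: "re.Match[str]") -> str:
        field = match.group(0)
        letter, width = field[0], len(field)
        if letter == "y":
            if width == 2:  # 'yy' is the two-digit year
                return f"{value.year % 100:02d}"
            return str(value.year).zfill(width)
        if width > 2:
            raise ValueError(f"field {field!r} is not supported in this sketch")
        number = value.month if letter == "M" else value.day
        return str(number).zfill(width)

    return _FIELD.sub(render, pattern)


sample = date(2003, 3, 20)
print(format_date(sample, "yyyy-MM-dd"))
print(format_date(sample, "y年M月d日"))
```

The same date value produces different strings purely by swapping the pattern, which is why the patterns can be stored as locale data.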

19 Characters / Dates Examples
From ga.xml (Irish):
[a á b-e é f-i í j-o ó p-u ú v-z]
[ ḃ ċ ḋ ḟ ġ ṁ ṗ ṡ ṫ ]
…
Domh
Luan …
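
The two bracketed sets above can be expanded mechanically. The Python sketch below does so for the simple form shown on this slide, where a set is a space-separated list of single characters and a-z style ranges; the full set syntax used by CLDR is richer, and this parser makes no attempt to cover it. The names `expand_exemplars`, `missing_characters`, `MAIN_GA`, and `AUX_GA` are hypothetical.

```python
import unicodedata
from typing import Set


def expand_exemplars(pattern: str) -> Set[str]:
    """Expand '[a á b-e ...]' into a set of single characters.

    Assumption: tokens are separated by spaces and are either one
    character or a simple 'x-y' range.
    """
    body = unicodedata.normalize("NFC", pattern).strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    chars: Set[str] = set()
    for token in body.split():
        if len(token) == 3 and token[1] == "-":
            first, last = ord(token[0]), ord(token[2])
            chars.update(chr(cp) for cp in range(first, last + 1))
        else:
            chars.update(token)
    return chars


MAIN_GA = expand_exemplars("[a á b-e é f-i í j-o ó p-u ú v-z]")
AUX_GA = expand_exemplars("[ ḃ ċ ḋ ḟ ġ ṁ ṗ ṡ ṫ ]")


def missing_characters(text: str, exemplars: Set[str]) -> Set[str]:
    """Letters in text (lower-cased) that the exemplar set lacks."""
    normalized = unicodedata.normalize("NFC", text).lower()
    return {ch for ch in normalized if ch.isalpha() and ch not in exemplars}


print(sorted(missing_characters("Éire", MAIN_GA)))
print(sorted(missing_characters("ṫ", MAIN_GA)))
```

Lower-casing before the check reflects the "lower case only" note on the Characters slide.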

20 Time Zone Names <timeZoneNames>
- Based on Olson time zone database
- Localized display names for standard, daylight, and generic representations of time zones.
- Short and long display names.

21 Numbers <numbers>
- Specifies proper localized formatting of numeric quantities
  - Decimal
  - Scientific
  - Currency
  - Percentages
- Includes localized decimal, thousands separators, currency symbols, etc.
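
A short Python sketch can illustrate how localized decimal and thousands separators change the rendered number. It assumes groups of three digits and takes the two separators as plain arguments; the function name `format_decimal` is hypothetical. CLDR itself expresses number formats as patterns, and some locales group digits differently, neither of which is modeled here.

```python
def format_decimal(value: float, decimal_sep: str, group_sep: str,
                   digits: int = 2) -> str:
    """Render value with the given separators and fraction digits.

    Assumption: digits are grouped in threes, as in Python's ',' format.
    """
    us_style = f"{value:,.{digits}f}"  # comma groups, dot decimal
    integer_part, _, fraction = us_style.partition(".")
    localized = integer_part.replace(",", group_sep)
    if fraction:
        localized += decimal_sep + fraction
    return localized


print(format_decimal(1234.57, ",", "."))
print(format_decimal(1234.57, ".", ","))
```

With a comma as the decimal separator and a period for grouping, the first call reproduces the 1.234,57 form quoted in the "Theatre Center News" example earlier in the deck.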

22 Time Zones / Currencies
From ga.xml (Irish) and root.xml:
<timeZoneNames>
  <long>
    Meán-Am Greenwich
    Am Samhraidh na hÉireann
  </long>
  …
<numbers>
  <currencies>
    Euro
    €
    …

23 Delimiters <delimiters>
- Specifies a primary and a secondary pair of delimiter characters to be used for bracketing quotations in text

24 Delimiters Example
From fr.xml (French):
<delimiters> « » “ ” </delimiters>
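
The four French delimiters above can be applied as follows. This Python sketch is illustrative only: the class `Delimiters` and the function `quote` are hypothetical, the field names mirror what I understand to be the LDML element names (quotationStart, quotationEnd, alternateQuotationStart, alternateQuotationEnd), and no locale-specific spacing around the marks is added.

```python
from typing import NamedTuple


class Delimiters(NamedTuple):
    """Primary and secondary (nested) quotation marks for one locale."""
    quotation_start: str
    quotation_end: str
    alternate_quotation_start: str
    alternate_quotation_end: str


# Values as listed in the fr.xml example on this slide.
FRENCH = Delimiters("«", "»", "“", "”")


def quote(text: str, delims: Delimiters, nested: bool = False) -> str:
    """Wrap text in primary marks, or in secondary marks when nested."""
    if nested:
        return (delims.alternate_quotation_start + text
                + delims.alternate_quotation_end)
    return delims.quotation_start + text + delims.quotation_end


inner = quote("oui", FRENCH, nested=True)
print(quote(f"Elle a dit {inner}", FRENCH))
```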

25 Collation <collations>
- Information in collation directory, not main
- XML version of Java/ICU collation syntax
- Unicode collation algorithm is the base: http://unicode.org/reports/tr10
- Allows tailoring of the UCA on a per locale basis.

26 Collation Example
From collations/root.xml:
<rules>... ā Ā á Á ǎ Ǎ à À …
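
Tailoring means that the same strings can sort differently depending on the rules in force. The Python sketch below is a toy stand-in for that idea, not an implementation of the Unicode Collation Algorithm or of CLDR rule syntax: it compares two simplified German conventions, one treating ä as a (dictionary style) and one treating ä as ae (phonebook style). The key functions and mapping tables are hypothetical names introduced here.

```python
import unicodedata
from typing import List

# Simplified mappings; real collation works on multi-level weights.
_DICTIONARY = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
_PHONEBOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def dictionary_key(word: str) -> str:
    """Sort key that treats umlauts like their base letters."""
    return unicodedata.normalize("NFC", word).lower().translate(_DICTIONARY)


def phonebook_key(word: str) -> str:
    """Sort key that expands umlauts to base letter + 'e'."""
    return unicodedata.normalize("NFC", word).lower().translate(_PHONEBOOK)


words: List[str] = ["Ärger", "Affe", "Adler"]
print(sorted(words, key=dictionary_key))
print(sorted(words, key=phonebook_key))
```

The two calls place Ärger in different positions relative to Affe, which is the kind of per-locale difference a tailoring is meant to capture.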

27 Agenda
- Why CLDR?
- CLDR data
- Tools and vetting
- Today and the future

28 CLDR Tools
- Export
  - ICU resource bundle generation
  - POSIX locale generator
  - OpenOffice.org format export
- Survey tool
  - http://www.unicode.org/cgi-bin/cldr-survey

29 Vetting Process for Data
- Collect from different platforms, experts, submissions: new or revised
  - References to external sources strongly encouraged
  - Must be before freeze date for release
  - Use Survey Tool to Collect Data

30 Causes of Conflicting Data
- Typographical errors
  - Canda instead of Canada
- Regional differences
  - German spelling is different between countries
- Parts of speech
  - “март 2004” versus “3 марта” when the Russian word for March is used in a date
- Context of usage
  - Normal German sorting versus German phonebook sorting
- Standards versus common use
  - “Republic of Laos” versus “Laos”
- Individual preferences
  - 24 hour time format versus 12 hour time format

31 Agenda
- Why CLDR?
- CLDR data
- Tools and vetting
- Today and the future

32 Latest Release: CLDR 1.4
- Released: July 17, 2006
- 360 locales:
  - 121 languages
  - 142 territories
- 25% more data
- 17,000 new or modified data items
- Over 100 different contributors

33 Challenges
- Complex Formats
- Experts knowledgeable both in technology and a specific language
  - Collation
  - Exemplar characters
  - Etc…
- Require close interaction of CLDR experts with language experts

34 Getting Involved
- Simplest – anyone!
  - Use CLDR
  - Bug report / feature request
- More Involved
  - Vetting, Assessment, Tools, Policies, Decisions, …
  - Any Unicode member eligible to name representatives including country liaison members

35 Example Country Process (Finland)
- Finnish Ministry of Education made CLDR data a major goal, 2004-06
  - Research Institute for the Languages of Finland (“RILF” aka “Kotus”) designated agency
  - Two official languages (Finnish and Swedish) & four regional / minority languages (three Sámi & Romani as spoken in Finland) to be covered
  - Over 30 different parties represented: commercial, non-commercial, individuals
  - Results expected to lead to new/revised national standards

36 For More Information
- Unicode
  - http://www.unicode.org/
- CLDR
  - http://www.unicode.org/cldr/
- LDML specification
  - http://unicode.org/reports/tr35
- lisam@us.ibm.com

