CLDR: The Common Locale Data Repository
Locales for the World
Lisa Moore, George Rhoten, Mark Davis, Steven Loomis



Dublin, Ireland, October, LRC – XI The Localisation Factory

Agenda
- Why CLDR?
- CLDR data
- Tools and vetting
- Today and the future


Locales – does anything stay the same?
"Theatre Center News: The date of the last version of this document was 2003 年 3 月 20 日. A copy can be obtained for $50,0 or 1.234,57 грн. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Behdad Esfahbod, Ahmed Talaat, Eric Mader, Asmus Freytag, Avery Bishop, and Doug Felt."

Locales – the many differences
- Locales specify user preferences
- Linguistic and cultural differences
  - Languages, scripts, writing systems, ordering, directionality, formatting, numbers, sizes
- Even in the same locale, interoperability issues across platforms
- Global economics has increased the need for greater globalization support in computer systems
- Everyone expects more!

Add the Universal Character Encoding
- Unicode: Unique character codes for all languages
…

The Need for Common Locale Data
- Computing environments often contain a variety of operating systems and software.
- Historically locale sensitive data research has been done by individuals and/or companies.
- Because of political changes, it is easy for locale data to become out of date.
- It is difficult to get complete agreement on correctness.

Common Locale Data Project
- Began as Common XML Locale Repository (CXLR) developed by OpenI18N in 2003
- CLDR project began in 2004
- Hosted by Unicode Consortium
- Goals:
  - Common, necessary software locale data for all world languages
  - Collect and maintain locale data
  - XML format for effective interchange
  - Freely available

CLDR in use (partial list)
- Libraries and Environments
  - ICU – International Components for Unicode
  - JDK – Java Development Kit
- Operating Systems
  - Solaris
  - AIX
  - MacOS X
- Applications
  - OpenOffice.org
  - Acrobat
  - ModernBill

Agenda
- Why CLDR?
- CLDR data
- Tools and vetting
- The future

What is a Locale?
- A locale is an identifier referring to linguistic and cultural preferences
  - en_US, en_GB, ja_JP
- These preferences can change over time due to cultural and political reasons
  - Introduction of new currencies, like the Euro
  - Standard sorting of Spanish changes
- Many of these preferences have varying degrees of standardization
  - 12 and 24 hour format in the United States
- This is a very broad topic
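
The identifiers on this slide (en_US, en_GB, ja_JP) pair a language code with a territory code. As a minimal sketch, assuming only the simple language[_territory] shape shown here, the split can be written as follows. The names `LocaleId` and `parse_locale_id` are hypothetical, and scripts, variants and keywords are not handled.

```python
from typing import NamedTuple, Optional


class LocaleId(NamedTuple):
    """Hypothetical container for the two parts of a simple locale identifier."""
    language: str
    territory: Optional[str]


def parse_locale_id(identifier: str) -> LocaleId:
    """Split an identifier such as 'en_US' into language and territory.

    Assumption: only the language[_territory] form is supported.
    """
    parts = identifier.replace("-", "_").split("_")
    language = parts[0].lower()
    territory = parts[1].upper() if len(parts) > 1 and parts[1] else None
    return LocaleId(language, territory)


for ident in ("en_US", "en_GB", "ja_JP"):
    print(ident, "->", parse_locale_id(ident))
```

Keeping the two parts separate makes it possible to look up data that depends only on the language without repeating it for every territory.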

Types of Locale Data
- Dates/time/calendar formats
- Number/currency formats
- Measurement system
- Collation specification
  - Sorting
  - Searching
  - Matching
- Translated names for language, territory, script, timezones, currencies, …
- Script and characters used by a language

Locale Data Markup Language
- Locale data described using XML
- CLDR data uses LDML
- Structure of CLDR controlled by Locale Data Markup Language (LDML) specification

LDML Data Categories
<ldml>
  <identity>
  <localeDisplayNames>
  <layout>
  <characters>
  <delimiters>
  <measurement>
  <dates>
  <numbers>
  <posix>
  <collations>

Names <localeDisplayNames>
- Provides translated display names for languages, territories, scripts, variants and keywords used in CLDR.
- Most of this information is at the language level, since it typically does not vary by territory, only language.
- An example: ICU Locale Explorer
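
Because most display names are stored at the language level, a lookup for a territory-specific locale has to fall back to its parent. The sketch below assumes a simple truncation chain ending in a root locale (for example ga_IE, then ga, then root). The function names and the sample dictionary are hypothetical and only illustrate the idea; the root entry is an invented placeholder.

```python
from typing import Dict, List, Optional


def fallback_chain(locale_id: str) -> List[str]:
    """Return the locale, its truncated parents, and finally 'root'."""
    parts = locale_id.split("_")
    chain = ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]
    chain.append("root")
    return chain


def lookup(data: Dict[str, Dict[str, str]], locale_id: str, key: str) -> Optional[str]:
    """Return the first value found while walking the fallback chain."""
    for candidate in fallback_chain(locale_id):
        value = data.get(candidate, {}).get(key)
        if value is not None:
            return value
    return None


# Hypothetical sample data: the Irish name lives at the language level only.
SAMPLE_NAMES = {
    "ga": {"AD": "Andóra"},
    "root": {"AD": "AD"},  # assumed placeholder value for illustration
}

print(fallback_chain("ga_IE"))
print(lookup(SAMPLE_NAMES, "ga_IE", "AD"))
```

With this arrangement a territory-level file only needs to carry the entries that actually differ from its language-level parent.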

Names Examples
From ga.xml (Irish):
<localeDisplayNames>
  <languages>
    Afar
    Abcáisis …
  <scripts>
    Araibis …
  <territories>
    Andóra
    Aontas na nÉimíríochtaí Arabacha
    …
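
To show how such names might be read by software, here is a small sketch using Python's standard xml.etree.ElementTree. The XML fragment is an assumption: it wraps the Irish names from the slide in elements with `type` attributes, as LDML display-name data is commonly laid out, and it is not a copy of the real ga.xml. The helper name `display_names` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified fragment in the style of an LDML locale file.
SAMPLE = """<ldml>
  <localeDisplayNames>
    <languages>
      <language type="aa">Afar</language>
      <language type="ab">Abcáisis</language>
    </languages>
    <scripts>
      <script type="Arab">Araibis</script>
    </scripts>
    <territories>
      <territory type="AD">Andóra</territory>
      <territory type="AE">Aontas na nÉimíríochtaí Arabacha</territory>
    </territories>
  </localeDisplayNames>
</ldml>"""


def display_names(xml_text: str, group: str, element: str) -> dict:
    """Map each code (the 'type' attribute) to its translated display name."""
    root = ET.fromstring(xml_text)
    path = f"localeDisplayNames/{group}/{element}"
    return {node.get("type"): node.text for node in root.findall(path)}


print(display_names(SAMPLE, "languages", "language"))
print(display_names(SAMPLE, "territories", "territory"))
```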

Characters <characters>
- Allows for creation of exemplar character sets. An exemplar set specifies the set of characters that must be present in order to properly render the language.
- Auxiliary exemplar set defines additional characters that may appear in foreign words or phrases.
- Lower case only

Date Formats <dates>
- Defines representation of calendars using various calendaring systems (Gregorian, Buddhist, Islamic, Japanese, etc.)
- Defines formatting for dates, times, eras and time zones
  - wide, abbreviated, or narrow
  - Date and time formats use patterns of letters to define proper formatting
- Week information
- Relative day/time translations (for example, yesterday, tomorrow, etc.)
- An example: ICU Locale Explorer
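
The idea of "patterns of letters" can be illustrated with a deliberately tiny formatter. This is a sketch under stated assumptions, not the LDML algorithm: it handles only numeric year (y), month (M) and day (d) fields, treats every other character as literal text, and does not support quoting or name-based fields. The function name `format_date` and the sample patterns are hypothetical.

```python
import re
from datetime import date


def format_date(value: date, pattern: str) -> str:
    """Expand runs of y, M and d in the pattern; other characters pass through."""

    def render(match):
        run = match.group(0)
        letter, width = run[0], len(run)
        if letter == "y":
            # 'yy' is a two-digit year; other widths are a minimum digit count.
            if width == 2:
                return f"{value.year % 100:02d}"
            return f"{value.year:0{width}d}"
        if width > 2:
            # Longer runs select month or day names, which this sketch omits.
            raise ValueError(f"name-based field {run!r} is not handled in this sketch")
        number = value.month if letter == "M" else value.day
        return f"{number:0{width}d}"

    return re.sub(r"y+|M+|d+", render, pattern)


sample = date(2003, 3, 20)
print(format_date(sample, "yyyy-MM-dd"))
print(format_date(sample, "d/M/yy"))
print(format_date(sample, "yyyy年M月d日"))
```

The same date value produces different text purely by swapping the pattern string, which is why the patterns can be stored as locale data.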

Characters / Dates Examples
From ga.xml (Irish):
Exemplar characters: [a á b-e é f-i í j-o ó p-u ú v-z]
Auxiliary exemplar characters: [ḃ ċ ḋ ḟ ġ ṁ ṗ ṡ ṫ]
…
Day names: Domh, Luan …
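
An exemplar set such as the Irish one above can be expanded into individual characters and used to check whether a piece of text stays within the expected alphabet. The sketch below assumes only the notation visible on the slide (space-separated single characters and simple ranges like b-e); the full set syntax used by CLDR is richer. The names `expand_exemplars` and `uncovered` are hypothetical.

```python
import unicodedata

MAIN = "[a á b-e é f-i í j-o ó p-u ú v-z]"
AUXILIARY = "[ḃ ċ ḋ ḟ ġ ṁ ṗ ṡ ṫ]"


def expand_exemplars(pattern: str) -> set:
    """Expand single characters and x-y ranges into a set of characters."""
    body = unicodedata.normalize("NFC", pattern.strip())
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    chars = set()
    for token in body.split():
        if len(token) == 3 and token[1] == "-":
            # A range such as b-e covers every code point from b to e.
            chars.update(chr(cp) for cp in range(ord(token[0]), ord(token[2]) + 1))
        else:
            chars.update(token)
    return chars


def uncovered(text: str, exemplars: set) -> set:
    """Return the letters of text (lower-cased) that are not in the set."""
    normalized = unicodedata.normalize("NFC", text).lower()
    return {ch for ch in normalized if ch.isalpha() and ch not in exemplars}


main_set = expand_exemplars(MAIN)
print(len(main_set))
print(uncovered("Éire", main_set))
print(uncovered("ḃean", main_set))
print(uncovered("ḃean", main_set | expand_exemplars(AUXILIARY)))
```

The text is lower-cased before the comparison because the exemplar sets list lower-case characters only.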

Time Zone Names <timeZoneNames>
- Based on Olson time zone database
- Localized display names for standard, daylight, and generic representations of time zones.
- Short and long display names.

Numbers <numbers>
- Specifies proper localized formatting of numeric quantities
  - Decimal
  - Scientific
  - Currency
  - Percentages
- Includes localized decimal, thousands separators, currency symbols, etc.
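
As a rough illustration of how localized decimal and thousands separators change the rendered text, the sketch below formats a number with separators passed in as plain arguments. In practice these symbols would come from the locale data; here the function name `format_decimal` and the separator choices are assumptions, and grouping is fixed at three digits.

```python
def format_decimal(value: float, decimal_sep: str, group_sep: str,
                   fraction_digits: int = 2) -> str:
    """Format value with caller-supplied decimal and grouping separators."""
    # Start from Python's neutral formatting, e.g. '1,234.57'.
    text = f"{abs(value):,.{fraction_digits}f}"
    integer_part, _, fraction_part = text.partition(".")
    integer_part = integer_part.replace(",", group_sep)
    sign = "-" if value < 0 else ""
    if fraction_part:
        return sign + integer_part + decimal_sep + fraction_part
    return sign + integer_part


print(format_decimal(1234.57, decimal_sep=".", group_sep=","))
print(format_decimal(1234.57, decimal_sep=",", group_sep="."))
```

The second call reproduces the "1.234,57" style that appears in the opening example of the presentation.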

Time Zones / Currencies
From ga.xml (Irish) and root.xml:
<timeZoneNames>
  <long>
    Meán-Am Greenwich
    Am Samhraidh na hÉireann
  </long>
…
<numbers>
  <currencies>
    Euro
    €
    …

Delimiters <delimiters>
- Specifies primary and secondary delimiter characters to be used for bracketing quotations in text

Delimiters Example
From fr.xml (French):
<delimiters>
  «
  »
  “
  ”
</delimiters>
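
The four French delimiters above can be read as a primary pair and a secondary pair for nested quotations. The following sketch assumes that reading and alternates between the pairs by nesting level. The names `Delimiters`, `FRENCH` and `quote` are hypothetical, and typographic spacing inside the guillemets is left out.

```python
from typing import NamedTuple


class Delimiters(NamedTuple):
    """Hypothetical holder for primary and secondary quotation marks."""
    start: str
    end: str
    alt_start: str
    alt_end: str


# Values in the order they appear in the fr.xml example.
FRENCH = Delimiters("«", "»", "“", "”")


def quote(text: str, marks: Delimiters, level: int = 0) -> str:
    """Use the primary pair at even nesting levels, the secondary at odd ones."""
    if level % 2 == 0:
        return f"{marks.start}{text}{marks.end}"
    return f"{marks.alt_start}{text}{marks.alt_end}"


inner = quote("oui", FRENCH, level=1)
outer = quote(f"Il a dit {inner}", FRENCH)
print(outer)  # «Il a dit “oui”»
```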

Collation <collations>
- Information in collation directory, not main
- XML version of Java/ICU collation syntax
- Unicode collation algorithm is the base
- Allows tailoring of the UCA on a per locale basis.

Collation Example
From collations/root.xml:
<rules>
  ...
  ā Ā á Á ǎ Ǎ à À …
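
Tailoring means starting from a base ordering and overriding it for a particular locale or usage. The sketch below is not the Unicode Collation Algorithm: the base key is a crude stand-in that ignores case and accents, and the tailoring is a simplified version of German phonebook ordering in which ä, ö and ü sort as ae, oe and ue. All names (`base_key`, `tailored_key`, `PHONEBOOK`) are hypothetical.

```python
import unicodedata

# Simplified, assumed tailoring: umlauts expand to a vowel followed by 'e'.
PHONEBOOK = {"ä": "ae", "ö": "oe", "ü": "ue"}


def base_key(word: str) -> str:
    """Crude base ordering: compare letters while ignoring case and accents."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tailored_key(word: str, tailoring: dict) -> str:
    """Apply per-character overrides first, then fall back to the base key."""
    lowered = unicodedata.normalize("NFC", word).lower()
    expanded = "".join(tailoring.get(ch, ch) for ch in lowered)
    return base_key(expanded)


words = ["Muller", "Müller", "Mueller"]
print(sorted(words, key=base_key))
print(sorted(words, key=lambda w: tailored_key(w, PHONEBOOK)))
```

With the base key, Müller sorts together with Muller; with the phonebook tailoring it sorts together with Mueller. A production system would normally rely on a collation library rather than hand-built keys like these.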

Agenda
- Why CLDR?
- CLDR data
- Tools and vetting
- Today and the future

CLDR Tools
- Export
  - ICU resource bundle generation
  - POSIX locale generator
  - OpenOffice.org format export
- Survey tool
  - http://…survey

Vetting Process for Data
- Collect from different platforms, experts, submissions: new or revised
  - References to external sources strongly encouraged
  - Must be before freeze date for release
  - Use Survey Tool to Collect Data

Causes of Conflicting Data
- Typographical errors
  - Canda instead of Canada
- Regional differences
  - German spelling is different between countries
- Parts of speech
  - “март 2004” versus “3 марта” when the Russian word for March is used in a date
- Context of usage
  - Normal German sorting versus German phonebook sorting
- Standards versus common use
  - “Republic of Laos” versus “Laos”
- Individual preferences
  - 24 hour time format versus 12 hour time format

Agenda
- Why CLDR?
- CLDR data
- Tools and vetting
- Today and the future

Latest Release: CLDR 1.4
- Released: July 17, 2006
- 360 locales:
  - 121 languages
  - 142 territories
- 25% more data
- 17,000 new or modified data items
- Over 100 different contributors

Challenges
- Complex Formats
- Experts knowledgeable both in technology and a specific language
  - Collation
  - Exemplar characters
  - Etc…
- Require close interaction of CLDR experts with language experts

Getting Involved
- Simplest – anyone!
  - Use CLDR
  - Bug report / feature request
- More Involved
  - Vetting, Assessment, Tools, Policies, Decisions, …
  - Any Unicode member eligible to name representatives including country liaison members

Example Country Process (Finland)
- Finnish Ministry of Education made CLDR data a major goal
  - Research Institute for the Languages of Finland (“RILF” aka “Kotus”) designated agency
  - Two official languages (Finnish and Swedish) & four regional / minority languages (three Sámi & Romani as spoken in Finland) to be covered
  - Over 30 different parties represented: commercial, non-commercial, individuals
  - Results expected to lead to new/revised national standards

For More Information
- Unicode
- CLDR
- LDML specification