International Summit on Localisation (MAIT/TDIL) New Delhi, 2004-12-08 (R2) Localization Data Mark Davis, PhD Chief SW Globalization Arch., IBM President,

1 International Summit on Localisation (MAIT/TDIL), New Delhi, 2004-12-08 (R2)
Localization Data
Mark Davis, PhD
Chief SW Globalization Arch., IBM; President, Unicode Consortium

2 Importance of Standards
- Products developed in each country interoperate with other products: inside and outside that country
- Mechanism for countries / industries to promulgate best practices
- SW Localization
  - Unicode: Universal character encoding
  - CLDR: Common Locale Data Repository

3 Universal Character Encoding
- Unicode: Unique character codes for all languages
- …

4 Common Locale Data Repository
- Relatively new project: 2004
- Hosted by Unicode Consortium
  - http://www.unicode.org/cldr/
- Goals:
  - Common, required SW locale data for world languages
  - XML format for effective interchange
  - Freely available

5 What is Locale Data
- Locale = identifier string referring to linguistic and cultural preferences
- Typical data
  - Dates/time formats
  - Number/Currency formats
  - Measurement System
  - Collation Specification (Collation)
    - Used for sorting, searching, matching
  - Translated names for language, territory, script, timezones, currencies,…
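To make the "identifier string" idea concrete, here is a minimal sketch that splits a locale identifier into language, script, and territory parts. The function name `parse_locale_id` and the simplified rules (underscore or hyphen separators, a 4-letter script, a 2-letter or 3-digit territory) are assumptions for illustration; they do not cover the full identifier grammar, and variants are ignored.

```python
from typing import NamedTuple, Optional


class LocaleId(NamedTuple):
    language: str
    script: Optional[str]
    territory: Optional[str]


def parse_locale_id(identifier: str) -> LocaleId:
    """Split an identifier such as 'da_DK' into its parts (simplified)."""
    parts = identifier.replace("-", "_").split("_")
    language = parts[0].lower()
    script: Optional[str] = None
    territory: Optional[str] = None
    for part in parts[1:]:
        is_script = len(part) == 4 and part.isalpha()
        is_territory = (len(part) == 2 and part.isalpha()) or (
            len(part) == 3 and part.isdigit()
        )
        if is_script and script is None:
            script = part.title()       # e.g. 'cyrl' -> 'Cyrl'
        elif is_territory and territory is None:
            territory = part.upper()    # e.g. 'dk' -> 'DK'
        # Anything else (variants, extensions) is ignored in this sketch.
    return LocaleId(language, script, territory)


if __name__ == "__main__":
    print(parse_locale_id("da_DK"))
    print(parse_locale_id("fi-FI"))
```

The parsed parts are what an implementation would then use to look up date formats, number formats, collation rules, and translated names.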

6 Latest Release: CLDR 1.2
- Released: November, 2004

             locales  languages  territories
  Approved:      232         72          108
  Draft:          63         27           28

- Data
  - Unique XPaths: 2,540
  - Actual Values: 56,290
  - Fully Resolved: 358,860
  (not including collation, aliased data)
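The gap between "Actual Values" and "Fully Resolved" is consistent with locales inheriting most of their values from a parent locale rather than storing them directly. The sketch below shows one way such resolution could work. The `DATA` table, its path strings, and the function names are hypothetical stand-ins, not actual CLDR content or tooling.

```python
from typing import Dict, List

# Hypothetical miniature data set: each locale stores only what differs
# from its parent.
DATA: Dict[str, Dict[str, str]] = {
    "root": {"days/sun": "1", "symbols/decimal": "."},
    "da": {"days/sun": "søn", "symbols/decimal": ","},
    "da_DK": {},  # stores nothing itself; inherits everything from 'da'
}


def fallback_chain(locale_id: str) -> List[str]:
    """Return the lookup order, e.g. 'da_DK' -> ['da_DK', 'da', 'root']."""
    chain: List[str] = []
    current = locale_id
    while current:
        chain.append(current)
        current = current.rpartition("_")[0]  # drop the last subtag
    chain.append("root")
    return chain


def resolve(locale_id: str) -> Dict[str, str]:
    """Merge values from root down to the requested locale."""
    resolved: Dict[str, str] = {}
    for name in reversed(fallback_chain(locale_id)):
        resolved.update(DATA.get(name, {}))
    return resolved


if __name__ == "__main__":
    print(fallback_chain("da_DK"))
    print(resolve("da_DK"))
```

In this toy data set `da_DK` stores zero values yet resolves to two, which is the same kind of multiplication the slide's counts suggest.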

7 Next Release: CLDR 1.3
- Jan 2005: Freeze date
  - For new enhancement requests & bug reports
- Apr 2005: Target release date
- Planned features
  - New data / corrections / tests (ongoing)
  - Survey tool
  - POSIX conversion tool
  - Additional Mechanisms
    - lenient date/time/number parsing;
    - different combinations of date fields;
    - names for dialects, measurement systems;
    - narrative reference information

8 Usage (direct or indirect)
- Caveats
  - Not a complete list: usage is not tracked, so this is an estimate
  - CLDR first available in 2004, so may use precursor data
- Companies / Organizations
  - Adobe, Apple (Mac OS X), abas Software, Argonne National Laboratory, Ascential Software, Avaya, BEA, BroadJump, BluePhoenix Solutions, BMC Software (Remedy), Business Objects, caris, CERN, Cognos, Debian Linux, Gentoo Linux, HP, Hyperion, IBM, Inktomi, Innodata Isogen, Informatica, Intel, Interlogics, IONA, IXOS, JD Edwards, Jikes, Macromedia, Mathworks, Mozilla, OpenOffice, Language Analysis Systems, Lawson Software, Leica Geosystems GIS & Mapping LLC, Mandrake Linux, Parrot, PayPal, Progress Software, Python, QNX, Rogue Wave, SAP, Siebel, SIL, SPSS, Software AG, Sun Microsystems (Solaris, Java), SuSE, Sybase, Teradata (NCR), Trend Micro, Virage, webMethods, Wine, WMS Gaming,…
- Optional use:
  - Apache, Perl, Xalan, Xerces, …

9 Sample: Languages, Scripts, Territories
<localeDisplayNames>
<languages> Afar Abkhasisk …
<scripts> Arabisk …
<territories> Andorra Forenede Arabiske Emirater</territory> …
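The transcript has lost most of the XML markup from this sample. As a hedged sketch of what such a fragment might look like and how it could be read, the example below wraps the Danish names from the slide in `language`, `script`, and `territory` elements with a `type` attribute. The attribute values (`aa`, `ab`, `Arab`, `AD`, `AE`), the exact element nesting, and the helper name `display_names` are assumptions, not recovered from the slide.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Assumed reconstruction of the slide's fragment; the type codes are
# illustrative and were not visible in the transcript.
LDML_FRAGMENT = """\
<ldml>
  <localeDisplayNames>
    <languages>
      <language type="aa">Afar</language>
      <language type="ab">Abkhasisk</language>
    </languages>
    <scripts>
      <script type="Arab">Arabisk</script>
    </scripts>
    <territories>
      <territory type="AD">Andorra</territory>
      <territory type="AE">Forenede Arabiske Emirater</territory>
    </territories>
  </localeDisplayNames>
</ldml>
"""


def display_names(xml_text: str, group: str, item: str) -> Dict[str, str]:
    """Map each code to its translated name for one group of elements."""
    root = ET.fromstring(xml_text)
    path = f"localeDisplayNames/{group}/{item}"
    return {
        element.get("type", ""): (element.text or "").strip()
        for element in root.findall(path)
    }


if __name__ == "__main__":
    print(display_names(LDML_FRAGMENT, "languages", "language"))
    print(display_names(LDML_FRAGMENT, "territories", "territory"))
```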

10 Sample: Characters / Dates
<characters> [a-z æ å ø á é í ó ú ý] </characters> …
søn man …
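The bracketed pattern in the `<characters>` sample lists the letters the locale customarily uses. Assuming it is a simple set of single characters and `x-y` ranges separated by spaces, the sketch below expands it and checks whether a piece of text stays within the set. The function names are hypothetical, and real set patterns support more syntax than this handles.

```python
from typing import List, Set


def expand_exemplars(pattern: str) -> Set[str]:
    """Expand a simple pattern like '[a-z æ å ø]' into a set of characters."""
    body = pattern.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    characters: Set[str] = set()
    for token in body.split():
        if len(token) == 3 and token[1] == "-":
            # A range such as 'a-z': include both endpoints.
            start, end = ord(token[0]), ord(token[2])
            characters.update(chr(code) for code in range(start, end + 1))
        else:
            characters.update(token)
    return characters


def uncovered(text: str, exemplars: Set[str]) -> List[str]:
    """Return the letters in text that are not in the exemplar set."""
    return sorted(
        {ch for ch in text.lower() if ch.isalpha() and ch not in exemplars}
    )


DANISH = expand_exemplars("[a-z æ å ø á é í ó ú ý]")

if __name__ == "__main__":
    print(len(DANISH))
    print(uncovered("Smørrebrød", DANISH))
    print(uncovered("Über", DANISH))
```

A check like this could flag translated strings that contain letters outside the locale's customary set.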

11 Sample: Timezones / Currencies
<timeZoneNames> <long> Pacific-normaltid Pacific-sommertid </long> …
<currencies> Gabonesisk CFA-franc</displayName> GAF …

12 Sample: Collation <rules> 0 0

13 Committee Process
- For most effective participation from people around the world
  - Meetings
    - By phone, never F2F
    - Short, often
    - Allows preparation between meetings
  - Written
    - Email
    - Database submissions

14 Vetting Process for Data
- Collect from different platforms, experts, submissions: new or revised
  - References to external sources strongly encouraged
  - Must be before freeze date for release
  - Will use Survey Tool
- Enter in the repository
  - Mark with draft attribute
  - Add references, standards
- Verify by CLDR committee members
  - Consulting with country contacts
  - If disagreement, decide in committee
- Accept
  - As main form: draft attribute removed
  - As alternate form: marked with different attributes
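The "mark with draft attribute" and "draft attribute removed" steps can be pictured with a small sketch. It assumes the attribute is written as `draft="true"` on individual elements. The sample data, the choice of which entry is draft, and the function names `split_by_draft` and `accept` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical submission: one vetted entry and one still marked as draft.
SUBMISSION = """\
<languages>
  <language type="aa">Afar</language>
  <language type="ab" draft="true">Abkhasisk</language>
</languages>
"""


def split_by_draft(xml_text: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (approved, draft) mappings of type code to value."""
    approved: Dict[str, str] = {}
    draft: Dict[str, str] = {}
    for element in ET.fromstring(xml_text):
        target = draft if element.get("draft") == "true" else approved
        target[element.get("type", "")] = element.text or ""
    return approved, draft


def accept(xml_text: str, type_code: str) -> str:
    """Accept an entry as the main form by removing its draft attribute."""
    root = ET.fromstring(xml_text)
    for element in root:
        if element.get("type") == type_code:
            element.attrib.pop("draft", None)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(split_by_draft(SUBMISSION))
    print(split_by_draft(accept(SUBMISSION, "ab")))
```

Alternate forms, which the slide says are marked with different attributes, are not modeled here.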

15 Challenges
- Aggressive, 6 month release schedule
- Complex Formats
  - Collation, Date Formats, Exemplar characters, etc.
  - Require close interaction of CLDR experts with language experts
- Choosing most customary, acceptable forms
  - Regional differences, individual preferences
  - Context (months in formats vs. calendars)
  - Uncommon cases (Interlingua)
  - Standards vs. common modern usage
  - Obtaining references for data
  - But can have multiple, alternate versions

16 Getting Involved
- Simplest
  - Bug report / feature request – anyone!
- More Involved
  - Vetting, Assessment, Tools, Policies, Decisions, …
  - Any Unicode member eligible to name representatives
    - Full members: IBM, Apple, Sun, Oracle, India,…
    - Liaison members: Ireland, Finland, …
    - Associate members: Tamil Nadu, …

17 Example Country Process (Finland)
- Finnish Ministry of Education made CLDR data a major goal, 2004-06
  - Research Institute for the Languages of Finland ("RILF" aka "Kotus") designated agency
  - Documenting the national preferences in the open even more important than implementations
  - Results expected to lead to new/revised national standards

18 Example Country Process (II)
- RILF a Unicode Liaison member, 2004-07
  - Set up fully open national group on language and cultural requirements on ICT, 2004-09
  - Two official languages (Finnish and Swedish) & four regional / minority languages (three Sámi & Romani as spoken in Finland) to be covered
  - Over 30 different parties represented: commercial, non-commercial, individuals
  - Public comments to be allowed: http://kotoistus.fi
  - Documentation for all controversial issues and deviations from any national standards

19 For more information
- Unicode
  - http://www.unicode.org/
- CLDR
  - http://www.unicode.org/cldr/
- This presentation
  - http://www.macchiato.com/slides/Localization.ppt

