Language / Locale IDs M. Davis, IBM A. Phillips, webMethods.


1 Language / Locale IDs
M. Davis, IBM
A. Phillips, webMethods

2 Language
"A shprakh iz a diyalekt mit an armey un a flot" ("A language is a dialect with an army and a navy") Max Weinreich (Joshua Fishman), 1945.
The written form is the most important for computers
Does include “culturally-specific” formatting (as we’ll see later)
Does not include currency, time-zone, seat-assignment, etc.

3 Language Tags: Two Needs
Identification
  Announce that this text is American, Northern Californian, Casual, PG-13 English
Filtering/Matching
  Accept Any English, Any French, Swiss German,…
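The filtering/matching need can be sketched as prefix matching, in the style of the language-range matching described in RFC 3066: a range such as "en" accepts a tag that equals it or that starts with it followed by a hyphen. The function names and the sample tags below are illustrative assumptions, not taken from the presentation.

```python
def matches_range(language_range: str, tag: str) -> bool:
    """Return True if `tag` falls within `language_range` (prefix matching)."""
    rng = language_range.lower()        # tags compare case-insensitively
    candidate = tag.lower()
    if rng == "*":                      # wildcard range accepts everything
        return True
    return candidate == rng or candidate.startswith(rng + "-")


def filter_tags(ranges, tags):
    """Keep the tags accepted by at least one of the ranges."""
    return [t for t in tags if any(matches_range(r, t) for r in ranges)]


# "Any English, any French, Swiss German"
accepted = filter_tags(
    ["en", "fr", "de-CH"],
    ["en-US", "en-Latn-US-boont", "fr", "de-DE", "de-CH", "eng"],
)
print(accepted)
```

Note that "eng" is not accepted by the range "en": the prefix has to end at a subtag boundary.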

4 Background
RFC 1766
RFC 3066
Used in XML, HTML,…
Used both as language ID and locale ID (narrow sense)

5 RFC 3066bis
Successor to 3066
For use in XML, HTML, Java, …
Addresses limitations of 3066
First Draft: 2003/10
Latest Draft: 2004/2
  http://www.ietf.org/internet-drafts/draft-phillips-langtags-01.txt
Final Draft: 2004/5??

6 Main Goals
Maintain backward compatibility (so that all previous codes would remain valid)
Reduce the need for large numbers of registrations
Provide a more formal structure to allow parsing into subtags even where software does not have the latest registrations
Provide stability in the face of potential instability in ISO 639, 3166, and 15924 codes (demonstrated instability in the case of ISO 3166)
Allow for external extension mechanisms.

7 Expressiveness
Allows ISO 15924 script code subtags and allows them to be used generatively.
Adds the concept of a variant subtag and allows variants to be used generatively.
Allows use of UN M49 codes:
  es-419 = "Spanish, Latin America"
Changes the IANA language tag registry to a language subtag registry
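Generative use means a tag can be assembled from independently registered subtags instead of each full tag being registered on its own. A minimal sketch, assuming a hypothetical helper `make_tag` and the casing used in the slide examples (lowercase language, title-case script, uppercase region):

```python
from itertools import product


def make_tag(language, script=None, region=None, variants=()):
    """Join subtags in the fixed order language-script-region-variant."""
    parts = [language.lower()]
    if script:
        parts.append(script.title())    # e.g. "Hant"
    if region:
        parts.append(region.upper())    # e.g. "CN"; digits like "419" are unchanged
    parts.extend(v.lower() for v in variants)
    return "-".join(parts)


print(make_tag("es", region="419"))                    # UN M49 region code
print(make_tag("en", "latn", "us", variants=["boont"]))

# Any script can be combined with any region without a new registration.
serbian = [make_tag("sr", s, r) for s, r in product(["Cyrl", "Latn"], ["891"])]
print(serbian)
```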

8 Stability
Allows backward/forward compatible parsing
Defines a process for handling reuse of values by ISO 639, ISO 15924, and ISO 3166 in the event that they register a previously used value for a new purpose.

9 Private Use & Extensions
Adds an extension mechanism which does not require registration to use.
Defines the private use tags in ISO 639, ISO 15924, and ISO 3166 as the mechanism for creating private use language, script, and region subtags respectively
Defines a syntax for private use variant subtags which can be used without registration.
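As a rough illustration of the private use ranges that the ISO standards set aside, the checks below test whether a subtag falls in one of them. The function names are hypothetical, and the ranges (qaa to qtz for ISO 639-2, Qaaa to Qabx for ISO 15924, and AA, QM to QZ, XA to XZ, ZZ for ISO 3166-1) are stated from memory of those standards and should be checked against the current code lists.

```python
def _ascii_letters(code: str, length: int) -> bool:
    """True if `code` is exactly `length` ASCII letters."""
    return len(code) == length and code.isascii() and code.isalpha()


def is_private_language(code: str) -> bool:
    """ISO 639-2 local-use range: qaa..qtz."""
    c = code.lower()
    return _ascii_letters(c, 3) and "qaa" <= c <= "qtz"


def is_private_script(code: str) -> bool:
    """ISO 15924 private-use range: Qaaa..Qabx."""
    c = code.lower()
    return _ascii_letters(c, 4) and "qaaa" <= c <= "qabx"


def is_private_region(code: str) -> bool:
    """ISO 3166-1 user-assigned alpha-2 codes: AA, QM..QZ, XA..XZ, ZZ."""
    c = code.upper()
    if not _ascii_letters(c, 2):
        return False
    return c in ("AA", "ZZ") or "QM" <= c <= "QZ" or "XA" <= c <= "XZ"


# Classify the subtags of a language-script-region tag.
lang, script, region = "qaa-Qaaa-QM".split("-")
print(is_private_language(lang), is_private_script(script), is_private_region(region))
```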

10 Structure (Bizarro BNF)
tag = lang *["-s-" extlang] ["-" script] ["-" region] *["-" variant] ["-x" extensions]
    =/ "x" extensions ; private use
    =/ grandfathered-registrations
lang = 2*3 ALPHA ; shortest ISO 639
     =/ registered-lang
registered-lang = 5*15 alphanum

11 Structure II
script = 4 ALPHA ; ISO 15924
region = 2 ALPHA ; ISO 3166
       =/ 3 DIGIT ; UN country #
variant = 5*15 alphanum
extensions = 1* ("-" value)
value = 1*31 alphanum
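Because each subtag type has a distinct length and character class, the grammar on these two slides can be approximated with a single regular expression. The sketch below is one possible reading of the draft grammar as shown on the slides, not the final specification. `parse_tag` is a hypothetical name, the 3-letter extlang is an assumption (the slides do not define extlang), and grandfathered registrations are not handled.

```python
import re

# extlang is not defined on the slides; 3 letters is an assumption.
TAG_RE = re.compile(
    r"""
    (?P<lang>[A-Za-z]{2,3}|[A-Za-z0-9]{5,15})        # lang or registered-lang
    (?P<extlang>(?:-s-[A-Za-z]{3})*)                 # zero or more "-s-" extlang
    (?:-(?P<script>[A-Za-z]{4}))?                    # optional script
    (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?           # optional region
    (?P<variants>(?:-[A-Za-z0-9]{5,15})*)            # zero or more variants
    (?:-x(?P<extensions>(?:-[A-Za-z0-9]{1,31})+))?   # optional "-x" extensions
    """,
    re.VERBOSE,
)
# Whole-tag private use form: "x" followed by one or more values.
PRIVATE_RE = re.compile(r"x((?:-[A-Za-z0-9]{1,31})+)")


def parse_tag(tag):
    """Split a tag into named subtags, or return None if it does not fit."""
    m = PRIVATE_RE.fullmatch(tag)
    if m:
        return {"private": m.group(1).lstrip("-").split("-")}
    m = TAG_RE.fullmatch(tag)
    if not m:
        return None
    return {
        "lang": m.group("lang"),
        "extlang": [s for s in m.group("extlang").split("-s-") if s],
        "script": m.group("script"),
        "region": m.group("region"),
        "variants": [v for v in m.group("variants").split("-") if v],
        "extensions": [e for e in (m.group("extensions") or "").split("-") if e],
    }


for sample in ["zh-s-min-s-nan-Hant-CN", "en-Latn-US-boont", "x-valley-girl", "de-891-DE"]:
    print(sample, "->", parse_tag(sample))
```

The parser needs no registry lookup: position and length alone decide whether a subtag is a script, a region, or a variant, which is what lets software handle tags it has never seen.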

12 Examples I
Simple language code:
  de (German)
  fr (French)
  ja (Japanese)
Language code plus Script code:
  zh-Hant (Traditional Chinese)
  en-Latn (English written in Latin script)
  sr-Cyrl (Serbian written with Cyrillic script)
Language-Region:
  de-DE (German for Germany)
  zh-SG (Chinese for Singapore)
  cs-CS (Czech for Czechoslovakia)
  sr-891 (Serbian for Serbia and Montenegro)

13 Examples II
Language-Script-Region:
  zh-Hans-CN (Simplified Chinese for the PRC)
  sr-Latn-891 (Serbian, Latin script, Serbia & Monte.)
Language-Script-Region-Variant:
  en-Latn-US-boont (Boontling dialect of English)
Other Mixtures:
  zh-CN (Chinese for the PRC)
  en-boont (Boontling dialect of English)

14 Examples III
Extension mechanism:
  x-valley-girl
  de-CH-x-phonebook
  az-Arab-x-AZE-derbend
Extended language subtags:
  zh-s-min
  zh-s-min-s-nan-Hant-CN
Private Use tags:
  qaa-Qaaa-QM-xsouthern (all private tags)
  de-Qaaa (German, with a private script)
  de-Latn-QM (German, Latin-script, private region)
  de-Qaaa-DE (German, private script, for Germany)

15 Examples IV
Some Invalid Tags:
  de-891-DE (two region tags)
  a-DE (use of a single character tag)
  zh-xsouthern-DE (private use variant followed by another tag)

16 Locale
Different interpretations
  narrow = language
  broad = any user-preferences
[Diagram labels: user preferences, language]

17 Language vs Locale

18 Which are English?
"Theatre Center News: The date of the last version of this document was 2003 年 3 月 20 日. A copy can be obtained for $50,0 or 1.234,57 грн. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Behdad Esfahbod, Ahmed Talaat, Eric Mader, Asmus Freytag, Avery Bishop, and Doug Felt."
"Theater Center News: The date of the last version of this document was 3/20/2003. A copy can be obtained for $50.00 or 1,234.57 Ukrainian Hryvni. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Ahmed Talaat, Asmus Freytag, Avery Bishop, Behdad Esfahbod, Doug Felt, Eric Mader."
"Theatre Centre News: The date of the last version of this document was 20/3/2003. A copy can be obtained for $50.00 or 1,234.57 Ukrainian Hryvni. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Ahmed Talaat, Asmus Freytag, Avery Bishop, Behdad Esfahbod, Doug Felt, Eric Mader."
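The differences between the passages above are formatting conventions that travel with the language identifier when it is used as a locale in the narrow sense. A minimal sketch, assuming the tags en-US and en-GB for the second and third passages (the slide itself does not label them) and a hypothetical, hand-written pattern table in place of real locale data:

```python
from datetime import date

# Hypothetical convention table; a real system would use locale data instead.
DATE_PATTERNS = {
    "en-US": "{m}/{d}/{y}",   # month first, as in "3/20/2003"
    "en-GB": "{d}/{m}/{y}",   # day first, as in "20/3/2003"
}


def format_date(d: date, locale_id: str) -> str:
    """Render a date using the pattern registered for `locale_id`."""
    return DATE_PATTERNS[locale_id].format(d=d.day, m=d.month, y=d.year)


def format_number(value: float, group: str = ",", decimal: str = ".") -> str:
    """Format with two decimals, then swap in the requested separators."""
    text = f"{value:,.2f}"
    return text.translate(str.maketrans({",": group, ".": decimal}))


when = date(2003, 3, 20)
print(format_date(when, "en-US"), format_date(when, "en-GB"))
print(format_number(1234.57), format_number(1234.57, group=".", decimal=","))
```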

19 Summary
Improved version of 3066
Used for language and locale (in narrow sense)
Addresses Issues
  Script Distinctions
  Parseability
  Extensions
  …

20 References
Latest Public Draft
  http://www.ietf.org/internet-drafts/draft-phillips-langtags-01.txt
Working Draft
  http://www.inter-locale.com/ID/draft-phillips-langtags-02.html (HTML version)
Language Code Issues (+ Locales)
  http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/language_code_issues.html

21 Q&A

