Language Tags and Locale Identifiers: A Status Report

Presenter and Agenda Presenter: Addison Phillips – Internationalization Architect, Yahoo! – Co-Editor, Language Tag Registry Update (LTRU) Working Group (RFC 3066bis, draft-matching) Agenda: – Language tags – Locale identifiers

Languages? Locales? What's a language tag? What is a locale? Why do identifiers matter?

Language Tags Enable presentation, selection, and negotiation of content Defined by BCP 47 – Widely used! XML, HTML, RSS, MIME, SOAP, SMTP, LDAP, CSS, XSL, CCXML, Java, C#, ASP, Perl… – Well understood (?)

Locale Identifiers Different ideas: – Accept-Locale vs. Accept-Language – URIs/URNs, etc. – CLDR/LDML And Requirements: – Operating environments and harmonization – App Servers – Web Services New Solution? Cost of Adoption: – UTF-8 to the browser: 8 long years

In the Beginning Received Wisdom from the Dark Ages Locales: – japanese, french, german, C – ENU, FRA, JPN – ja_JP.PCK – AMERICAN_AMERICA.WE8ISO8859P1 Languages… … looked a lot like locales (and vice versa)

Locales and Language Tags meet Conversations in Prague… – Language tags are being locale identifiers anyway… – Not going to need a big new thing… – Just a few things to fix… … we can do this really fast

BCP 47 Basic Structure Alphanumeric (ASCII only) subtags Up to eight characters long Separated by hyphens Case not important (e.g. zh = ZH = zH = Zh) 1*8alphanum *[ - 1*8alphanum ]
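
The basic structure above can be checked mechanically. What follows is a minimal sketch, assuming the slide's shorthand (every subtag is 1 to 8 ASCII letters or digits); the names `BASIC_TAG`, `is_well_formed`, and `same_tag` are hypothetical and appear in no specification. It tests shape only and does not consult any registry.

```python
import re

# Shape from the slide: 1-8 ASCII alphanumerics, then any number of
# further 1-8 character alphanumeric subtags, each preceded by a hyphen.
BASIC_TAG = re.compile(r"[A-Za-z0-9]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_well_formed(tag: str) -> bool:
    """Return True if the tag has the basic hyphen-separated shape."""
    return BASIC_TAG.fullmatch(tag) is not None


def same_tag(a: str, b: str) -> bool:
    """Compare two tags ignoring case, since case is not significant."""
    return a.lower() == b.lower()
```

Under these assumptions `is_well_formed("zh-TW")` and `is_well_formed("i-klingon")` hold, while `"en_US"` (underscore) and any subtag longer than eight characters are rejected.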

RFC 1766 zh-TW: ISO 639 language code (alpha2) + ISO 3166 country code (alpha2) i-klingon: Registered value

RFC 3066 sco-GB: ISO 639-2 (alpha 3 codes) But use alpha 2 codes when they exist: eng-GB is not valid (X)

Problems Script Variation: – zh-Hant/zh-Hans – (sr-Cyrl/sr-Latn, az-Arab/az-Latn/az-Cyrl, etc.) Obsolescence of registrations: – art-lojban (now jbo), i-klingon (now tlh) Instability in underlying standards: – sr-CS (CS used to be Czechoslovakia …

And More Problems Lack of scripts Little support for registered values in software Reassignment of values by ISO 3166 Lack of consistent tag formation (Chinese dialects?) Standards not readily available, bad references Bad implementation assumptions – 1*8 alphanum *[ - 1*8 alphanum] – 2*3 ALPHA [ - 2ALPHA ] Many registrations to cover small variations – 8 German registrations to cover two variations
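
To make the "bad implementation assumptions" point concrete, here is a small sketch comparing the two patterns from the slide as regular expressions. The names `NARROW`, `GENERAL`, `TAGS`, and `accepted_by` are hypothetical, and the sample tags are taken from elsewhere in this deck.

```python
import re

# "2*3 ALPHA [ - 2ALPHA ]": a language code plus an optional country code.
NARROW = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z]{2})?")

# "1*8 alphanum *[ - 1*8 alphanum ]": the general hyphenated shape.
GENERAL = re.compile(r"[A-Za-z0-9]{1,8}(-[A-Za-z0-9]{1,8})*")

TAGS = ["en", "en-US", "i-klingon", "de-CH-1996", "zh-Hant"]


def accepted_by(pattern: re.Pattern, tags: list[str]) -> list[str]:
    """Return the tags that the given pattern accepts in full."""
    return [t for t in tags if pattern.fullmatch(t)]
```

Code built around the narrow pattern accepts only `en` and `en-US` from this list; the registered tag, the variant, and the script subtag are all rejected even though they are legitimate tags.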

LTRU and draft-registry Defines a generative syntax – machine readable – future proof, extensible Defines a single source – Stable subtags, no conflicts – Machine readable Defines when to use subtags – (sometimes)

RFC 3066bis and LTRU sl-Latn-IT-rozaj-x-mine – ISO 639-1/2 (alpha2/3) – ISO 15924 script codes (alpha 4) – ISO 3166 (alpha2) or UN M.49 – Registered variants (any number) – Private Use and Extension

More Examples es-419 (Spanish for Americas) en-US (English for USA) de-CH-1996 (Old tags are all valid) sl-rozaj-nedis (Multiple variants) zh-t-wadegile (Extensions)

Benefits Subtag registry in one place: one source. Subtags identified by length/content Extensible Compatible with RFC 3066 tags Stable: subtags are forever
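
"Subtags identified by length/content" means a parser can label each subtag without a lookup table. The sketch below is a simplification under stated assumptions: it lowercases everything, ignores grandfathered tags such as i-klingon, does not enforce subtag order, and leaves anything it cannot place labeled "unknown". The function name `classify_subtags` and the label strings are hypothetical.

```python
def classify_subtags(tag: str) -> list[tuple[str, str]]:
    """Label each subtag of a tag using only its length and content."""
    parts = tag.lower().split("-")
    labeled = [("language", parts[0])]
    mode = "normal"  # switches to "extension" or "private" after a singleton
    for sub in parts[1:]:
        if mode == "private":
            kind = "private"          # everything after "x" is private use
        elif len(sub) == 1:
            mode = "private" if sub == "x" else "extension"
            kind = "singleton"
        elif mode == "extension":
            kind = "extension"
        elif len(sub) == 4 and sub.isalpha():
            kind = "script"           # four letters, e.g. "latn"
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            kind = "region"           # two letters or three digits
        elif 5 <= len(sub) <= 8 or (len(sub) == 4 and sub[0].isdigit()):
            kind = "variant"          # e.g. "rozaj" or "1996"
        else:
            kind = "unknown"          # not handled by this sketch
        labeled.append((kind, sub))
    return labeled
```

Applied to the slide's example, `classify_subtags("sl-Latn-IT-rozaj-x-mine")` labels the pieces as language, script, region, variant, singleton, and private use, in that order.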

Problems Matching – Does en-US match en-Latn-US ? Tag Choices – Users have more to choose from. Implementations – More to do, more to think about – (easier to parse, process, support the good stuff)

Tag Matching Uses Language Ranges in a Language Priority List to select sets of content according to the language tag Four Schemes – Basic Filtering – Extended Filtering – Scored Filtering – Lookup

Filtering Ranges specify the least specific item – en matches en, en-US, en-Brai, en-boont Basic matching uses plain prefixes Extended matching can match inside bits – en-*-US
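
A sketch of the two filtering schemes, written from the description on this slide and the matching draft's general approach; treat it as an illustration rather than the draft's normative algorithm. The function names `basic_filter` and `extended_filter` are hypothetical.

```python
def basic_filter(range_: str, tag: str) -> bool:
    """Basic filtering: the range must equal the tag or be a whole-subtag prefix."""
    r, t = range_.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def extended_filter(range_: str, tag: str) -> bool:
    """Extended filtering: range subtags may skip over subtags inside the tag."""
    r = range_.lower().split("-")
    t = tag.lower().split("-")
    # The first subtags must agree unless the range starts with a wildcard.
    if r[0] != "*" and r[0] != t[0]:
        return False
    ri, ti = 1, 1
    while ri < len(r):
        if r[ri] == "*":
            ri += 1                  # a wildcard matches any run of subtags
        elif ti >= len(t):
            return False             # range wants more than the tag has
        elif r[ri] == t[ti]:
            ri += 1
            ti += 1
        elif len(t[ti]) == 1:
            return False             # never skip past a singleton such as "x"
        else:
            ti += 1                  # skip an unmatched subtag in the tag
    return True
```

This also illustrates the earlier question "Does en-US match en-Latn-US?": `basic_filter("en-US", "en-Latn-US")` is false because the range is not a plain prefix, while `extended_filter("en-US", "en-Latn-US")` is true because the script subtag can be skipped.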

Scored Filtering Assigns a weight or score to each match Result set is ordered by match quality Postulated by John Cowan

Lookup Range specifies the most specific tag in a match. – en-US matches en and en-US but not en-US-boont Mirrors the locale fallback mechanism and many language negotiation schemes.
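
Lookup can be sketched as progressive truncation of each range, which is the same idea as locale fallback. This is a minimal illustration under assumptions: the function name `lookup`, its parameters, and the `default` argument are hypothetical, and wildcard ranges are not treated specially.

```python
from typing import Optional


def lookup(priority_list: list[str], available: list[str],
           default: Optional[str] = None) -> Optional[str]:
    """Return the best available tag for a language priority list."""
    by_lower = {tag.lower(): tag for tag in available}
    for range_ in priority_list:
        parts = range_.lower().split("-")
        while parts:
            candidate = "-".join(parts)
            if candidate in by_lower:
                return by_lower[candidate]
            parts.pop()              # fall back by dropping the last subtag
            # A dangling single-character subtag (e.g. "x") is dropped too.
            if parts and len(parts[-1]) == 1:
                parts.pop()
    return default
```

For example, `lookup(["en-US"], ["en", "en-US-boont"])` returns `"en"`: the range is never extended to reach the more specific en-US-boont, only shortened.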

What Do I Do (Content Author)? Not much. – Existing tags are all still valid: tagging is mostly unchanged. – Resist temptation to (ab)use the private use subtags. Unless your language has script variations: – Tag content with the appropriate script subtag(s) Script subtags only apply to a small number of languages: zh, sr, uz, az, mn, and a very small number of others.

What Do I Do (Programmer)? Check code for compliance with 3066bis – Decide on well-formed or validating – Implement suppress-script – Change to using the registry – Bother infrastructure folks (Java, MS, Mozilla, etc) to implement the standard

What Do I Do (End-User)? Check and update your language ranges. Tag content wisely.

LTRU Milestone Dates (Done) RFC 3066bis – Registry went live in December 2005 Produce Matching RFC – Draft-11 available (WG Last Call started … Monday) (Anticipated) Produce RFC 3066ter – This includes ISO 639-3 support, extended language subtags, and possibly ISO 639-6

Things to Read Registry Draft registry-12.txt Matching Draft LTRU Mailing List

Things to Do (languages) Get involved in LTRU Get involved in W3C I18N Core WG! Write implementations Work on adoption of 3066bis: understand the impact Then get involved with Locale identifiers …

Back to Locales… IUC 20 Round Table Suzanne Toppings Multilingual Article Tex Texin and the Locales list…

Locale Identifiers and Web Services

W3C and Unicode W3C – Identifiers and cross-over with language tags – Web services – XML, HTML Unicode Consortium – LDML – CLDR – Standards for content

Language Tags and Locale Identifiers SPEC First Working Draft coming soon – URIs? – Simple tags?

WS-I18N SPEC First Working Draft now available: –

Ideas?