Localization World Silicon Valley 11 October 2011 A Buyer’s Guide to the Localization Standards Landscape
Session Agenda
- Characterizing standards success
- Interoperability standards
- XLIFF update
- On the horizon
Experts
- David Filip, LRC/CNGL
- Patrick Guillemin, ETSI
- Arle Lommel, GALA
- Jaap Van Der Meer, TAUS
Session goals
- Entice: Interest in understanding
- Educate: Awareness of issues and possibilities
- Encourage: Insight into local applicability
- Engage: Impact on global business performance
We will not...
- Pass judgment on what standards are good or bad. We suggest a free market approach.
- Preach about which standards are most important. Pain points vary widely.
- Argue about approaches. Panelists are happy to pick up discussions for those who are interested.
- Attempt to sort out overlapping initiatives. See “free market approach” above.
Reasons for failure
- Too narrow, too obviously political, no value (doesn’t solve a real problem), not sustainable.
- Many boutique efforts reach first flush. But companies move on in marketing focus, and funding gets cut.
- After initial success, no platform for broader dissemination. Serious efforts want the benefit of interrelated standards liaison (e.g., OASIS/ISO).
- Standards development requires both technical and marketing work, including education and the right IP approach for the target market.
- Adoption activities shoehorned into technical committee work.
DITA as a model: Success factors for standards adoption
- Widespread need: move to structured content management without burden
- Simple in theory, but allows for complexity in application and implementation
- Formed subcommittees to deal with application-level use cases: agile development (get the basics down, then iterate)
- Solve a common set of problems, but allow for extensions and specialization
- Serious, conscious adoption: market education, strong vendor support from the beginning
XLIFF 2.0 Program Charter, Process, & Timeline
David Filip (DF), liaison officer on behalf of the XLIFF TC
Terminology
- XLIFF – XML Localization Interchange File Format
- OAXAL – Open Architecture for XML Authoring and Localization (Reference Model) (TC)
- MLW-LT – MultilingualWeb – Language Technology
- W3C – World Wide Web Consortium
- OASIS – Organization for the Advancement of Structured Information Standards
- ULI – Unicode Localization Interoperability (TC)
OASIS Charter – Clarification or Re-Chartering?
- Core characteristics and goals of XLIFF standardization.
- Potential to develop the pivotal standard for Language Technology, Localization and Internationalization.
- What role should the XLIFF standard play in the overall Language Technology, Localization, and Internationalization standards architecture?
- The current statement of Purpose may need to be clarified or changed (extended in scope).
- The current aim is interchange, but the standard can naturally expand to cover storage, legacy content leveraging, annotations, tagging, etc.
- Core vs. Module
  - Criteria for elements being in core or not
  - Criteria for modules being developed
- Prioritization and timeline for XLIFF 2.0 and XLIFF 2.x throughout 2011 and 2012
- Role of Customers, i.e. Toolmakers and Enterprise Users vs. End Users
- Membership Section and its funding
- Funding of Open Source Reference Implementations (Open Toolkit – OKAPI, M4Loc, etc.)
- Reviewing Toolmakers’ extensions as source of industry wisdom?
- XLIFF TC policy towards ULI, W3C ITS, ISO TC 37, ETSI ISG LIS
- Standardization that is needed in Language Technology (LT), Localization (L10n) and Internationalization (I18n).
- Overall architecture of localization process infrastructure standardization
General Options
- EITHER Breadth OR Depth
- EITHER Normative Processing Requirements OR Informal Recommendations
- EITHER publish minimal core quickly OR try to address long tail of feature requests
- EITHER improved functionality OR backwards compatibility
- Extensibility?
Description of Business Needs the Program should address
Customers’ voice:
- The 1.x standard is too complex
- The 1.x standard has too generous extensibility
- The 1.x standard lacks explicit conformance criteria
The overall goal is to ensure interoperability throughout Language Technology related content transformations during the whole content lifecycle.
Although the XLIFF 1.x standard was intended primarily as an exchange format, industry practice shows that the defined format is also suitable for storage and legacy content leverage purposes.
Description of the desired state
The XLIFF TC commits to addressing the customer needs as stated under the Description of Business Needs. In particular, the XLIFF TC resolved via previous ballots to create a 2.0 standard that will
-Be modular
-Contain non-negotiable core
-Be created with conformance and processing requirements in mind
-Allow for extensibility at predefined points. Extensibility will be allowed only for functionality that cannot be achieved through core or module.
Although backwards compatibility with the 1.x standards is perceived as a value per se by the XLIFF TC and its customers, backwards compatibility has lesser priority than serving the business needs stated above. The XLIFF TC will prioritize the non-negotiable core and its release over the long tail wish list.
XLIFF 2.0 SWOT Analysis Persistent Strengths Being well addressed by influx of new manpower. Toolmakers want to participate. Good progress on collection of implementers' extension points, semantics etc. In 2011 the TC should finish the initial requirements gathering and feature definitions. Q1 2012 should see the new committee draft, and Q2 the 2.0 standard.
Conformance Clause
- An opportunity: make processing requirements an integral part of the spec, as a normative, obligatory part of each element (including attributes) spec
- Strict Process for Feature inclusion in 2.x
Conformance Clause
Strict Process for Feature inclusion in 2.x: Owners must demonstrate to the TC not only the technical appropriateness of the feature but also explain what resources and timeframe are needed for elaboration and whether those resources are available.
Core vs. Modules
Core – Basic part of the specification that contains all and only substantial elements that cannot possibly be excluded without negatively affecting the standard’s capability to allow for basic language technology related transformations.
[Discussion on this concept is ongoing; DavidF will work on deriving it from the main success scenario rather than from the vague notion of a basic LT transformation.]
Core vs. Modules ctd.
Meaningful functional whole – elements that are critical for performing certain types of language technology transformations, all and only such elements and their respective processing rules.
Module – a part of the specification that fulfills all of the following conditions:
- Does not overlap with Core
- Is compatible with Core
- Comprises all elements and their processing rules that form a meaningful functional whole
XLIFF Promotion and Liaison SC
- Bilateral relationships and liaisons
  - Formal liaisons: ULI, ETSI ISG LIS (in progress), MLW-LT (W3C WG in creation)
  - Watching: IN!, Linport, OAXAL, GALA, TAUS
- XLIFF Symposium (1st Limerick 2010, 2nd Warsaw 2011)
- OASIS organizational ballots
- State of the art research
- Etc.
XLIFF 2.0 momentum
15 Voting Members! And counting...
- Heavy Hitters: Yves Savourel (ENLASO), Rodolfo Raya (Maxprograms), Bryan Schnabel
- Traditional contributors: SAP, SDL, LRC, PSBT
- New Entrants: GALA, Multicorpora, Tom Commerford
- Rejoined TC recently: IBM, LIOX
- On their way: Oracle, Kilgray, Welocalize, TAUS
- Interested: Atril, Microsoft, Wordbee
How to Influence XLIFF?
XLIFF is an open standard: TRANSPARENT AND RF
- Archives publicly accessible:
  http://lists.oasis-open.org/archives/xliff/
  http://markmail.org/search/?q=list%3A%20xliff
- Anyone can subscribe to the comment list:
  http://www.oasis-open.org/committees/comments/index.php?wg_abbrev=xliff
- Feature Tracking publicly viewable:
  http://wiki.oasis-open.org/xliff/XLIFF2.0/FeatureTracking#XLIFF2.0.2BAC8-Feature.2BAC8-ChangeTracking.ChangeTracking.2BAC8VersionControl
Key takeaways
- Not standards for the sake of standards... but what becomes possible with standardization.
- Compelling business cases are critical. Not just technology issues.
- Homework is essential. Where are your points of friction in the global content value chain, and what standards address your pain?
- Vote with your money!
MLW-LT Call For Participation
David Filip, Dave Lewis, Felix Sasaki
Terminology
- CSA – Coordination and Support Action
- W3C – World Wide Web Consortium
- WG – Working Group (in W3C)
- Deep Web, Surface Web
- LSP – Language Service Provider
- TM, MT, TMS
- CMS, CCMS
- OASIS DITA, XLIFF
Standardization focus - Metadata
- Multilingual Web must be aware of linguistic and localisation processing
- Process and Quality, Translatability, Legal, Terminology & Semantics...
- Three main in-scope scenarios:
  - Deep Web LSP
  - Surface Web Real Time MT
  - Deep Web MT Training
- All other scenarios are out of scope
- Reference implementations, XLIFF roundtrip prototypes, and test suites for all three
Deep Web LSP
- Deep Web is mostly XML and is managed by a CMS, ideally a CCMS.
- Cocomore is involved in Drupal and SharePoint based CMS and CCMS solutions
- Passing process, terminology, and translatability metadata from the CCMS on to downstream localisation chain actors
Surface Web Real Time MT
- Ensure that relevant Deep Web metadata will resurface in the rendered HTML, so that real time MT services can make use of it to improve their output
- Again, translatability or terminology metadata will be passed on to MT to improve results
Deep Web MT Training
- Improve MT training through passing domain and processing related metadata
- This will allow for rapid creation of relevant training corpora, excluding upfront out-of-domain content, raw MT output etc.
Metadata
- "Data categories" based on "W3C Internationalization Tag Set 1.0" relevant for the three scenarios: Translate, Localization Note, Terminology, Language Information
- Further data categories: Translation provenance, human post-editing, QA provenance, legal metadata, topic / domain information
- Everything is currently under consideration. Your input counts!
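To make the Translate data category concrete, here is a minimal sketch of how a consumer might honor local ITS 1.0 markup. It assumes only the local `its:translate` attribute in the ITS 1.0 namespace, with elements translatable by default and the value inherited by child elements. Global `its:rules` are deliberately ignored, and the sample document, the function name `translatable_text`, and the part number are illustrative assumptions, not part of the presentation.

```python
import xml.etree.ElementTree as ET

# ITS 1.0 namespace; the attribute name is the local "translate" marker.
ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = f"{{{ITS_NS}}}translate"


def translatable_text(elem, inherited=True):
    """Yield text chunks that are marked as translatable.

    Hypothetical helper: handles local its:translate only.
    Element content is translatable by default, and an explicit
    "yes"/"no" is inherited by descendants until overridden.
    """
    flag = elem.get(TRANSLATE)
    translate = inherited if flag is None else (flag == "yes")
    if translate and elem.text and elem.text.strip():
        yield elem.text.strip()
    for child in elem:
        yield from translatable_text(child, translate)
        # Tail text sits after the child but belongs to this element.
        if translate and child.tail and child.tail.strip():
            yield child.tail.strip()


# Illustrative sample content (an assumption, not from the slides).
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <p>Press the <code its:translate="no">Save</code> button.</p>
  <p its:translate="no">PART-NO-4711</p>
  <p>Restart the application.</p>
</doc>"""

segments = list(translatable_text(ET.fromstring(SAMPLE)))
print(segments)
```

In this sketch the protected `Save` label and the part number are withheld, which is the kind of translatability metadata an LSP or MT service could use downstream.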
Approach and Methodology
- Open Standard within W3C Internationalization Activity: Transparent and Royalty Free
- Normative Processing Requirements
  - Based on in-scope process models
- Methodology for how to expand
  - Create conformant extensions
  - Enable future development
- Robust roundtrip implementations and test suites, with a bias for open source
- Close collaboration with OASIS XLIFF TC
Open Question(s)
- Breadth or Depth?
- Scope? Too broad? Too narrow? Additions?
- Generalized Process Models as base for Normative Processing Requirements? Vs. define only data categories and give non-normative advice on processing?
- More user scenarios?
- Missed a critical category?
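Comparison of Possible LT-Standardization homes

ETSI
- IPR mode: FRAND
- Mailing lists: restricted
- Fees: http://www.etsi.org/WebSite/membership/fees.aspx
- Midsize LSP cost of membership (illustrative): €6,000
- Proportional vote: yes *
- Can non-members vote? NO
- Individual or other low cost option? free
- With TC voting rights? no

OASIS
- IPR mode: RF on RAND
- Mailing lists: public
- Fees: http://www.oasis-open.org/join/categories-dues
- Midsize LSP cost of membership (illustrative): $7500 (~€5300)
- Proportional vote / Can non-members vote? NO
- Individual or other low cost option? $300 / $1200
- With TC voting rights? yes

W3C
- IPR mode: RF
- Mailing lists: public
- Fees: http://www.w3.org/Consortium/fees
- Midsize LSP cost of membership (illustrative): €7,800
- Proportional vote / Can non-members vote? NO
- Individual or other low cost option? free
- With TC voting rights? yes

Unicode
- IPR mode: RAND
- Mailing lists: public
- Fees: http://www.unicode.org/consortium/levels.html
- Midsize LSP cost of membership (illustrative): $7500 (~€5300)
- Proportional vote: yes
- Can non-members vote? NO
- Individual or other low cost option? $75
- With TC voting rights? no

* Does not apply for ISG LIS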