Coping with Babel How to Localize XML. Designing for Localization Document design can seriously impact the costs of translation and localization. Remember.

Slides:



Advertisements
Similar presentations
Autodesk Integrations Overview SmartDesk A seamlessly integrated, affordable, out-of-the-box, Windows based drawing and document management tool for.
Advertisements

Using OLIF, The Open Lexicon Interchange Format Susan McCormick OLIF2 Consortium October 1, 2004.
Can I Use It, and If so, How? Christian Lieske SAP AG – MultiLingual Technology Discussion of Consortium Proposal for OLIF2 File Header.
Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Catalyst Preview Enda McDonnell Alchemy User Conference London 2012 London Science Museum 31 May 2012.
Configuration management
Configuration management
Machine Translation The Translator s Choice Heidi Düchting Sylke Krämer Johann Roturier.
Multi-Mode Survey Management An Approach to Addressing its Challenges
Translation Made Easy STAR Group Top 10 Lessons for Translators January 20 th 2006.
HTML5 and CSS3 Illustrated Unit B: Getting Started with HTML
Dr Gordon Russell, Napier University Unit Data Dictionary 1 Data Dictionary Unit 5.3.
Roadmap to Continuous Integration Testing and Benefits Gowri Selka, Walgreens Natalie Koltun, Walgreens May 20th, 2014 ©2013 Walgreen Co. All rights reserved.
L10N Standards Warszawa 2014
THE TRANSLATION NETWORK Overview  Easily manage your multilingual sites  Synchronize content and manage changes  Translate content on the fly  Use.
Copyright 2004 Monash University IMS5401 Web-based Systems Development Topic 2: Elements of the Web (g) Interactivity.
Information Retrieval in Practice
Requirements Specification
“DOK 322 DBMS” Y.T. Database Design Hacettepe University Department of Information Management DOK 322: Database Management Systems.
8/28/97Information Organization and Retrieval Files and Databases University of California, Berkeley School of Information Management and Systems SIMS.
Configuration Management
Overview of Search Engines
 ACORD ACORD’s Experiences using W3C Schemas Dan Vint Senior Architect
Firat Batmaz, Chris Hinde Computer Science Loughborough University A Diagram Drawing Tool For Semi–Automatic Assessment Of Conceptual Database Diagrams.
1 Data Strategy Overview Keith Wilson Session 15.
TIBCO Designer TIBCO BusinessWorks is a scalable, extensible, and easy to use integration platform that allows you to develop, deploy, and run integration.
CASE Tools And Their Effect On Software Quality Peter Geddis – pxg07u.
Metadata and identifiers for e- journals Copenhagen Juha Hakala Helsinki University Library
15 November 2005Linking Outside the Box1 Cross referencing between XML documents Bob Stayton Sagehill Enterprises
Lecture 15 XML Validation. a simple element containing text attribute; attributes provide additional information about an element and consist of a name.
 Copyright 2005 Digital Enterprise Research Institute. All rights reserved. Towards Translating between XML and WSML based on mappings between.
Report Prepared for Envision Presented by: Kristen Vargas Rossana Figuera Yinka Osidein.
The MultilingualWeb-LT Working Group receives funding by the European Commission (project name LT-Web) through the Seventh Framework Programme (FP7) in.
Interfacing Registry Systems December 2000.
Internationalization: Implementing the XLIFF Standard Jon Allen, Producer instructional media + magic, inc. JA-SIG Summer Conference 2003 June 10, 2003.
TRANSLATION MEMORY TECHNOLOGY
1 Metadata –Information about information – Different objects, different forms – e.g. Library catalogue record Property:Value: Author Ian Beardwell Publisher.
ICDL 2004 Improving Federated Service for Non-cooperating Digital Libraries R. Shi, K. Maly, M. Zubair Department of Computer Science Old Dominion University.
Translation Memory System (TMS)1 Translation Memory Systems Presentation by1 Melina Takanen & Julianna Ekert CAT Prof. Thorsten Trippel University.
PASSOLO ® Makes Your Software Ready for the Global Market Localisation Standards The Tools Developer’s Perspective.
Xml:tm XML Based Text Memory Using XML technology to reduce the cost of translating XML documents 27 June 2005.
© Copyright 2013 STI INNSBRUCK “How to put an annotation in HTML?” Ioannis Stavrakantonakis.
Xml:tm XML Text Memory Using XML technology to reduce the cost of translating XML documents.
Introduction to the Semantic Web and Linked Data
26 th Annual MIS Conference – February 13, Why CEDS?
7 Strategies for Extracting, Transforming, and Loading.
Development of an Intelligent Translation Memory MorphoLogic SZAK Publishers Balázs Kis
Standards that might come up in discussion today EN 15038: quality standard developed especially for translation services providers, including regular.
XML Validation. a simple element containing text attribute; attributes provide additional information about an element and consist of a name value pair;
Module Road Map Assignment Road Map Notice we have linked the conduit directly to the presentation layer. This is normally a bad idea!
Semantic Data Extraction for B2B Integration Syntactic-to-Semantic Middleware Bruno Silva 1, Jorge Cardoso 2 1 2
Chapter – 8 Software Tools.
HTML5 and CSS3 Illustrated Unit B: Getting Started with HTML.
Introduction to Visual Basic 2008 Programming
MongoDB Er. Shiva K. Shrestha ME Computer, NCIT
Modern Systems Analysis and Design Third Edition
Introducing the technology
Open Source CAT Tool.
Database Processing with XML
CSCI/CMPE 3334 Systems Programming
Translation Workspace File Filters
Part of the Multilingual Web-LT Program
DITA Translation Management Challenges in Japan
Localization Summit 1.
Database Design Hacettepe University
Chapter 3 Database Management
Use Cases Simple Machine Translation (using Rainbow)
Modern Systems Analysis and Design Third Edition
User’s Perspective Laurie Gerber.
HTML5 and CSS3 Illustrated Unit B: Getting Started with HTML
Presentation transcript:

Coping with Babel How to Localize XML

Designing for Localization Document design can seriously impact the costs of translation and localization. Remember that you are designing for all languages, not just English. There are clear do’s and don’ts. Overriding principle is good XML practice. Always consider the target language implications.

Entity references Do not use entity references for word substitution: Use a &tool; to release the catch. Cause problems for inflected languages Cause problems for parsing/translation tools Use boiler plate text instead

Translatable attributes Avoid using translatable attributes: Use a to release the CPU retention catch. Cause problems for inflected languages Cause extra burden for translators More to go wrong

CDATA sections Avoid using CDATA sections that may contain translatable text: Please refer to the index page page for further information ]]> Lose syntactical control How are translation tools to cope?

Processing instructions Avoid Processing Instructions in translatable text: Use a to release the CPU retention catch. Syntactically week Confuse translation memory operations

Infinite Naming Schemes Avoid the use of infinite naming schemes: Cannot open file $1. Hint: does file $1 exist. Incorrect value. Hint: Must be between $1 and 2. Connection timeout. No clear element definitions

Typographical elements Avoid the use of "typographical" elements: Do not use type elements. Bad XML practice. Causes problems for translators. Target language text may be in the opposite order.

Do not break sentences Never break a linguistically complete text unit over more than one non-inline element: This text should not be broken this way – the translated text may well be in a different order.

XML Translation Standards LISA - Localization Industry Standards Association: OASIS - Organization for the Advancement of Structured Information Standards: W3C - World Wide Web Consortium: OLIF Consortium:

LISA Standards TMX - Translation Memory Exchange format: TBX - Termbase Exchange format: SRX - Segmentation Rules Exchange format: GMX - GILT Metrics Exchange format:

OASIS L10n Standards XLIFF - XML Localization Interchange File Format: open.org/committees/tc_home.php?wg_abb rev=xliff TransWS - Translation Web Services: open.org/committees/tc_home.php?wg_abb rev=trans-ws

W3C and OLIF W3C to start on Localization Directives standard. OLIF - Open Lexicon Interchange Format:

xml:tm XML Text Memory A radical new approach to translating XML documents

Machine Translation Translation Memory Hybrid Linguistic Inferencing Engines Terminology Computational Linguistic Methodologies

Translation memory Advent in early 1980’s Intermediate format Alignment Storage Leveraged memory Fuzzy matching – statistical Advantages: cost reduction, consistency Drawbacks: proofreading, managing memories No significant advances in technology

XML namespace Major new feature of XML compared to SGML Allows the mapping of different ontological entities onto the same representation Allows different ways to look at the same data Namespaces can be made transparent

xml:tm namespace Text Memory namespace Can be mapped onto any XML document Vertical view of document in terms of ‘text segments’ Can be totally transparent xml:tm

xml:tm namespace xml:tm Example of the use of namespace in an XML document: Namespace is very flexible. It is very easy to use.

xml:tm namespace doc title section para text tm te sentence tu te sentence tu te sentence tu tm namespace view original document view te text tu text te sentence tu para text para text para text para text para text te sentence tu te sentence tu

xml:tm namespace text te sentence tu original document view tm namespace view

xml:tm namespace Namespace is very simple. It is easy to use. te sentence tu original document view tm namespace view Namespace is very simple. It is easy to use. text

xml:tm Text Memory Author memory Maintain memory of source text Authoring statistics Authoring tool input Translation memory Automatic alignment Maintain perfect link of source and target text Reduce translation costs xml:tm

Updated Source Document tu id=”1” tu id=”3” tu id=”4” tu id=”7” tu id=”6” d eleted tu id=”8” new Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” xml:tm DOM differencing origid=”5” modified

xml:tm Author Memory Namespace aware differencing Identify changes from the previous version Unique text unit identifiers are maintained Modification history Text units can be loaded into a database Authoring environment integration xml:tm

xml:tm Translation Memory The tm namespace can be used to create XLIFF files Automatic alignment of source and target languages Allows for more focused translation matching –Perfect matching –Leveraged matching from document - identical text –Leveraged matching from database –Modified text unit matching –Linguistically enhanced fuzzy matching –Non translatable text unit identification xml:tm

xml:tm translation Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” Translated Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” XLIFF Document trans-unit id=”1” trans-unit id=”2” trans-unit id=”3” trans-unit id=”4” trans-unit id=”5” trans-unit id=”6”

doc title section para tekst tm te zdanie tu te zdanie tu te zdanie tu translated tm namespace view translated document view te tekst tu tekst te zdanie tu para tekst para tekst para tekst para tekst para tekst te zdanie tu te zdanie tu xml:tm translated document

Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” Translated Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” Perfect alignment xml:tm perfect alignment

Updated Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”7” tu id=”6” d eleted tu id=”8” modified new Matched Target Document tu id=”1” tu id=”3” tu id=”4” tu id=”7” tu id=”6” tu id=”8” Perfect Matching requires translation xml:tm perfect matching

xml:tm contextual memory Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” Translated Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” Perfect alignment

Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” Translated Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”5” tu id=”6” Perfect alignment DB xml:tm leveraged DB memory

xml:tm in-document leveraged matching Updated Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”7” tu id=”6” d eleted tu id=”8” modified new:same id=”3” Matched Target Document tu id=”1” tu id=”3” tu id=”4” tu id=”7” tu id=”6” tu id=”8” Perfect Matching requires translation requires proofing leveraged match

xml:tm in-document fuzzy matching Updated Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”7” tu id=”6” d eleted tu id=”8” mod:origid=”5” New:same Matched Target Document tu id=”1” tu id=”3” tu id=”4” tu id=”7” tu id=”6” tu id=”8” Perfect Matching requires translation requires proofing fuzzy match leveraged match

Updated Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”7” tu id=”6” d eleted tu id=”8” mod:origid=”5” new:same Matched Target Document tu id=”1” tu id=”3” tu id=”4” tu id=”7” tu id=”6” tu id=”8” Perfect Matching requires translation requires proofing fuzzy match doc leveraged match tu id=”9” xml:tm db leveraged matching DB requires proofing DB leveraged match

Updated Source Document tu id=”1” tu id=”2” tu id=”3” tu id=”4” tu id=”7” tu id=”6” non trans tu id=”8” new:same Matched Target Document tu id=”1” tu id=”3” tu id=”4” tu id=”7” tu id=”6” tu id=”8” Perfect Matching requires translation requires proofing fuzzy match doc leveraged match tu id=”9” DB requires proofing DB leveraged match tu id=”2” requires no translation non translatable xml:tm non translatable text

Traditional Translation Scenario xml:tm source text PublishingTranslation source text extract Extracted text tm process Prepared text Translate Translated text target text merge target text QA

xml:tm xml source text Publishing Translator extract Extracted text tm process Prepared text Translate xml target text merge Web perfect matching leveraged matching Automatic Process web interface QA Automatic Process xml:tm Translation Scenario

xml:tm matching Perfect Matching driven by Author Memory Leveraged Matching: 100% same text In document Leveraged Matching Database Leveraged Matching Fuzzy Matching Modified Matching Linguistically aware Fuzzy Matching Non translatable element identification Alphanumeric Numeric Measurements xml:tm

xml:tm benefits Enterprise level scalability Totally integrated within the XML framework Source text is automatically extracted and matched Word counts are controlled by the customer Text can be presented for translation via the web Online composition The most up to date translation is held by the customer Data is merged automatically at end of translation cycle All memory operations are totally automated Can be used transparently for relay translations Much cheaper to implement and run More accurate – better matching xml:tm

xml:tm summary Can be used to build consistent authoring systems Can be used to produce automatic authoring statistics Translation Memory generation and alignment is totally automatic Memory is held within the documents themselves Extraction and merging for translation are automatic The system provides much more efficient matching mechanisms Structure of the XML document is protected during translation xml:tm

Fully specified XML based standard xml-tm.html Maintained by xml-intl.com Detailed article on Offered for consideration as a Lisa standard xml:tm

Any questions? xml:tm