xml:tm: XML Based Text Memory. Using XML technology to reduce the cost of translating XML documents. 27 June 2005.

Presentation transcript:

xml:tm XML Based Text Memory Using XML technology to reduce the cost of translating XML documents 27 June 2005

Automating Translation
- Machine Translation
- Translation Memory
- Hybrid Linguistic Inference Engines
- Terminology

Machine translation
- 40-year history
- Rigorous control of grammar and terminology can produce good results
- Lots of interesting new developments with hybrid statistical/transfer-based systems
- Translation of free-format text is theoretically impossible with current technology

Translation Memory
- Align source and target text
- Look up new text against memory
- Relatively primitive technology
- Not much innovation over the past 30 years
- Need for proofing
- Proprietary translation memory formats

Translating XML Documents
- XML is inherently easier to translate
- Separation of form and content
- Support for Unicode and other international encoding formats
- Allows multiple output formats: PDF, XHTML, WAP

XML Translation Standards
- LISA - Localization Industry Standards Association
- OASIS - Organization for the Advancement of Structured Information Standards
- W3C - World Wide Web Consortium
- OLIF Consortium

LISA Standards
- TMX - Translation Memory Exchange format
- TBX - Termbase Exchange format
- SRX - Segmentation Rules Exchange format
- GMX - GILT Metrics Exchange format

OASIS L10N Standards
- XLIFF - XML Localization Interchange File Format: open.org/committees/tc_home.php?wg_abbrev=xliff
- TransWS - Translation Web Services: open.org/committees/tc_home.php?wg_abbrev=trans-ws
- DITA - Darwin Information Typing Architecture: open.org/committees/tc_home.php?wg_abbrev=dita

W3C and OLIF
- W3C ITS
- OLIF - Open Lexicon Interchange Format

XML namespace
- Major feature of XML
- Allows the mapping of different ontological entities onto the same representation
- Allows different ways to look at the same data
- Namespaces can be made transparent

xml:tm: XML based text memory
- Revolutionary approach to translating XML documents
- First significant advance in translation memory technology
- Uses XML namespace to transparently embed contextual information

xml:tm namespace
- Text Memory namespace
- Can be mapped onto any XML document
- Vertical view of document in terms of 'text segments'
- Can be totally transparent

xml:tm namespace
Example of the use of the tm namespace in an XML document.
[Example markup not preserved; sample text: "Namespace is very flexible. It is very easy to use."]

[Figure: xml:tm namespace. The original document view (doc, title, section, para, text) shown alongside the tm namespace view (tm, te, tu, sentence).]

[Figure: xml:tm namespace. The sample text "Namespace is very simple. It is easy to use." shown in the original document view (text) and in the tm namespace view (te, tu, sentence).]
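The two views of one document can be illustrated with a minimal Python sketch. The namespace URI, the `tm:te`/`tm:tu` element names, and the `id` values used here are assumptions made for illustration; the xml:tm specification is the authority on the actual markup.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI and element names; check the xml:tm specification.
TM_NS = "urn:xmlintl-tm-tags"
TM = "{" + TM_NS + "}"

# Hypothetical document: the tm markup wraps each sentence of a paragraph.
DOC = (
    '<doc xmlns:tm="' + TM_NS + '">'
    '<para><tm:te id="e1">'
    '<tm:tu id="1">Namespace is very simple.</tm:tu> '
    '<tm:tu id="2">It is easy to use.</tm:tu>'
    '</tm:te></para>'
    '</doc>'
)


def tm_view(root):
    """tm namespace view: one (id, text) pair per text unit."""
    return [(tu.get("id"), "".join(tu.itertext())) for tu in root.iter(TM + "tu")]


def original_view(root):
    """Original document view: paragraph text with the tm markup ignored."""
    return ["".join(para.itertext()) for para in root.iter("para")]


root = ET.fromstring(DOC)
print(tm_view(root))
print(original_view(root))
```

A tool that does not know the tm namespace sees only the paragraph text, which is the sense in which the namespace can be transparent.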

xml:tm Text Memory
Author memory
- Maintain memory of source text
- Authoring statistics
- Authoring tool input
Translation memory
- Automatic alignment
- Maintain perfect link of source and target text
- Reduce translation costs

[Figure: xml:tm DOM differencing. Source Document (tu id="1" to "6") compared with Updated Source Document (tu id="1", "3", "4", "7", "6", "8"). Labels: deleted; modified (origid="5"); new.]

xml:tm Author Memory
- Namespace-aware DOM differencing
- Identify changes from the previous version
- Unique text unit identifiers are maintained
- Modification history
- Text units can be loaded into a database
- Authoring environment integration
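The idea of differencing with stable identifiers can be sketched in Python. This is not the xml:tm algorithm: it works on flat lists of text units rather than a DOM, and the similarity threshold, the status names, and the use of `difflib` are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def diff_text_units(old_units, new_texts, threshold=0.7):
    """Compare a new version against old (id, text) units.

    Unchanged text keeps its id. Text that resembles a dropped unit gets a
    fresh id plus an origid pointing at the old unit. Anything else is new.
    Returns (units, deleted_ids). The threshold is an arbitrary assumption.
    """
    next_id = max((uid for uid, _ in old_units), default=0) + 1
    remaining = dict(old_units)  # old units not yet matched
    by_text = {}
    for uid, text in old_units:
        by_text.setdefault(text, []).append(uid)

    result = [None] * len(new_texts)
    # Pass 1: identical text keeps its identifier.
    for i, text in enumerate(new_texts):
        ids = by_text.get(text)
        if ids:
            uid = ids.pop(0)
            del remaining[uid]
            result[i] = {"id": uid, "text": text, "status": "unchanged"}

    # Pass 2: everything else is either a modification or new text.
    for i, text in enumerate(new_texts):
        if result[i] is not None:
            continue
        best_id, best_ratio = None, 0.0
        for uid, old_text in remaining.items():
            ratio = SequenceMatcher(None, old_text, text).ratio()
            if ratio > best_ratio:
                best_id, best_ratio = uid, ratio
        unit = {"id": next_id, "text": text, "status": "new"}
        if best_id is not None and best_ratio >= threshold:
            unit["status"] = "modified"
            unit["origid"] = best_id
            del remaining[best_id]
        next_id += 1
        result[i] = unit

    return result, sorted(remaining)


old = [
    (1, "Open the file."),
    (2, "Check the license terms."),
    (3, "Save your work."),
    (4, "Close the editor."),
    (5, "Restart the server now."),
    (6, "Log out."),
]
new = [
    "Open the file.",
    "Save your work.",
    "Close the editor.",
    "Restart the server immediately.",
    "Log out.",
    "Print a copy.",
]
units, deleted = diff_text_units(old, new)
for unit in units:
    print(unit)
print("deleted:", deleted)
```

The sample data is invented, but it is arranged to mirror the slide: unit 2 is deleted, unit 5 is modified and reappears as unit 7 with `origid` 5, and unit 8 is new.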

xml:tm Translation Memory
- The tm namespace can be used to create XLIFF files
- Automatic alignment of source and target languages
- Allows for more focused translation matching
  - Exact matching
  - Leveraged matching from document (identical text)
  - Leveraged matching from database
  - Modified text unit matching
  - Non-translatable text unit identification
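The match types listed above could be told apart roughly as follows. This Python sketch is only one possible reading of the list: the order of the checks, the data structures, the label names, and the "no letters means non-translatable" heuristic are assumptions, not rules taken from the xml:tm specification.

```python
def classify(unit, doc_memory, db_memory):
    """Return (match_type, suggested_target) for one updated text unit.

    unit:       dict with 'id', 'text' and optionally 'origid'
    doc_memory: {id: (source_text, target_text)} from the previous version
    db_memory:  {source_text: target_text} from a translation database
    """
    text = unit["text"]
    # Assumed heuristic: text without any letters needs no translation.
    if not any(ch.isalpha() for ch in text):
        return "non-translatable", text
    previous = doc_memory.get(unit["id"])
    if previous and previous[0] == text:
        return "exact", previous[1]  # same id, same text
    for source, target in doc_memory.values():
        if source == text:
            return "doc-leveraged", target  # identical text elsewhere in the document
    if text in db_memory:
        return "db-leveraged", db_memory[text]
    if unit.get("origid") in doc_memory:
        return "modified", doc_memory[unit["origid"]][1]  # starting point for the translator
    return "requires-translation", None


# Hypothetical memories with placeholder Polish targets.
doc_memory = {
    1: ("Open the file.", "Otwórz plik."),
    3: ("Save your work.", "Zapisz pracę."),
    5: ("Restart the server now.", "Uruchom teraz ponownie serwer."),
}
db_memory = {"Log out.": "Wyloguj się."}

updated = [
    {"id": 1, "text": "Open the file."},
    {"id": 7, "text": "Restart the server immediately.", "origid": 5},
    {"id": 8, "text": "Save your work."},
    {"id": 9, "text": "Log out."},
    {"id": 10, "text": "1234-5678"},
    {"id": 11, "text": "Print a copy."},
]
for unit in updated:
    print(unit["id"], classify(unit, doc_memory, db_memory))
```

In this reading only an exact match would be reused without review; the leveraged and modified cases carry a suggestion that still needs proofing.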

DITA Strengths
- Topic-centric level of granularity
- Very well thought out and flexible architecture for content creation and publishing
- Substantial reuse of existing assets
- Specialization at the topic and domain levels
- Automated processing based on metadata property
- Translate topic only once, reuse many times

DITA and xml:tm
- The two complement each other
- xml:tm encourages text reuse at the sentence level
- Automates translation matching and extraction
- Automatic alignment of source and target documents at the text unit (sentence) level
- Introduces the concept of exact matching for translation as well as focused matching
- Fully integrated with existing standards such as SRX, GMX, TMX and XLIFF

[Figure: xml:tm translation via XLIFF. Source Document (tu id="1" to "6"), XLIFF Document (trans-unit id="1" to "6"), and Translated Document (tu id="1" to "6"), with matching identifiers throughout.]
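Carrying text unit identifiers over into XLIFF `trans-unit` elements can be sketched in Python. The XLIFF 1.2 namespace and skeleton are used here as an assumption (the presentation does not state a version), and the file name and language codes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: XLIFF 1.2; the presentation does not say which version it targets.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def to_xliff(units, original="doc.xml", source_lang="en", target_lang="pl"):
    """Build a minimal XLIFF document with one trans-unit per (id, text) unit."""
    ET.register_namespace("", XLIFF_NS)  # serialize XLIFF as the default namespace

    def q(name):
        return "{%s}%s" % (XLIFF_NS, name)

    xliff = ET.Element(q("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(xliff, q("file"), {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, q("body"))
    for uid, text in units:
        # The trans-unit id repeats the tu id so results can be merged back.
        trans_unit = ET.SubElement(body, q("trans-unit"), {"id": str(uid)})
        ET.SubElement(trans_unit, q("source")).text = text
    return ET.tostring(xliff, encoding="unicode")


xliff_text = to_xliff([(1, "Namespace is very simple."), (2, "It is easy to use.")])
print(xliff_text)
```

Because each `trans-unit` id equals the originating `tu` id, the translated text can be merged back into the document without a separate alignment step.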

[Figure: xml:tm translated document. The translated document view (doc, title, section, para, tekst) shown alongside the translated tm namespace view (tm, te, tu, zdanie). The Polish labels "tekst" (text) and "zdanie" (sentence) stand for the translated content.]

[Figure: xml:tm perfect alignment. Source Document and Translated Document each contain tu id="1" to "6", in exact alignment.]

[Figure: xml:tm perfect matching. Updated Source Document (tu id="1", "2", "3", "4", "7", "6", "8"; labels: deleted, modified, new) against Matched Target Document (tu id="1", "3", "4", "7", "6", "8"). Legend: Perfect Matching; requires translation.]

[Figure: xml:tm leveraged DB memory. Source Document and Translated Document (tu id="1" to "6" each) in perfect alignment, shown with a DB and TMX.]
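Because source and target share text unit identifiers, building a memory for the database amounts to pairing units by id. The Python sketch below writes such pairs as TMX 1.4; the header values, the language codes, and the sample Polish text are placeholders and assumptions, not details from the presentation.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


def aligned_pairs(source_units, target_units):
    """Pair source and target text units that share an id."""
    targets = dict(target_units)
    return [(text, targets[uid]) for uid, text in source_units if uid in targets]


def to_tmx(pairs, src_lang="en", tgt_lang="pl"):
    """Write (source, target) pairs as a minimal TMX 1.4 document."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        # Placeholder header values, chosen for illustration only.
        "creationtool": "example-sketch",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "xml:tm",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


source = [(1, "Namespace is very simple."), (2, "It is easy to use.")]
target = [(1, "Przestrzeń nazw jest bardzo prosta."), (2, "Jest łatwa w użyciu.")]
tmx_text = to_tmx(aligned_pairs(source, target))
print(tmx_text)
```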

[Figure: xml:tm in-document leveraged matching. Updated Source Document (tu id="1", "2", "3", "4", "7", "6", "8"; labels: deleted, modified, new:same id="3") against Matched Target Document (tu id="1", "3", "4", "7", "6", "8"). Legend: Perfect Matching; requires translation; requires proofing; leveraged match.]

[Figure: xml:tm in-document fuzzy matching. Updated Source Document (tu id="1", "2", "3", "4", "7", "6", "8"; labels: deleted, mod:origid="5", new:same) against Matched Target Document (tu id="1", "3", "4", "7", "6", "8"). Legend: Perfect Matching; requires translation; requires proofing; fuzzy match; leveraged match.]

[Figure: xml:tm db leveraged matching. Updated Source Document (tu id="1", "2", "3", "4", "7", "6", "8"; labels: deleted, mod:origid="5", new:same) against Matched Target Document (tu id="1", "3", "4", "7", "6", "8"), with an additional tu id="9" matched from the DB. Legend: Perfect Matching; requires translation; requires proofing; fuzzy match; doc leveraged match; DB leveraged match.]

[Figure: xml:tm non-translatable text. Updated Source Document (tu id="1", "2", "3", "4", "7", "6", "8"; labels: non trans, new:same) against Matched Target Document (tu id="1", "3", "4", "7", "6", "8"), with tu id="9" from the DB and tu id="2" marked non translatable. Legend: Exact Matching; requires translation; requires proofing; fuzzy match; doc leveraged match; DB leveraged match; requires no translation.]

[Figure: Traditional Translation Scenario. Flow between Publishing and Translation: source text, extract, extracted text, tm process, prepared text, translate, translated text, merge, target text, QA.]

[Figure: xml:tm Translation Scenario. Flow between Publishing and Translator: xml:tm source text, extract, extracted text, tm process (perfect matching, leveraged matching), XLIFF file, translate through a web service/interface, merge, xml:tm target text, QA. Two stages are labelled "Automatic Process".]

xml:tm benefits
- Open standard donated by XML INTL to LISA
- Complements DITA
- Enterprise-level scalability
- Totally integrated within the XML framework
- Source text is automatically extracted and matched
- Word counts are controlled by the customer
- Text can be presented for translation via the web
- Data is merged automatically at the end of the translation cycle
- All memory operations are totally automated
- Can be used transparently for relay translations
- More accurate: better matching

xml:tm
- Full specification: intl.com/docs/specification/xml-tm.html
- Maintained by xml-intl.com
- Detailed article on xml:tm in [...]
- Donated by XML INTL to LISA OSCAR

Any questions?

XML INTL Contact Details
Postal address: PO Box 2167, Gerrards Cross, Bucks SL9 8XF, United Kingdom
Contacts: Bob Willans, Andrzej Zydroń, Bartek Bogacki