TMX and SRX: Exchanging TM Data. LRC-X Conference, 13-14 September 2005. Angelika Zerfass, Consultant and Trainer for Translation Tools.

Agenda –What is TMX (TMX levels, TMX examples) –After the data exchange… –What is SRX

What is TMX
It is an XML representation of translation memory data, made up of a header and a body.
–Header example:
<header creationtool="Déjà Vu" creationtoolversion="4" datatype="PlainText" segtype="sentence" adminlang="en-us" srclang="en-us" o-tmf="DVMDB">
–creationtool: the exporting tool (Déjà Vu, Transit, Trados, SDLX)
–creationtoolversion: version / build number of the tool
–datatype: type of the original data (HTML, SGML, RTF, Interleaf, Java…)
–segtype: basic segmentation (here: sentence)
–adminlang: default language for administrative elements like <note> and <prop>
–srclang: source text language
–o-tmf: original translation memory format (DVMDB = Déjà Vu database…)
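A receiving tool typically starts by reading these header attributes. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the surrounding `<tmx>`/`<body>` skeleton and the helper name `read_header` are illustrative assumptions, and only the attribute values come from the slide.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical TMX file: only the <header> attributes come from the slide.
TMX = """<tmx version="1.4">
  <header creationtool="Déjà Vu" creationtoolversion="4" datatype="PlainText"
          segtype="sentence" adminlang="en-us" srclang="en-us" o-tmf="DVMDB"/>
  <body/>
</tmx>"""


def read_header(tmx_text: str) -> dict:
    """Return the attributes of the TMX <header> element as a plain dict."""
    root = ET.fromstring(tmx_text)
    header = root.find("header")
    if header is None:
        raise ValueError("no <header> element found")
    return dict(header.attrib)


info = read_header(TMX)
print(f"Exported by {info['creationtool']} {info['creationtoolversion']}, "
      f"source language {info['srclang']}, original format {info['o-tmf']}")
```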

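What is TMX –Body: each translation unit (tu) holds one translation unit variant (tuv) per language, identified by its lang attribute, and each tuv holds the text in a seg (segment) element. –Example translation unit: This is the first sentence. / Dies ist der erste Satz.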
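The tu / tuv / seg nesting can be sketched with ElementTree as below. This assumes the TMX 1.4 convention of writing the language as `xml:lang` (older files use a plain `lang` attribute, as on the slide) and assumes the codes `en-US` / `de-DE`; `make_tu` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# xml:lang as ElementTree spells it (namespace URI in braces); serialized as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def make_tu(variants: dict) -> ET.Element:
    """Build one <tu> with a <tuv><seg> pair per language (hypothetical helper)."""
    tu = ET.Element("tu")
    for lang, text in variants.items():
        tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
        seg = ET.SubElement(tuv, "seg")
        seg.text = text
    return tu


body = ET.Element("body")
body.append(make_tu({"en-US": "This is the first sentence.",
                     "de-DE": "Dies ist der erste Satz."}))
print(ET.tostring(body, encoding="unicode"))
```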

What is TMX Depending on the tool that created it, a TMX file can be bilingual or multilingual. Importing a multilingual TMX file into a bilingual project imports only the relevant languages.
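Picking the relevant language pair out of a multilingual file can be sketched as follows. The sample file, its language codes and the helper `language_pairs` are assumptions for illustration; the header is trimmed, and real tools also match language codes more loosely than this exact (case-insensitive) comparison.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical trilingual TMX file (header trimmed).
TMX = """<tmx version="1.4"><header/><body>
<tu>
  <tuv xml:lang="en-US"><seg>Print</seg></tuv>
  <tuv xml:lang="de-DE"><seg>Drucken</seg></tuv>
  <tuv xml:lang="fr-FR"><seg>Imprimer</seg></tuv>
</tu>
</body></tmx>"""


def language_pairs(tmx_text, source, target):
    """Yield (source, target) texts for every <tu> that contains both languages."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files use a plain lang attribute.
            lang = tuv.get(XML_LANG, tuv.get("lang", ""))
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang.lower()] = "".join(seg.itertext())
        if source.lower() in segs and target.lower() in segs:
            yield segs[source.lower()], segs[target.lower()]


print(list(language_pairs(TMX, "en-US", "de-DE")))  # French variant is skipped
```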

Levels of TMX Level 1: plain text only (sufficient for data coming from software localization tools) Level 2: text plus formatting (data coming from translation memory tools used for translating documentation) To move formatting and text from one tool to another, both tools need to be Level 2 compliant!

Level 1 Formatting that is applied to the source and target text of a translation unit is not exported to the TMX file, only pure text. Original –This sentence has some formatting. In TMX –This sentence has some formatting.

Level 2 Formatting that is applied to the source and target text of a translation unit is exported to the TMX file. Different tools use different ways of encoding that information.

TMX from Déjà Vu (Atril) Original –This sentence contains different formatting information. In TMX from Déjà Vu {1} This {2} sentence {3} contains {4} different {5} {6} formatting information {7}. DV puts placeholders (ph) where the formatting goes, not the formatting information itself; the formatting information is stored in a separate file.

TMX from Translator’s Workbench (Trados) Original –This sentence contains different formatting information. In TMX from Translator’s Workbench –Example 1: This {\b sentence } contains {\i different } {\ul formatting information }. –Example 2: This {\b sentence } contains {\i different } {\ul formatting information }. –Example 1 is from version 6.5, example 2 from version 7. Workbench writes the actual RTF formatting codes ({\b …} bold, {\i …} italics, {\ul …} underline) into the TMX file.

TMX from Transit (Star) Original –This sentence contains different formatting information. In TMX from Transit This <b> sentence </b> contains <i> different </i> <u options="single"> formatting information </u>. Transit uses the begin paired tag (bpt) and end paired tag (ept) elements together with the actual formatting information for bold (b), italics (i) and underlined (u).
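The bpt/ept mechanism can be sketched like this: the native code (here Transit-style `<b>`) goes inside the TMX element as escaped text, and the `i` attribute pairs each begin tag with its end tag. The run list format and the helper `build_seg` are hypothetical; this is not Transit's actual export code.

```python
import xml.etree.ElementTree as ET


def build_seg(runs):
    """Build a TMX <seg> from (text, open_code, close_code) runs; codes are None for plain text."""
    seg = ET.Element("seg")
    seg.text = ""
    last = None  # most recent <ept>; following plain text goes into its .tail
    pair_id = 0
    for text, open_code, close_code in runs:
        if open_code is None:
            if last is None:
                seg.text += text
            else:
                last.tail = (last.tail or "") + text
            continue
        pair_id += 1
        bpt = ET.SubElement(seg, "bpt", {"i": str(pair_id)})
        bpt.text = open_code   # native opening code, escaped on output
        bpt.tail = text        # the formatted text itself
        ept = ET.SubElement(seg, "ept", {"i": str(pair_id)})
        ept.text = close_code  # native closing code
        last = ept
    return seg


RUNS = [("This ", None, None), ("sentence", "<b>", "</b>"), (" contains ", None, None),
        ("different", "<i>", "</i>"), (" ", None, None),
        ("formatting information", '<u options="single">', "</u>"), (".", None, None)]
print(ET.tostring(build_seg(RUNS), encoding="unicode"))
```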

TMX from SDLX (SDL) Original –This sentence contains different formatting information. In TMX from SDLX This <1> sentence </1> contains <2> different </2> <3> formatting information </3>. SDLX uses placeholders for formatting information that is stored in a different file

Implications of different tags for formatting Tools that use placeholder tags do not include the actual formatting information in the TMX file –Other tools can only re-use the text –The result of the exchange is the same as with TMX level 1 (text only) TMX files which carry the actual formatting information will yield better matches in other tools that can read this information.
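When a receiving tool can only reuse the text, what it effectively does is reduce a Level 2 segment to its Level 1 plain text. A minimal sketch, assuming the TMX inline elements bpt, ept, it, ph and ut carry native codes (dropped) while other inline elements such as hi carry translatable text (kept); the sample segment mixes Transit-style pairs with a Déjà Vu-style placeholder and is invented.

```python
import xml.etree.ElementTree as ET

# Inline TMX elements whose content is native formatting code, not translatable text.
CODE_TAGS = {"bpt", "ept", "it", "ph", "ut"}


def plain_text(elem):
    """Return the Level 1 (text-only) content of a TMX <seg> or inline element."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in CODE_TAGS:
            parts.append(plain_text(child))  # e.g. <hi>: keep its text
        parts.append(child.tail or "")       # text following the inline element
    return "".join(parts)


seg = ET.fromstring('<seg>This <bpt i="1">&lt;b&gt;</bpt>sentence<ept i="1">&lt;/b&gt;</ept>'
                    ' contains <ph x="1">{1}</ph>formatting.</seg>')
print(plain_text(seg))
```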

TMX specification TMX is a recommendation by OSCAR –OSCAR: LISA special interest group Open Standards for Container/Content Allowing Re-use –The latest specification can be downloaded from –For comments: –List of TMX certified tools The purpose of the TMX format is to provide a standard method to describe translation memory data that is being exchanged among tools and/or translation vendors, while introducing little or no loss of critical data during the process.

Does it work? With the current versions of translation tools on the market, it works quite well –Previous versions sometimes created their own “flavor” of TMX that other tools could not import directly; the export files had to be edited before import (for example, language codes written as en-us vs. EN_US). Yes, it does what it was developed for: it makes the exchange of data between tools possible… BUT this is only half of the story… The question is how well the exchanged data can be used…
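The language-code mismatch mentioned above (en-us vs. EN_US) is the kind of fix that used to be applied by hand before import. The sketch below rewrites every language attribute to one spelling; the target form `en-US` is an assumption about what the receiving tool expects, and both helpers are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
LANG_ATTRS = ("srclang", "adminlang", "lang", XML_LANG)


def normalize_lang(code: str) -> str:
    """Rewrite variants such as 'EN_US' or 'en-us' as 'en-US' (assumed target form)."""
    parts = [p for p in re.split(r"[-_]", code.strip()) if p]
    if not parts:
        return code
    subtags = [parts[0].lower()]
    for part in parts[1:]:
        if len(part) == 2:      # region, e.g. US
            subtags.append(part.upper())
        elif len(part) == 4:    # script, e.g. Latn
            subtags.append(part.title())
        else:
            subtags.append(part.lower())
    return "-".join(subtags)


def normalize_tmx_languages(tmx_text: str) -> str:
    """Return the TMX text with every language attribute normalized."""
    root = ET.fromstring(tmx_text)
    for elem in root.iter():
        for attr in LANG_ATTRS:
            value = elem.get(attr)
            if value and value != "*all*":  # srclang may legitimately be "*all*"
                elem.set(attr, normalize_lang(value))
    return ET.tostring(root, encoding="unicode")
```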

Reusing TMX data Although translation memory tools share the same basic idea (storing source-target language pairs and recycling translations), each has implemented it differently. The main issue here is the segmentation rules.

Segmentation rules Rules that the tool applies to the source text to split it into segments, for example: –paragraph –sentence –phrase –incomplete sentences in bulleted lists –single words (headings, “Note”, “Attention”)

Segmentation rules End of segment rules (common to the default settings of all tools) –Dot at the end of a sentence (not after known abbreviations) –Question mark, exclamation mark –Paragraph mark –Colon End of segment rules (different for different tools) –Semicolon –Tab character –Sub segments (index entries, footnotes, graphics)

Comparison of default rules (Workbench, Transit, DV, SDLX, Across)
–Colon: end / no end
–Semicolon: no end / end / no end
–Tab: end / no end
–Soft return: no end / end in Word, no end in PPT / no end

Example: semicolon Tool A –Semicolon is end of segment: This is a sentence; this is another sentence. –TM system sees two separate segments Tool B –Semicolon is NOT end of segment: This is a sentence; this is another sentence. –TM system sees one segment –No match from the TMX data! The match rate is around 50%, while the usual minimum match setting is around 70%.
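The effect of this one rule difference can be reproduced with a toy segmenter. This is a deliberately simplified sketch, assuming segments end after . ? ! (and optionally ;) when followed by whitespace; real tools add abbreviation lists and many other rules.

```python
import re


def segment(text, break_on_semicolon):
    """Split after . ? ! (and optionally ;) when followed by whitespace."""
    enders = r"[.?!;]" if break_on_semicolon else r"[.?!]"
    parts = re.split(rf"(?<={enders})\s+", text)
    return [p for p in parts if p]


text = "This is a sentence; this is another sentence."
print("Tool A:", segment(text, break_on_semicolon=True))   # two segments
print("Tool B:", segment(text, break_on_semicolon=False))  # one segment
```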

Settings for better reuse… Check the segmentation settings of the source tool, if possible. Re-create these settings in the target tool, as far as possible. Lower the minimum match value from the default 75% to about 50%. For TM data that does not yield useful results, you may have to run an alignment of the original material on the target system.
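Why lowering the minimum match value helps can be illustrated with a simple character-based similarity score. The sketch uses Python's `difflib.SequenceMatcher.ratio()` as a stand-in; none of the tools discussed scores matches this way, so the exact percentages differ from real match rates, but the threshold effect is the same.

```python
from difflib import SequenceMatcher


def best_match(segment, memory, threshold):
    """Return (score, tm_source) for the best TM source segment, or None below threshold."""
    if not memory:
        return None
    score, source = max((SequenceMatcher(None, segment, src).ratio(), src) for src in memory)
    return (score, source) if score >= threshold else None


# TM built by "Tool A", which ends segments at semicolons.
memory = ["This is a sentence;", "this is another sentence."]
# "Tool B" does not break at semicolons, so it looks up the whole sentence.
query = "This is a sentence; this is another sentence."

print(best_match(query, memory, 0.75))  # default minimum match value
print(best_match(query, memory, 0.50))  # lowered minimum match value
```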

Next step: SRX (Segmentation Rules Exchange) When exporting TM data to a TMX file, the segmentation rules are written to a separate file. If the receiving system can re-create the settings of the TMX-exporting system, match rates improve.

SRX SRX is under development at the moment. The SRX file will contain the following information: –Definition of the rules for a specific language –Definition of how those rules were set at the time of the TMX export

End rules and exceptions Rule: –A dot followed by a space is the end of a segment. This is the first sentence. This is the second sentence. Exception: –A dot preceded by a number is not the end of a segment. Dies ist der 1. Satz. (This is the 1st sentence.)

End rules and exceptions Rule: … [\.] \s … (a dot followed by whitespace) Exception for numbers, abbreviations...: … [0-9]+\. \s … (one or more digits and a dot, followed by whitespace)
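The rule/exception pair above can be sketched as a list of regular expressions that look at the text before and after a possible break position, with exceptions listed first so that the first matching rule decides. This is loosely modelled on SRX's before-break/after-break rule pairs, but it does not read SRX files; the `Rule` class and `segment_with_rules` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    before: str     # regex that must match the text just before the break position
    after: str      # regex that must match the text just after it
    is_break: bool  # True = end rule, False = exception


# Exceptions come first: the first rule that matches a position decides.
RULES = [
    Rule(r"[0-9]+\.", r"\s", False),  # "1. Satz": dot after a number is not an end
    Rule(r"\.", r"\s", True),         # dot followed by whitespace ends a segment
]


def segment_with_rules(text, rules=RULES):
    segments, start = [], 0
    for pos in range(1, len(text)):
        for rule in rules:
            if re.search(f"(?:{rule.before})$", text[:pos]) and re.match(rule.after, text[pos:]):
                if rule.is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule wins, break or no break
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


print(segment_with_rules("Dies ist der 1. Satz. Das ist der zweite Satz."))
```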

What can SRX not do? It can only show the segmentation rule settings at the time of export. It cannot show any changes made to the segmentation rules while the TM was in use. Sometimes the rules from system 1 cannot be re-created in system 2; in that case the rule is ignored.

SRX Specification Latest version – –

Next level Up to now, the tools can only exchange information on text and its formatting. SRX will come soon. The next level after that would be the exchange of additional data like project name, customer name… but as the tools differ greatly in what they offer, this will be difficult –Some tools allow free creation of field names, others offer only a fixed set of fields

TMX discussion lists – –For TMX developers, founded July 2003, fewer than 5 members, seems to have very low traffic – –founded November 2000, 190 members –Localization Clients and Vendors looking at standards together so that we can standardize on a Translation Object. –Examining OPENTAG, TMX and other XML standards. – –Translation Memory Exchange Standards Mailing List Mailing list to discuss TMX and other related standards. Said to have very low traffic. –http // –TMX implementation mailing list

Thank you for your attention Any Questions? Angelika Zerfass