Transforming Parallel Corpora to Translation Memory Steve Legrand IPN 29th Sept. 2006.


Parallel text or bitext
Aligned translation of text from one language to another.
Practical uses in NLP:
- Word sense disambiguation
- Automatic translation
- Translation memories

Translation Memory
- Helps the translator by using already translated text segments to suggest translations for new text segments
- The translation memory correspondence (match) level can usually be set (e.g., 56%)
- Automatic translation can be combined with translation memories → post-editing of automatic translation for translation memory use
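
The correspondence level can be pictured as a similarity score between a new segment and the stored source segments. The following is a minimal sketch, assuming character-level similarity from Python's difflib; real TM tools use their own scoring, and the function name best_match and the sample sentences are made up for illustration.

```python
from difflib import SequenceMatcher


def best_match(segment, memory, threshold=0.56):
    """Return (source, target, score) for the closest stored segment.

    Returns None when no stored segment reaches the threshold.
    The 0.56 default mirrors the 56% example level; it is not a recommendation.
    """
    best = None
    for source, target in memory:
        # ratio() is a similarity score between 0.0 and 1.0
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best


# Hypothetical memory of (source, target) pairs
memory = [
    ("The wolf ran across the snow.", "El lobo corrió por la nieve."),
    ("He opened the door.", "Abrió la puerta."),
]

hit = best_match("The wolf ran across the ice.", memory)
if hit is not None:
    print(f"Suggestion: {hit[1]} (score {hit[2]:.2f})")
```

Raising the threshold gives fewer but closer suggestions; lowering it gives more suggestions that need heavier editing.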

Translation memory format (.tmx)
- .tmx (Translation Memory eXchange) is a standardized format for application interoperability.
- tu: translation unit, the parent element of each item to be translated. It can carry a unique identifier (tuid).
- tuv: translation unit variant, the element that carries the language code of the translation (xml:lang).
- seg: segment, the element that contains the text in that language.

TMX Example
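
A minimal sketch of what a small TMX document can look like and how its tu, tuv and seg elements can be read with Python's standard library. The sample sentences, the header values and the function name read_units are illustrative assumptions and do not come from the presentation.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative TMX 1.4 document with one translation unit (made-up content)
TMX_SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu tuid="1">
      <tuv xml:lang="en"><seg>Hello, world.</seg></tuv>
      <tuv xml:lang="es"><seg>Hola, mundo.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_units(tmx_text):
    """Return a list of (tuid, {language: segment text}) tuples."""
    root = ET.fromstring(tmx_text.encode("utf-8"))
    units = []
    for tu in root.iter("tu"):
        variants = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        units.append((tu.get("tuid"), variants))
    return units


units = read_units(TMX_SAMPLE)
for tuid, variants in units:
    print(tuid, variants)
```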

Poor man’s guide to translation memories
Trados is the best known and probably one of the best commercial TM applications available. There are cheaper one-user versions, but in spite of that the price is often prohibitive.
To avoid excessive costs, one could:
– Use demo versions of the commercial software
– Use Open Source products.

OmegaT
- Open Source translation memory
- Needs the Java runtime
- Needs OpenOffice to convert .doc format to .odt or .sxw format (open standard)
- Creates .tmx files
- .tmx files can also be exported from other applications

Parallel corpora  tmx To be able to use a parallel corpora as a translation memory we need first to convert it to the tmx format. We can either use a existing parallel corpora or create our own. There are many open source web resources for creating our own parallel corpora

Using open parallel corpora resources – English source
Jack London published about 40 books in English. Almost all his English-language works are publicly available at
– Project Gutenberg in:

Using open parallel corpora resources – Spanish source(s)
Among the many sources of Spanish translations of Jack London’s books there is: obal/literatura/

Aligning parallel texts
For example:
- Download “White Fang” by Jack London from Project Gutenberg and its translation “Colmillo Blanco” from rincondelvago
- Use bitext2tmx (a free, open source application) for alignment
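
To make the idea of alignment concrete, here is a deliberately naive sketch: split both texts into sentences and pair them by position, flagging pairs whose lengths differ a lot. This is not how bitext2tmx works internally; the function names, the max_ratio value and the sample sentences are all assumptions for illustration.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def pair_segments(source_text, target_text, max_ratio=2.0):
    """Pair sentences by position.

    Returns (pairs, same_count). Each pair is (source, target, suspicious),
    where suspicious is True when one side is more than max_ratio times
    longer than the other, a hint that the pairing may be wrong.
    """
    src = split_sentences(source_text)
    tgt = split_sentences(target_text)
    pairs = []
    for s, t in zip(src, tgt):
        ratio = max(len(s), len(t)) / max(1, min(len(s), len(t)))
        pairs.append((s, t, ratio > max_ratio))
    return pairs, len(src) == len(tgt)


# Made-up sample text, not quoted from either book
pairs, same_count = pair_segments(
    "The wolf ran. It was cold.",
    "El lobo corrió. Hacía frío.",
)
for source, target, suspicious in pairs:
    print(source, "|", target, "| check manually" if suspicious else "")
```

Positional pairing breaks down as soon as a translator merges or splits sentences, which is why an interactive aligner with manual correction is useful for real books.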

bitext2tmx aligner: configuration

bitext2tmx aligner: text alignment

Bitext2tmx producing a tmx-file

The tmx-file produced by bitext2tmx can be added to OmegaT’s tm directory to be used as part of the translation memory

Other tools with OmegaT
- .tmx files can be cleaned with tmxcleaner
- .tmx files can be merged with tmxmerger
- .tmx files can be validated with tmxvalidator
– (can be downloaded from the OmegaT site)
It is important at least to validate the files before adding them to OmegaT’s translation memory.
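
As a rough illustration of the kind of problems validation can catch, the sketch below runs a few basic structural checks with Python's standard library. It is not a replacement for tmxvalidator and does not check the file against the TMX specification; the function name check_tmx and the chosen checks are assumptions.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def check_tmx(tmx_text):
    """Return a list of problems found; an empty list means these basic checks passed."""
    try:
        root = ET.fromstring(tmx_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]

    problems = []
    if root.tag != "tmx":
        problems.append("root element is not <tmx>")
    if root.find("header") is None:
        problems.append("missing <header>")
    for number, tu in enumerate(root.iter("tu"), start=1):
        tuvs = tu.findall("tuv")
        if len(tuvs) < 2:
            # A single variant cannot serve as a translation pair
            problems.append(f"tu #{number}: fewer than two <tuv> variants")
        for tuv in tuvs:
            if not tuv.get(XML_LANG):
                problems.append(f"tu #{number}: <tuv> without xml:lang")
            if not (tuv.findtext("seg") or "").strip():
                problems.append(f"tu #{number}: empty or missing <seg>")
    return problems


good = (
    '<tmx version="1.4"><header/><body><tu>'
    '<tuv xml:lang="en"><seg>Hi</seg></tuv>'
    '<tuv xml:lang="es"><seg>Hola</seg></tuv>'
    "</tu></body></tmx>"
)
print(check_tmx(good))
```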

Current work: Using these Open Source resources, we are translating a book from English to Spanish with the students of applied linguistics at Colima University, with IPN backing. Ready by the middle of November.
Linguistica Computacional

Save your money. Use Open Source!