Translating Subtitles using Machine Translation: Practices, Problems, Methodology. Elsa Sklavounou, Ph.D. (www.systransoft.com)


1. Translating Subtitles using Machine Translation: Practices, Problems, Methodology
Elsa Sklavounou, Ph.D., Linguist, Co-funded Projects Technical Coordinator, SYSTRAN

2. SYSTRAN MT Customization Methodology: Overview
A customization project involves three customization levels, each providing incrementally higher translation quality:
- Basic Terminology
- Complex Terminology
- Linguistic Rules

3. SYSTRAN MT Customization Methodology: Overview
Basic Terminology: The first level entails the creation of a User Dictionary that covers most of the noun terminology in the corpus, along with various simple adjective and verb terms.
Complex Terminology: The second level concerns the coding of complex terminological entries, such as complex verbs with their complements (subject, object…) and their translations.
Linguistic Rules: The third level involves language-specific code modifications in the SYSTRAN linguistic modules.

4. SYSTRAN MT Customization Methodology: Level 1 & Level 2
Customization levels 1 and 2 focus on implementing specialized terminology from the corpus in the systems. Level 1 and 2 tasks include:
- Simple and complex term extraction
- Simple and complex term translation
- Simple and complex term coding
- Simple and complex term review

5. SYSTRAN MT Customization Methodology: Level 1 & Level 2
Step 1: Corpus installation and analysis
Prerequisite 1: a formatted corpus
Step 2: Term extraction
- Simple terms (nouns and noun expressions)
- Complex terms (verb patterns)
- DNT (Do Not Translate) integration

6. SYSTRAN MT Customization Methodology: Level 3
Customization level 3 focuses on the implementation of linguistic rules uniquely adapted to language-specific syntactic and semantic issues found in translations taken from the corpus. Level 3 tasks include:
- Detailed linguistic evaluations and the development of a comprehensive customization plan
- Implementation of customized rules
- Regression tests
- Correction of linguistic translation errors
- Acceptance testing before release

7. SYSTRAN MT Customization Methodology: Quality Levels
Estimate of the quality levels that may be achieved for each customization level.

8. SYSTRAN MT Customization Methodology: Software Tools
The coding of simple and complex terms and the related dictionary maintenance are managed by the SYSTRAN Linguistics Platform, which integrates the following two tools, both required to complete customization levels 1 and 2.

9. SYSTRAN MT Customization Methodology: Software Tools
SYSTRAN Dictionary Manager
The SYSTRAN Dictionary Manager (SDM) enables translators to build and manage multilingual dictionaries. SDM includes preparation steps for dictionary coding tasks, an online dictionary lookup (via an HTML interface), and a compiler for runtime machine translation dictionaries. It is composed of three main components: a database, an HTML query form (dictionary lookup, reports, logs, import and export), and a Windows client (interactive coding tool).

10. SYSTRAN Customization Methodology: Software Tools
The SYSTRAN Review Manager (SRM) is a productivity tool for the review, quality assessment, and maintenance of linguistic resources used in combination with a SYSTRAN system.

11. SYSTRAN Customization Methodology: Prerequisite 1: a formatted grammatical corpus
Grammar Writing Rules:
- Using Articles
- Avoiding Speech Ambiguity
- Using Enumeration
- Ensuring Subject-Verb Agreement
- Using Prepositions
- Using Infinitives at the Beginning of Sentences
- Using Imperatives
- Observing Punctuation Rules
- Using Main Clauses
- Using Subordinate Clauses
- Using Relative Clauses
- Avoiding Multiple Stacking
- Using Compound Words
- Using Capitalization
- Using Spelling Variations
Lexical Ambiguities:
- Disambiguation of Product Names and Menus
- Avoiding Lexical Ambiguities
- Using Compounds
Format and Typographical Issues:
- Segmentation

12. SYSTRAN Customization Methodology for MUSA
Corpus generated fully automatically by two processes: Speech Recognition (KU Leuven) and Automatic Sentence Compression (CNTS)
First priority: subtitle constraints
Second priority: the least ambiguous content possible
Lesson learned: no prerequisite

13. SYSTRAN MT Customization Methodology: Upgraded Software Tools (Client Tools v5)

14. SYSTRAN Translation Project Manager: Terminology Review, Not Found Words Extraction
Reviewing Terminology and Sentences
The Terminology Review tab in the Review window lets you identify expressions such as Not Found Words or Terminology extracted by the software.

15. SYSTRAN Translation Project Manager: Terminology Review, Not Found Words Extraction, Examples
Source (SRC_Id): these parents know measles can be dangerous, but they don't want their child to have MMR, the triple vaccine which protects them from measles, mumps and rubella.
Raw MT: ces parents savent la rougeole peut être dangereuse, mais ils ne veulent pas que leur enfant a MMR, le vaccin triple qui les protège contre la rougeole, les oreillons et la rubéole.
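The idea behind Not Found Words extraction can be illustrated with a minimal Python sketch. It is not SYSTRAN's implementation: the function name `not_found_words`, the tokenization rule, and the toy lexicon `KNOWN` are all assumptions made for illustration.

```python
import re

def not_found_words(sentence, known_words):
    """Return tokens absent from `known_words`, in order, without repeats."""
    # Assumed tokenization: runs of letters and straight apostrophes.
    tokens = re.findall(r"[A-Za-z']+", sentence)
    seen = set()
    missing = []
    for token in tokens:
        key = token.lower()
        if key not in known_words and key not in seen:
            seen.add(key)
            missing.append(token)
    return missing

# Hypothetical toy lexicon standing in for the system dictionary.
KNOWN = {"they", "don't", "want", "their", "child", "to", "have",
         "the", "triple", "vaccine"}
SENTENCE = "they don't want their child to have MMR, the triple vaccine"

print(not_found_words(SENTENCE, KNOWN))
```

In this toy setup the acronym MMR is the only candidate reported, which is the kind of item a reviewer would then code as a dictionary entry or as a Do Not Translate term.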

16. SYSTRAN Translation Project Manager: Alternative Meanings
Alternative Meanings shows alternative translations based on different meanings of a source word or expression. The Alternative Meanings tab in the Review window shows alternative meanings for expressions in SYSTRAN or User Dictionaries.

17. SYSTRAN Translation Project Manager: Alternative Meanings, Examples
Source (SRC_Id): they'd rather pay for single vaccines at 60 pounds a shot, even though the government insists MMR is safe.
Raw MT: ils payeraient plutôt les vaccins uniques à 60 livres un coup de feu, quoique le gouvernement exige que MMR est sûr.
Customized MT: ils payeraient plutôt les vaccins uniques à 60 livres une injection, quoique le gouvernement exige que MMR est sûr.

18. SYSTRAN Dictionary Manager: User Dictionaries (UDs)
User Dictionaries (UDs) let you increase the quality of source language analyses, which in turn improves the translation output for all associated target languages. UDs can be used for a number of functions, including:
- Automatically translating words not found in the SYSTRAN dictionary (Not Found Words).
- Overriding the target-language meaning of a word or expression in the SYSTRAN dictionaries, a capability that lets you customize translation output to fit specific needs.
- Ensuring that an expression is always treated as a unit by SYSTRAN analysis programs.
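The three UD functions listed above can be sketched as a dictionary lookup in Python. This is an illustrative assumption, not how SYSTRAN dictionaries are coded or applied: the function `lookup`, the greedy longest-match strategy, and both toy dictionaries are hypothetical, and the sketch performs lookup only (no reordering or agreement).

```python
def lookup(tokens, base_dict, user_dict, max_len=3):
    """Greedy longest-match lookup; at equal length, user entries win."""
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate expression first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in user_dict:      # override or new entry
                out.append(user_dict[phrase])
                i += n
                break
            if phrase in base_dict:      # default entry
                out.append(base_dict[phrase])
                i += n
                break
        else:
            out.append(tokens[i])        # not found: left untranslated
            i += 1
    return out

# Hypothetical entries, loosely modeled on the examples in this deck.
BASE = {"and": "et", "bowel": "entrailles", "disease": "maladie"}
USER = {"autism": "autisme", "bowel disease": "maladie d'entrailles"}
TOKENS = "autism and bowel disease".split()

print(lookup(TOKENS, BASE, {}))    # base dictionary only
print(lookup(TOKENS, BASE, USER))  # with the user dictionary
```

With the user dictionary, "autism" is no longer a Not Found Word and "bowel disease" is handled as one unit instead of two separate words.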

19. SYSTRAN Dictionary Manager: User Dictionaries (UDs), Metrics

Type of Dictionary        ENFR                  ENEL
Do Not Translate Words    3532 entries (enxx)
Proper Nouns              1495 entries (enfr)   1495 entries (enel)
MUSA Terminology          1443 entries (enfr)   5228 entries (enel)

20. SYSTRAN Dictionary Manager: User Dictionaries (UDs), Examples
Source (SRC_ID): Andrew Wakefield ignited the debate over MMR by announcing the findings of research into a group with autism and bowel disease.
Raw MT: Andrew Wakefield a enflammé la discussion au-dessus de MMR en annonçant les résultats de la recherche dans un groupe avec la maladie d'autism et d'entrailles.
Customized MT: Andrew Wakefield a enflammé la discussion au-dessus de MMR en annonçant les résultats de la recherche dans un groupe avec autisme et maladie d'entrailles.

21. SYSTRAN Translation Project Manager: Source Analysis, Interactive Disambiguation
The Source Analysis tab in the Review window shows how the software handled source ambiguities and allows you to override the software selections.

22. SYSTRAN Translation Project Manager: Source Analysis, Interactive Disambiguation, Examples
Source (ID 523): At first we thought it was parts of the building but it was people, literally people falling all around us.
Raw MT: D'abord nous avons pensé que ce faisait partie du bâtiment mais c'était les gens, peuplent littéralement la chute tout autour de nous.
Customized MT: D'abord nous avons pensé que c'était des fragments du bâtiment, mais c'était des gens, littéralement des gens qui tombaient autour de nous.

23. SYSTRAN Dictionary Manager: Normalization Dictionaries (NDs)
There are two types of Normalization Dictionaries (NDs): source normalization and target normalization. Source normalization normalizes the source document before translation. Target normalization adapts translation output to user needs in terms of terminology consistency. It can also provide a way to replace expressions chosen by the software's translation engine with user-defined expressions.

24. SYSTRAN Dictionary Manager: Normalization Dictionaries (NDs), Examples
Source (SRC_IDs): we did n't know she had measles but we do. I mean I ca n't help...
Raw MT: nous avons fait le n't savons qu'il a eu la rougeole mais nous faisons. Je veux dire l'aide de n't d'I ca…
Customized MT via SRC Normalization: nous n'avons pas su qu'il a eu la rougeole mais nous faisons. Je veux dire que je ne peux pas aider
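Source normalization of this kind can be sketched in Python as a dictionary-driven rewrite applied before translation. The entries in `SOURCE_ND` and the helper `build_normalizer` are hypothetical; the actual format of SYSTRAN Normalization Dictionaries is not shown in these slides.

```python
import re

def build_normalizer(entries):
    """Return a function that rewrites every dictionary key found in a text."""
    # Longest keys first, so longer expressions win over their prefixes.
    keys = sorted(entries, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")(?!\w)"
    )

    def normalize(text):
        return pattern.sub(lambda m: entries[m.group(1)], text)

    return normalize

# Hypothetical entries repairing contractions split by upstream tokenization.
SOURCE_ND = {"did n't": "didn't", "ca n't": "can't"}
normalize = build_normalizer(SOURCE_ND)

RAW = "we did n't know she had measles but we do. I mean I ca n't help..."
print(normalize(RAW))
```

The same mechanism applied to MT output rather than input would correspond to target normalization, for example to enforce one preferred term throughout a set of subtitles.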

25. SYSTRAN Translation Project Manager: Sentence Review for Translation Memory Construction
The Sentence Review tab in the Review window compares sentences in the source and target. You can then check the sentences you want to send to User Dictionaries, where you can work with them further in order to post-edit them and construct Translation Memories.

26. SYSTRAN Dictionary Manager: Translation Memories (TMs)
Translation Memory (TM): a set of translated and validated sentences that can be integrated into the translation process.
Translation Memories (TMs) are databases of aligned pre-translated sentences. Unlike Dictionaries, TM entries can be formatted (for example, italic or bold) and are used by the translation engine to perform matches on full sentences in the source document. TMs are not usually created manually, but are built using SYSTRAN's Translation Project Export or from TMX files.
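Full-sentence matching against a TM can be sketched in Python as follows. The key normalization (whitespace and case), the helper names, and the stand-in `fake_mt` function are assumptions for illustration; the slides do not describe how the SYSTRAN engine matches sentences internally.

```python
def tm_key(sentence):
    """Assumed matching key: collapse whitespace and ignore case."""
    return " ".join(sentence.split()).lower()

def build_tm(pairs):
    """Index validated (source, target) sentence pairs by their key."""
    return {tm_key(src): tgt for src, tgt in pairs}

def translate(sentence, tm, mt_fallback):
    """Use the validated TM translation when the full sentence matches."""
    hit = tm.get(tm_key(sentence))
    if hit is not None:
        return hit, "TM"
    return mt_fallback(sentence), "MT"

TM = build_tm([
    ("Now people kind of started panicking and said we've got to leave "
     "no matter what.",
     "Les gens maintenant avaient l'air de paniquer disant qu'ils "
     "devaient à tout prix partir."),
])

def fake_mt(sentence):
    # Hypothetical stand-in for a call to the MT engine.
    return "<MT: " + sentence + ">"

print(translate("Now people kind of started panicking and said we've got "
                "to leave no matter what.", TM, fake_mt))
print(translate("We left.", TM, fake_mt))
```

Only exact full-sentence matches are served from the memory in this sketch; anything else falls through to machine translation.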

27. SYSTRAN Dictionary Manager: Translation Memories (TMs), Examples
Source (ID 370): Now people kind of started panicking and said we've got to leave no matter what.
Raw MT: Maintenant sorte de personnes de panique commencée et dite nous avons pour laisser n'importe ce que.
Customized MT: Les gens maintenant avaient l'air de paniquer disant qu'ils devaient à tout prix partir.

28. SYSTRAN Dictionary Manager: Translation Memories (TMs), Translation Memory Import/Export
Existing translation memory files in the standard TMX (Translation Memory eXchange) format can be imported and exported via SYSTRAN Dictionary Manager.
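To show what a TMX file contains, here is a minimal Python sketch that reads aligned sentence pairs from TMX text using the standard library. It is independent of SYSTRAN Dictionary Manager; the function `read_tmx` and the sample document `SAMPLE_TMX` are illustrative assumptions, and only the basic `tu`/`tuv`/`seg` structure is handled.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def read_tmx(tmx_text, src_lang, tgt_lang):
    """Return (source, target) pairs from translation units with both languages."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested in inline markup.
                segs[lang] = "".join(seg.itertext()).strip()
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs

# Hypothetical sample file content built from the example on slide 27.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Now people kind of started panicking and said we've got to leave no matter what.</seg></tuv>
      <tuv xml:lang="fr"><seg>Les gens maintenant avaient l'air de paniquer disant qu'ils devaient à tout prix partir.</seg></tuv>
    </tu>
  </body>
</tmx>"""

for source, target in read_tmx(SAMPLE_TMX, "en", "fr"):
    print(source, "=>", target)
```

Pairs read this way could feed a sentence-level lookup table, which is the role a Translation Memory plays in the translation process described above.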

