Rules Based Machine Translation
Fred Hollowood, Consultant
RBMT and CL

Presentation transcript:

Sample Agenda
1. Introduction
2. Rules Based Machine Translation
3. Post-Editing
4. Quality Measurement
5. Controlled Language

Introduction: Technology Initiative
The Aim
- Bring rapid, cost-effective translation to Symantec's product and service divisions
- Connect Symantec's CMS to translation technologies
- Metrics on the reduction of translation costs and time to market
The Approach
- Structure source content so it accommodates MT
- Use a language checker to monitor source grammar
- Promote terminology as a key process and deliverable
  - Proactive rather than reactive
- Define measures to monitor and drive productivity
  - GTM, METEOR, BLEU
- Work with post-editors to ensure a win-win

Rules Based Machine Translation
Flowchart of Rule-Based Machine Translation (RBMT):
SL Text -> Analysis -> Transfer -> Synthesis -> TL Text
- Analysis: SL Lexicon & Grammars
- Transfer: SL->TL Lexical & Structural Rules
- Synthesis: TL Lexicon & Grammars
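The three stages in the flowchart can be illustrated with a toy English-to-French example. This is only a minimal sketch: the lexicons, the single reordering rule, and all function names are invented for illustration and do not describe any real engine.

```python
# Toy RBMT pipeline: analysis -> transfer -> synthesis (English to French).
# Every lexicon entry and rule here is a hypothetical illustration.

SL_LEXICON = {"the": "DET", "red": "ADJ", "empty": "ADJ", "car": "NOUN", "file": "NOUN"}
TRANSFER_LEXICON = {"the": "le", "red": "rouge", "empty": "vide",
                    "car": "voiture", "file": "fichier"}
TL_GENDER = {"voiture": "f", "fichier": "m"}


def analyse(text):
    """Analysis: tag each source word using the SL lexicon."""
    return [(word, SL_LEXICON[word]) for word in text.lower().split()]


def transfer(tagged):
    """Transfer: lexical lookup plus one structural rule (ADJ NOUN -> NOUN ADJ)."""
    out = [(TRANSFER_LEXICON[word], tag) for word, tag in tagged]
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return out


def synthesise(transferred):
    """Synthesis: apply a TL grammar rule (the determiner agrees with its noun)."""
    words = []
    for i, (word, tag) in enumerate(transferred):
        if tag == "DET":
            noun = next((w for w, t in transferred[i + 1:] if t == "NOUN"), None)
            word = "la" if TL_GENDER.get(noun) == "f" else "le"
        words.append(word)
    return " ".join(words)


def translate(text):
    return synthesise(transfer(analyse(text)))


print(translate("the red car"))     # la voiture rouge
print(translate("the empty file"))  # le fichier vide
```

A word missing from the toy lexicon raises a KeyError, which mirrors why dictionary maintenance matters for rule-based systems.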

MT Process Overview
[Diagram; labels recovered from the slide]
- Controlled Language Authoring
- Automated Pre-processing
- User Dictionary
- Translation System
- Normalisation Dictionary
- Automated Post-processing
- Human Post-Editing
- Systran Engine
Other labels: Remote Human Activity System Control Phases Text Processing

Post-Editing
- Fundamentally the same relationship as with a traditional vendor
- Increased daily throughput expected for post-edited content (6-8k vs. 2.5k per day)
- Style requirements have been critically reviewed in the light of PE
  - E.g. stylistic inconsistencies are acceptable for post-edited content

Measurement

Metrics based on Comprehensibility

Score: Excellent MT output (E) (4)
Criteria: Read the MT output first. Then read the Source Text (ST). Your understanding of the MT output is not improved by the reading of the ST because the MT output is satisfactory and would not need to be modified. An end-user who does not have access to the ST would be able to understand the MT output.

Score: Good MT output (G) (3)
Criteria: Read the MT output first. Then read the source text. Your understanding of the MT output is not improved by the reading of the ST even though the MT output contains minor grammatical mistakes. An end-user who does not have access to the source text could possibly understand the MT output.

Score: Medium MT output (M) (2)
Criteria: Read the MT output first. Then read the source text. Your understanding of the MT output is improved by the reading of the ST, due to significant errors in the MT output. An end-user who does not have access to the source text could only get the gist of the MT output.

Score: Poor MT output (P) (1)
Criteria: Read the MT output first. Then read the source text. Your understanding only derives from the reading of the ST, as you could not understand the MT output. An end-user who does not have access to the source text would not be able to understand the MT output at all.

Quality by Human Inspection

GTM Scoring
[Figure labels: From the machine; From the post-editor]
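To give a feel for this kind of scoring, here is a simplified sketch that compares raw machine output with its post-edited version using a word-overlap F-measure. It is an assumption-laden simplification, not the GTM tool itself: the full GTM metric also rewards longer contiguous matching runs, while this version ignores word order. The function name and the example sentences are invented.

```python
from collections import Counter


def unigram_f_measure(candidate, reference):
    """Word-overlap F-measure between MT output and its post-edited reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Each word can be matched at most as often as it occurs in both texts.
    matches = sum((Counter(cand) & Counter(ref)).values())
    if matches == 0:
        return 0.0
    precision = matches / len(cand)
    recall = matches / len(ref)
    return 2 * precision * recall / (precision + recall)


from_machine = "the scan is turn on by default"
from_post_editor = "the scan is turned on by default"
print(round(unigram_f_measure(from_machine, from_post_editor), 3))  # 0.857
```

A score of 1.0 means the post-editor changed no words, so higher scores suggest less post-editing effort.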

Quality Metrics by Language
Project Scores by Language
- French: 73%
- Spanish: 68%
- Italian: 59%
- German: 57%

Example Style rules
- Avoid using a colon after a drive letter
- Avoid “he”, “she”, “he/she”, and “s/he”
- Use numerals for all measurements over 10
- Use the serial comma
- Do not use more than two adverbs or adjectives in a series
- Keep the subject and verb close to each other early in a sentence
- Avoid meaningless openers
- Avoid progressive tense when describing product use
- Do not use future when describing product use
- Make positive statements that tell users what to do or what they need to know
- Use sentence-style capitalization for bulleted lists
- Use a colon at the end of a sentence to introduce a bulleted list
- Punctuate imperative sentences in bulleted lists
- Use number × number
- Use a hyphen in a unit
- Repeat the unit of measure

CL rules based on CDG
- Avoid using the passive voice
- Do not use more than 25 words in a sentence (original recommendation was 20)
- Use relative pronouns
- Use complementizers (“that”)
- Avoid unnecessary words (such as “basic” or “just”)
- Do not use 'this' or 'that' when they are not followed by a noun
- Place all non-translatable text on its own line (programming code snippets)
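Two of these rules lend themselves to simple automated checks. The sketch below covers only the 25-word limit and the "unnecessary words" rule. The word list contains just the two examples named on the slide, and the function name and message format are assumptions, not the behaviour of any particular language checker.

```python
import re

MAX_WORDS = 25                    # limit stated on the slide
UNNECESSARY = {"basic", "just"}   # only the two examples given on the slide


def check_sentence(sentence):
    """Return a list of rule violations found in one sentence."""
    words = re.findall(r"[A-Za-z']+", sentence)
    issues = []
    if len(words) > MAX_WORDS:
        issues.append(f"too long: {len(words)} words (limit {MAX_WORDS})")
    for word in words:
        if word.lower() in UNNECESSARY:
            issues.append(f"unnecessary word: {word}")
    return issues


print(check_sentence("Just click the button."))  # ['unnecessary word: Just']
print(check_sentence("Click the button."))       # []
```

Rules such as "avoid the passive voice" need grammatical analysis and are beyond what a word-level check like this can decide.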

CL rules for MT
- Do not use slashes to list lexical items
- Do not write the full name of each operating system
- Avoid -ing words
- Use a noun at the start of a subordinate clause
- Repeat the head noun in ambiguous coordinated structures
- Use a hyphen to indicate the first part of a compound
- Use articles in specific contexts (for disambiguation)
- Keep both parts of a two-part verb together
- Use "could" with "if"
- Avoid parenthetical expressions in the middle of a sentence
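The slash rule and the -ing rule can be approximated with pattern matching. The following is a rough sketch under stated assumptions: the regular expressions are deliberately naive, the exception list is hypothetical, and the flags are meant for a human author to review.

```python
import re

# Words joined by slashes, e.g. "file/folder" (naive: would also match parts of URLs).
SLASH_LIST = re.compile(r"\b[A-Za-z]+(?:/[A-Za-z]+)+\b")
# Any word of five or more letters ending in "ing".
ING_WORD = re.compile(r"\b[A-Za-z]{2,}ing\b", re.IGNORECASE)
# Hypothetical exception list: words that end in "ing" but are not verb forms.
ING_ALLOWED = {"thing", "during", "string", "setting"}


def flag_mt_rules(sentence):
    """Return (rule, text) pairs for possible violations in one sentence."""
    flags = [("slash", m.group()) for m in SLASH_LIST.finditer(sentence)]
    for m in ING_WORD.finditer(sentence):
        if m.group().lower() not in ING_ALLOWED:
            flags.append(("ing", m.group()))
    return flags


print(flag_mt_rules("Select the file/folder before scanning."))
# [('slash', 'file/folder'), ('ing', 'scanning')]
```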

Examples of CL Violation
Rule: Keep both parts of a two-part verb together

Violation: This document gives directions to turn scanning on or off.
- German MT: Dieses Dokument gibt Richtungen zum Umdrehung - Prüfung an oder weg.
- French MT: Ce document donne des directions à l'analyse du courrier électronique de tour en fonction ou hors fonction.

Compliant: This document gives directions to turn on or turn off scanning.
- German MT: Dieses Dokument gibt Richtungen, -Prüfung zu aktivieren oder zu deaktivieren.
- French MT: Ce document donne des directions pour activer ou désactiver l'analyse du courrier électronique.
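A check for this rule could look for a known two-part verb whose particle appears a few words later instead of directly after it. The sketch below is one possible approach, not a description of the checker used in the project: the verb table and the look-ahead window are hypothetical.

```python
import re

# Hypothetical table of two-part verbs and their particles.
TWO_PART_VERBS = {"turn": {"on", "off"}, "shut": {"down"}, "back": {"up"}}
WINDOW = 3  # how many following words to inspect (an arbitrary choice)


def split_two_part_verbs(sentence):
    """Return (verb, particle) pairs where the particle is separated from the verb."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    found = []
    for i, token in enumerate(tokens):
        particles = TWO_PART_VERBS.get(token)
        if not particles:
            continue
        following = tokens[i + 1 : i + 1 + WINDOW]
        if following and following[0] in particles:
            continue  # verb and particle are adjacent, so the rule is satisfied
        for later in following[1:]:
            if later in particles:
                found.append((token, later))
                break
    return found


print(split_two_part_verbs("This document gives directions to turn scanning on or off."))
# [('turn', 'on')]
print(split_two_part_verbs("This document gives directions to turn on or turn off scanning."))
# []
```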

Lessons Learned
- Strict implementation when there is:
  - New content
  - Little leverage
  - Time
- Rules can be context-sensitive
- Different results depending on client application
  - May not always flag tag problems
- Language-specific rules
  - Probably best implemented as:
    - Pre-processing step
    - Normalization dictionaries
- CL + MT is not sufficient
  - Terminology work to update dictionaries
  - PE when a specific quality standard is required
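A normalization dictionary applied as a pre-processing step might look like the sketch below. The dictionary entries and the function name are invented for illustration; a real table would be built from the terminology work mentioned above.

```python
import re

# Hypothetical normalization dictionary: source variant -> preferred form.
NORMALISATION_DICT = {
    "e-mail": "email",
    "can't": "cannot",
}


def normalise(text, table=NORMALISATION_DICT):
    """Rewrite known variants before the text is sent to the MT engine."""
    # Longest entries first, so a longer variant wins over a shorter one it contains.
    for source in sorted(table, key=len, reverse=True):
        pattern = re.compile(r"(?<!\w)" + re.escape(source) + r"(?!\w)", re.IGNORECASE)
        text = pattern.sub(lambda match, target=table[source]: target, text)
    return text


print(normalise("You can't scan E-mail attachments."))
# You cannot scan email attachments.
```

Because matching is case-insensitive, a replacement always takes the casing stored in the dictionary. A sentence-initial variant would therefore lose its capital letter unless a later step restores it.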

Thank you!
Fred Hollowood

Copyright © 2010 FRED Hollowood CONSULTING. All rights reserved. This document is provided for informational purposes only and is not intended as advertising. All warranties relating to the information in this document, either express or implied, are disclaimed to the maximum extent allowed by law. The information in this document is subject to change without notice.