Lecture 9: Machine Translation (I) October 25, 2005 Dan Jurafsky

Presentation transcript:

Lecture 9: Machine Translation (I) October 25, 2005 Dan Jurafsky Thanks to Bonnie Dorr for some of these slides!! 11/18/2018

Outline for MT Week
  Intro and a little history
  Language Similarities and Divergences
  Four main MT Approaches
    Transfer
    Interlingua
    Direct
    Statistical
  Evaluation

What is MT? Translating a text from one language to another automatically.

Machine Translation
  Word-for-word gloss of the Chinese source:
    Dai-yu alone on bed top think-of-with-gratitude Bao-chai
    again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop
    clear cold penetrate curtain
    not feeling again fall down tears come
  English translation:
    As she lay there alone, Dai-yu’s thoughts turned to Bao-chai… Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.

Machine Translation
  The Story of the Stone = The Dream of the Red Chamber (Cao Xueqin 1792)
  Issues:
    Breaking up into words
    Breaking up into sentences
    Zero-anaphora
    Penetrate -> penetrated
    Bamboo tip plantain leaf -> bamboos and plantains
    Curtain -> curtains of her bed
    Rain sound sigh drop -> insistent rustle of the rain

What is MT not good for?
  Really hard stuff
    Literature
    Natural spoken speech (meetings, court reporting)
  Really important stuff
    Medical translation in hospitals, 911

What is MT good for?
  Tasks for which a rough translation is fine
    Web pages, email
  Tasks for which MT can be post-edited
    MT as first pass
    “Computer-aided human translation”
  Tasks in sublanguage domains where high-quality MT is possible

Sublanguage domain
  Weather forecasting
    “Cloudy with a chance of showers today and Thursday”
    “Low tonight 4”
  Can be modeled completely enough to use raw MT output
  Word classes and semantic features like MONTH, PLACE, DIRECTION, TIME POINT

MT History
  1946 Booth and Weaver discuss MT at Rockefeller Foundation in New York
  1947-48 idea of dictionary-based direct translation
  1949 Weaver memorandum popularized idea
  1952 all 18 MT researchers in world meet at MIT
  1954 IBM/Georgetown demo of Russian-English MT
  1955-65 lots of labs take up MT

History of MT: Pessimism
  1959/1960: Bar-Hillel “Report on the state of MT in US and GB”
    Argued FAHQT (fully automatic high-quality translation) too hard (semantic ambiguity, etc.)
    Should work on semi-automatic instead of automatic
  His argument:
    Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy.
    Only human knowledge lets us know that ‘playpens’ are bigger than boxes, but ‘writing pens’ are smaller
  His claim: we would have to encode all of human knowledge

History of MT: Pessimism
  The ALPAC report
    Headed by John R. Pierce of Bell Labs
    Conclusions:
      Supply of human translators exceeds demand
      All the Soviet literature is already being translated
      MT has been a failure: all current MT work had to be post-edited
      Sponsored evaluations which showed that intelligibility and informativeness were worse than for human translations
    Results:
      MT research suffered
        Funding loss
        Number of research labs declined
        Association for Machine Translation and Computational Linguistics dropped MT from its name

History of MT
  1976 Meteo, weather forecasts from English to French
  Systran (Babelfish) has been used for 40 years
  1970’s: European focus in MT; mainly ignored in US
  1980’s: ideas of using AI techniques in MT (KBMT, CMU)
  1990’s:
    Commercial MT systems
    Statistical MT
    Speech-to-speech translation

Language Similarities and Divergences
  Some aspects of human language are universal or near-universal, others diverge greatly.
  Typology: the study of systematic cross-linguistic similarities and differences
  What are the dimensions along which human languages vary?

Morphological Variation
  Isolating languages
    Cantonese, Vietnamese: each word generally has one morpheme
  vs. Polysynthetic languages
    Siberian Yupik (‘Eskimo’): single word may have very many morphemes
  Agglutinative languages
    Turkish: morphemes have clean boundaries
  vs. Fusional languages
    Russian: single affix may have many morphemes

Syntactic Variation
  SVO (Subject-Verb-Object) languages
    English, German, French, Mandarin
  SOV languages
    Japanese, Hindi
  VSO languages
    Irish, Classical Arabic
  SVO lgs generally have prepositions: to Yuriko
  SOV lgs generally have postpositions: Yuriko ni

Segmentation Variation
  Not every writing system has word boundaries marked
    Chinese, Japanese, Thai, Vietnamese
  Some languages tend to have sentences that are quite long, closer to English paragraphs than sentences
    Modern Standard Arabic, Chinese

Inferential Load
  Some languages require the hearer to do more “figuring out” of who the various actors in the various events are
    Japanese, Chinese
  Other languages are pretty explicit about saying who did what to whom
    English

Inferential Load (2) All noun phrases in blue do not appear in Chinese text … But they are needed for a good translation

Lexical Divergences
  Word to phrases:
    English “computer science” = French “informatique”
  POS divergences
    Eng. ‘she likes/VERB to sing’
    Ger. ‘Sie singt gerne/ADV’
    Eng. ‘I’m hungry/ADJ’
    Sp. ‘tengo hambre/NOUN’

Lexical Divergences: Specificity
  Grammatical constraints
    English has gender on pronouns, Mandarin does not.
    So translating “3rd person” from Chinese to English, need to figure out gender of the person!
    Similarly from English “they” to French “ils/elles”
  Semantic constraints
    English ‘brother’: Mandarin ‘gege’ (older) versus ‘didi’ (younger)
    English ‘wall’: German ‘Wand’ (inside), ‘Mauer’ (outside)
    German ‘Berg’: English ‘hill’ or ‘mountain’

Lexical Divergence: one-to-many

Lexical Divergence: lexical gaps
  Japanese: no word for privacy
  English: no word for Cantonese ‘haauseun’ or Japanese ‘oyakoko’ (something like ‘filial piety’)
  English ‘cow’ versus ‘beef’, Cantonese ‘ngau’

Event-to-argument divergences
  English: The bottle floated out.
  Spanish: La botella salió flotando. (The bottle exited floating)
  Verb-framed lg: mark direction of motion on verb
    Spanish, French, Arabic, Hebrew, Japanese, Tamil, Polynesian, Mayan, Bantu families
  Satellite-framed lg: mark direction of motion on satellite
    Crawl out, float off, jump down, walk over to, run after
    Rest of Indo-European, Hungarian, Finnish, Chinese

Structural divergences
  G: Wir treffen uns am Mittwoch
  E: We’ll meet on Wednesday

Head Swapping
  E: X swim across Y
  S: X cruzar Y nadando
  E: I like to eat
  G: Ich esse gern
  E: I’d prefer vanilla
  G: Mir wäre Vanille lieber

Thematic divergence
  S: Y me gusta
  E: I like Y
  G: Mir fällt der Termin ein
  E: I forget the date

Divergence counts from Bonnie Dorr
32% of sentences in UN Spanish/English Corpus (5K)

  Divergence      Spanish               English            %
  Categorial      X tener hambre        Y have hunger      98%
  Conflational    X dar puñaladas a Z   X stab Z           83%
  Structural      X entrar en Y         X enter Y          35%
  Head Swapping   X cruzar Y nadando    X swim across Y    8%
  Thematic        X gustar a Y          Y likes X          6%

MT on the web Babelfish: http://babelfish.altavista.com/

3 methods for MT
  Direct
  Transfer
  Interlingua

Three MT Approaches: Direct, Transfer, Interlingual
This slide from Bonnie Dorr! Original metaphor due to Bernard Vauquois
[Figure: Vauquois triangle. Labels, from the top of the triangle down to the text level:]

  Source side (analysis)     Crossing             Target side (generation)
  Semantic Composition                            Semantic Decomposition
  Semantic Structure         Semantic Transfer    Semantic Structure
  Semantic Analysis                               Semantic Generation
  Syntactic Structure        Syntactic Transfer   Syntactic Structure
  Syntactic Analysis                              Syntactic Generation
  Word Structure             Direct               Word Structure
  Morphological Analysis                          Morphological Generation
  Source Text                                     Target Text

The Transfer Model
  Idea: apply contrastive knowledge, i.e., knowledge about the difference between two languages
  Steps:
    Analysis: Syntactically parse Source language
    Transfer: Rules to turn this parse into parse for Target language
    Generation: Generate Target sentence from parse tree

Transfer architecture

English to French
  Generally
    English: Adjective Noun
    French: Noun Adjective
  Note: not always true
    Route mauvaise ‘bad road, badly-paved road’
    Mauvaise route ‘wrong road’
  But is a reasonable first approximation
  Rule: Adjective Noun -> Noun Adjective
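The adjective-noun reordering on this slide can be sketched as a single transfer rule over part-of-speech-tagged tokens. This is only an illustration: the tag names ("DET", "ADJ", "NOUN") and the function name reorder_adj_noun are assumptions, not part of the lecture, and the sketch ignores the exceptions noted on the slide (mauvaise route versus route mauvaise).

```python
from typing import List, Tuple

# (word, part-of-speech tag); the tag inventory is an assumption for this sketch
Token = Tuple[str, str]


def reorder_adj_noun(tokens: List[Token]) -> List[Token]:
    """Swap every adjacent (ADJ, NOUN) pair into (NOUN, ADJ) order."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        is_adj_noun = (
            i + 1 < len(tokens)
            and tokens[i][1] == "ADJ"
            and tokens[i + 1][1] == "NOUN"
        )
        if is_adj_noun:
            # Emit the noun first, then the adjective, and skip both tokens.
            out.extend([tokens[i + 1], tokens[i]])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


english = [("the", "DET"), ("bad", "ADJ"), ("road", "NOUN")]
reordered = reorder_adj_noun(english)
print([word for word, _ in reordered])  # ['the', 'road', 'bad']
```

Lexical transfer (bad -> mauvaise, road -> route) would still have to run after this syntactic step.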

Example: English to Japanese Transfer
  Rule for Existential-there: delete “there” and convert 4th constituent to relative clause modifying the noun
  Rule for relative clauses: reverse the order of the noun phrase and its relative clause
  Syntax is done: apply lexical transfer.

English to Japanese Transfer
  From “niwa no teire o suru ojiisan ita”
    Add “ga” to mark subject
    Choose verb to agree with subject
    Inflect verbs
  Linearize tree:
    Niwa no teire o shite ita ojiisan ga ita
    Garden GEN upkeep OBJ do PASTPROG old man SUBJ was
    “There was an old man gardening”

E-to-J Transfer: rules used
  Existential-There-Sentence:
    There1 Verb2 NP3 Postnominal4 -> (NP -> NP3 Relative-Clause4) Verb2
  Relative clauses:
    NP -> NP1 Relative-Clause2  =>  NP -> Relative-Clause2 NP1
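As a rough sketch, the two syntactic transfer rules on this slide can be written as functions over a small parse tree. The tree encoding (nested tuples of the form (label, child, ...)), the node labels, and the function names are all assumptions made for illustration; lexical transfer and the later generation steps are left out.

```python
from typing import Tuple, Union

# A tree is either a leaf word or a tuple (label, child, child, ...).
Tree = Union[str, Tuple]


def transfer_existential_there(tree: Tree) -> Tree:
    """There1 Verb2 NP3 Postnominal4 -> (NP -> NP3 Relative-Clause4) Verb2."""
    if isinstance(tree, tuple) and tree[0] == "S" and len(tree) == 5:
        _, there, verb, np, postnominal = tree
        if there == ("THERE", "there"):
            # Drop "there" and turn the postnominal into a relative clause on the NP.
            rel = ("RelClause",) + postnominal[1:]
            return ("S", ("NP", np, rel), verb)
    return tree


def reorder_relative_clauses(tree: Tree) -> Tree:
    """NP -> NP1 Relative-Clause2 becomes NP -> Relative-Clause2 NP1, at every depth."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    children = [reorder_relative_clauses(child) for child in children]
    if (
        label == "NP"
        and len(children) == 2
        and isinstance(children[1], tuple)
        and children[1][0] == "RelClause"
    ):
        children = [children[1], children[0]]
    return (label, *children)


def leaves(tree: Tree) -> list:
    """Read the words off the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


source = (
    "S",
    ("THERE", "there"),
    ("V", "was"),
    ("NP", "an", "old", "man"),
    ("Postnominal", "gardening"),
)

target = reorder_relative_clauses(transfer_existential_there(source))
print(leaves(target))  # ['gardening', 'an', 'old', 'man', 'was']
```

The resulting word order (relative clause, noun phrase, verb) matches the order of the Japanese example before lexical transfer is applied.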

Lexical Transfer
  Man:
    Ojiisan ‘old man’
    “Man is the only linguistic animal” -> Ningen ‘man, human being’
    Or Hito ‘person, persons’
  Can treat like lexical ambiguity, disambiguate during parsing

Transfer: some problems
  N^2 sets of transfer rules!
  Grammar and lexicon full of language-specific stuff
  Hard to build, hard to maintain

MT Method 2: Interlingua
  Intuition: Instead of lg-lg knowledge rules, use the meaning of the sentence to help
  Steps:
    1) translate source sentence into meaning representation
    2) generate target sentence from meaning

Interlingua for “there was an old man gardening”
  EVENT: GARDENING
    AGENT: MAN
      NUMBER: SG
      DEFINITENESS: INDEF
    ASPECT: PROGRESSIVE
    TENSE: PAST
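One way to picture this meaning representation is as a small typed record plus a per-language generator. The sketch below is an assumption-heavy toy: the class names, the tiny English lexicon, and generate_english are invented for illustration, and it only handles the PAST PROGRESSIVE case shown on the slide (the slide's frame also has no slot for "old", so the output omits it).

```python
from dataclasses import dataclass


@dataclass
class Agent:
    concept: str        # e.g. "MAN"
    number: str         # "SG" or "PL"
    definiteness: str   # "INDEF" or "DEF"


@dataclass
class Event:
    concept: str        # e.g. "GARDENING"
    agent: Agent
    aspect: str         # e.g. "PROGRESSIVE"
    tense: str          # e.g. "PAST"


# The frame from the slide.
frame = Event(
    concept="GARDENING",
    agent=Agent(concept="MAN", number="SG", definiteness="INDEF"),
    aspect="PROGRESSIVE",
    tense="PAST",
)

# Hypothetical toy English lexicon: concept -> (singular, plural) or participle.
EN_NOUNS = {"MAN": ("man", "men")}
EN_PARTICIPLES = {"GARDENING": "gardening"}


def generate_english(event: Event) -> str:
    """Generate a simple English clause from the frame (past progressive only)."""
    if (event.aspect, event.tense) != ("PROGRESSIVE", "PAST"):
        raise NotImplementedError("this sketch only covers PAST PROGRESSIVE")
    singular, plural = EN_NOUNS[event.agent.concept]
    is_singular = event.agent.number == "SG"
    noun = singular if is_singular else plural
    if event.agent.definiteness == "DEF":
        determiner = "the"
    else:
        determiner = "a" if is_singular else "some"
    auxiliary = "was" if is_singular else "were"
    return f"{determiner} {noun} {auxiliary} {EN_PARTICIPLES[event.concept]}"


print(generate_english(frame))  # a man was gardening
```

A Japanese generator would read the same frame, which is the point of the interlingua: analysis and generation are written once per language rather than once per language pair.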

Interlingua
  Idea is that some of the MT work that we need to do is part of other NLP tasks
  E.g., disambiguating E:book S:‘libro’ from E:book S:‘reservar’
  So we could have concepts like BOOKVOLUME and RESERVE and solve this problem once for each language
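The book example can be sketched as two lookup tables that meet at the concept level. Using the part of speech to pick the sense is an assumption of this sketch (a real system would need a proper word-sense disambiguation step), and the table and function names are hypothetical.

```python
# English analysis: (word, part of speech) -> language-neutral concept.
# Disambiguating by part of speech alone is a simplifying assumption.
EN_TO_CONCEPT = {
    ("book", "NOUN"): "BOOKVOLUME",
    ("book", "VERB"): "RESERVE",
}

# Spanish generation: concept -> Spanish word.
CONCEPT_TO_ES = {
    "BOOKVOLUME": "libro",
    "RESERVE": "reservar",
}


def english_to_spanish(word: str, pos: str) -> str:
    """Translate one word by going through the shared concept inventory."""
    concept = EN_TO_CONCEPT[(word, pos)]
    return CONCEPT_TO_ES[concept]


print(english_to_spanish("book", "NOUN"))  # libro
print(english_to_spanish("book", "VERB"))  # reservar
```

Adding a third language would mean adding one analysis table and one generation table for it, leaving the English and Spanish tables untouched.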

Vauquois diagram

Direct Translation
  Idea: more robust, word-specific models
  Start with a Source language sentence
  Write little transformations, directly on words, to turn it into a Target language sentence.

Direct MT J-to-E
  Input: Watashihatsukuenouenopenwojonniageta.
  1) morphological analysis: Watashi ha tsukue no ue no pen wo jon ni ageru PAST
  2) lexical transfer of content words: I ha desk no ue no pen wo John ni give PAST
  3) various preposition work: I ha pen on desk wo John to give PAST
  4) SVO rearrangements: I give PAST pen on desk John to
  5) miscellany: I give PAST the pen on the desk to John
  6) morphological generation: I gave the pen on the desk to John.
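Two of these stages are simple enough to sketch directly: stage 2 (lexical transfer of content words) and stage 6 (morphological generation). The dictionaries and function names below are assumptions built only from the words in the slide's example; stages 1 and 3 to 5 are taken as given, and the regular "-ed" fallback is a naive placeholder.

```python
# Hypothetical content-word lexicon covering only the slide's example.
CONTENT_LEXICON = {"watashi": "I", "tsukue": "desk", "jon": "John", "ageru": "give"}

# Hypothetical table of irregular English past-tense forms.
IRREGULAR_PAST = {"give": "gave"}


def lexical_transfer(tokens):
    """Stage 2: replace content words, leave particles and markers untouched."""
    return [CONTENT_LEXICON.get(token, token) for token in tokens]


def morphological_generation(tokens):
    """Stage 6: merge each 'verb PAST' pair into an inflected English verb."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i + 1] == "PAST":
            verb = tokens[i]
            # Naive fallback: add "-ed" when the verb is not in the irregular table.
            out.append(IRREGULAR_PAST.get(verb, verb + "ed"))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


analyzed = "watashi ha tsukue no ue no pen wo jon ni ageru PAST".split()
stage2 = lexical_transfer(analyzed)
print(" ".join(stage2))  # I ha desk no ue no pen wo John ni give PAST

stage5 = "I give PAST the pen on the desk to John".split()
print(" ".join(morphological_generation(stage5)))  # I gave the pen on the desk to John
```

Each stage is a small word-level transformation with no parse tree in between, which is what distinguishes the direct approach from transfer.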

Direct MT stage 2 (ex. from Panov 1960 via Hutchins 1986)
  Function direct-translate-much/many
    If preceding word is ‘how’
      Return ‘skol’ko’
    Else if preceding word is ‘as’
      Return ‘skol’ko zhe’
    Else if word is ‘much’
      If preceding word is ‘very’
        Return nil (not translated)
      Else if following word is a noun
        Return ‘mnogo’
    Else /* word is many */
      If preceding word is PREP and following is NOUN
        Return ‘mnogii’
      Else
        Return ‘mnogo’
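The decision procedure above translates fairly directly into a runnable function. In this sketch the input format (a list of (word, tag) pairs plus an index) and the tag names "NOUN" and "PREP" are assumptions. The slide does not say what to return for "much" when it is neither preceded by "very" nor followed by a noun, so the fallback to 'mnogo' in that branch is also an assumption.

```python
from typing import List, Optional, Tuple

# (word, part-of-speech tag); tag names are assumed for this sketch
Token = Tuple[str, str]


def direct_translate_much_many(tokens: List[Token], i: int) -> Optional[str]:
    """Return the Russian rendering of 'much'/'many' at position i, or None if untranslated."""
    word = tokens[i][0].lower()
    prev_word = tokens[i - 1][0].lower() if i > 0 else None
    prev_pos = tokens[i - 1][1] if i > 0 else None
    next_pos = tokens[i + 1][1] if i + 1 < len(tokens) else None

    if prev_word == "how":
        return "skol'ko"
    if prev_word == "as":
        return "skol'ko zhe"
    if word == "much":
        if prev_word == "very":
            return None  # not translated
        if next_pos == "NOUN":
            return "mnogo"
        # Assumption: the slide leaves this case unspecified; fall back to 'mnogo'.
        return "mnogo"
    # Otherwise the word is 'many'.
    if prev_pos == "PREP" and next_pos == "NOUN":
        return "mnogii"
    return "mnogo"


sentence = [("in", "PREP"), ("many", "ADJ"), ("cases", "NOUN")]
print(direct_translate_much_many(sentence, 1))  # mnogii
```

The function only looks at the immediately neighbouring words, which illustrates both the appeal of direct MT (simple, local rules) and its weakness (one such procedure per word).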

3 methods pros and cons Thanks to Bonnie Dorr!

Direct MT: pros and cons (Bonnie Dorr)
  Pros
    Fast
    Simple
    Cheap
    No translation rules hidden in lexicon
  Cons
    Unreliable
    Not powerful
    Rule proliferation
    Requires lots of context
    Major restructuring after lexical substitution

Interlingual MT: pros and cons (B. Dorr)
  Pros
    Avoids the N^2 problem
    Easier to write rules
  Cons
    Semantics is HARD
    Useful information lost (paraphrase)
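The N^2 problem is simple arithmetic, sketched below. The exact counts are assumptions about how one tallies components: one transfer rule set per ordered (source, target) pair for a transfer system, and one analyzer plus one generator per language for an interlingua system. The function names are hypothetical.

```python
def transfer_rule_sets(n_languages: int) -> int:
    """One rule set per ordered (source, target) language pair: N * (N - 1)."""
    return n_languages * (n_languages - 1)


def interlingua_components(n_languages: int) -> int:
    """One analyzer and one generator per language: 2 * N."""
    return 2 * n_languages


for n in (2, 5, 10, 20):
    print(n, transfer_rule_sets(n), interlingua_components(n))
```

For 20 languages this gives 380 transfer rule sets against 40 interlingua components, which is the quadratic versus linear growth the slides refer to.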

Summary
  Intro and a little history
  Language Similarities and Divergences
  Four main MT Approaches
    Transfer
    Interlingua
    Direct
    Statistical
  Evaluation