LIN6932 Topics in Computational Linguistics


LIN6932 Topics in Computational Linguistics
Lecture 9: Machine Translation
Hana Filip
3/22/07

Outline for MT Week
- Intro and a little history
- Language similarities and divergences
- Three classic MT approaches: transfer, interlingua, direct
- Modern statistical MT
- Evaluation

What is MT?
Translating a text from one language to another automatically. Fully automatic translation lies at one end of the scale, and the work of the human translator armed with pencil and paper at the other. Between them are a number of possibilities for collaboration between man and computer, including word processing, terminology databases, voice recognition, and translation memory systems.

Machine Translation
Chinese (pinyin): dai yu zi zai chuang shang gan nian bao chai you ting jian chuang wai zhu shao xiang ye zhe shang, yu sheng xi li, qing han tou mu, bu jue you di xia lei lai.
Word-for-word gloss: Dai-yu alone on bed top think-of-with-gratitude Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come
English translation: As she lay there alone, Dai-yu's thoughts turned to Bao-chai... Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.

Machine Translation
The Story of the Stone, also called The Dream of the Red Chamber (Cao Xueqin, 1792)
Issues:
- Word segmentation, word order
- Sentence segmentation: 4 English sentences to 1 Chinese
- Grammatical differences:
  - Chinese does not grammatically mark tense on the verb; English adds words and tense marking: as, turned to, had begun
  - tou -> penetrated
  - Zero anaphora: a gap in a phrase or clause that has an anaphoric function similar to a pro-form
  - No articles
- Stylistic and cultural differences:
  - bamboo tip plantain leaf -> bamboos and plantains
  - ma 'curtain' -> curtains of her bed
  - rain sound sigh drop -> insistent rustle of the rain

Not just literature
Hansard: Canadian parliamentary proceedings. Hansard is the traditional name for the printed transcripts of parliamentary debates in the Westminster system of government.

Canadian Hansard and MT
The bilingual nature of the Canadian federal government requires that two equivalent Canadian Hansards be maintained: one in French and one in English. This makes it a natural parallel text, and it is often used to train French-English machine translation programs. In addition to being already translated and aligned, the size of the Hansards and the fact that new material is always being added make it an attractive corpus.
Problem: translations are accurate in meaning, but they are not always literally exact.

What is MT not good for?
- Really hard stuff: literature; natural spoken speech (meetings, court reporting)
- Really important stuff: medical translation in hospitals

What is MT good for?
- Tasks for which a rough translation is fine: web pages, email
- Tasks for which MT can be post-edited: MT as a first pass for computer-aided human translation
- Tasks in sublanguage domains where high-quality MT is possible: FAHQT [faah-quit] (Fully Automatic High-Quality Translation)

Sublanguage domain
Weather forecasting:
- "Cloudy with a chance of showers today and Thursday"
- "Low tonight 4"
A sublanguage can be modeled completely enough to use raw MT output, using word classes and semantic features like MONTH, PLACE, DIRECTION, TIME POINT.
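The tractability of a sublanguage can be illustrated with a toy slot-filling translator; the lexicon, pattern, and French output below are invented for illustration and are not from the actual METEO system:

```python
import re

# Toy forecast sublanguage lexicon (invented for illustration).
CONDITION = {"cloudy": "nuageux", "showers": "averses"}
TIME = {"today": "aujourd'hui", "tonight": "ce soir", "thursday": "jeudi"}

def translate_forecast(sentence: str) -> str:
    """Translate one fixed English forecast pattern into French."""
    m = re.match(r"(\w+) with a chance of (\w+) (\w+)", sentence, re.I)
    if not m:
        raise ValueError("sentence outside the sublanguage")
    cond, precip, when = (w.lower() for w in m.groups())
    # Slot filling: each word class (CONDITION, TIME) has its own table.
    return (f"{CONDITION[cond].capitalize()} avec possibilité "
            f"d'{CONDITION[precip]} {TIME[when]}")

print(translate_forecast("Cloudy with a chance of showers today"))
# Nuageux avec possibilité d'averses aujourd'hui
```

Because every sentence in the domain matches one of a small set of patterns, raw output of this kind needs no post-editing.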

Some MT History
- 1946: Warren Weaver (mathematician and science administrator, director of the Division of Natural Sciences at the Rockefeller Foundation, 1932-55) and Andrew D. Booth (British crystallographer) first discuss the possibility of MT in New York
- 1947-48: the idea of dictionary-based (word-for-word) direct translation
- 1949: Weaver's "Translation" memorandum popularized the MT idea

Some MT History
1949: Weaver's "Translation" memorandum popularized the MT idea. It noted the limitations of any simplistic word-for-word approach and made four proposals:
1. Approach the problem of multiple meanings by examination of the immediate context
2. Attend to the logical elements in language
3. Cryptographic methods are possibly applicable
4. Linguistic universals: "Think, by analogy, of individuals living in a series of tall closed towers, all erected over a common foundation. When they try to communicate with one another, they shout back and forth, each from his own closed tower. It is difficult to make the sound penetrate even the nearest towers, and communication proceeds very poorly indeed. But, when an individual goes down his tower, he finds himself in a great open basement, common to all the towers. Here he establishes easy and useful communication with the persons who have also descended from their towers."

Some MT History
- 1952: the first MT ("mechanical translation") conference is held at MIT, with 18 MT researchers
- 1954: first public demo of computer translation at Georgetown University: 49 Russian sentences are translated into English using a 250-word vocabulary and 6 grammar rules
- 1955-65: a number of labs take up MT

History of MT: Pessimism
1959/1960: Bar-Hillel's "Report on the state of MT in US and GB"
- Argued that FAHQT is too hard (semantic ambiguity, etc.)
- We should work on semi-automatic instead of fully automatic translation
His argument:
"Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy."
Only human knowledge lets us know that playpens are bigger than boxes, but writing pens are smaller. His claim: in order for MT to succeed, we would have to encode all of human knowledge.

Bar-Hillel's report
1959/1960: MT research, now a "multimillion dollar affair", as he pointed out, was, with few exceptions, set on a mistaken and unattainable goal, namely fully automatic translation of a quality equal to that of a good human translator. This he held to be utterly unrealistic, and in his view resources were being wasted which could more fruitfully be devoted to the development of less ambitious and more practical computer aids for translators.

History of MT: Pessimism
1966: the ALPAC (Automatic Language Processing Advisory Committee) report, headed by John R. Pierce of Bell Labs
Main conclusions:
- Years of research had produced no useful results; all current MT output had to be post-edited
- The supply of human translators exceeds demand
- All the Soviet literature is already being translated
- Sponsored evaluations showed that MT output was worse than human translations in intelligibility and informativeness
Results:
- MT research suffered; federal funding for machine translation in the US was halted
- The number of research labs declined
- The Association for Machine Translation and Computational Linguistics dropped MT from its name

History of MT
- 1968: Systran founded by Peter Toma; one of the oldest machine translation companies, it did extensive work for the United States Department of Defense and the European Commission, and provides the technology for Yahoo!, AltaVista's Babel Fish, and Google's online translation services, among others
- 1977: the METEO System, developed at the Université de Montréal, is installed in Canada to translate weather forecasts from English to French
- 1970s: European focus in MT; mainly ignored in the US
- 1980s: ideas of using AI techniques in MT (knowledge-based MT at Carnegie Mellon University)
- 1990s: commercial MT systems, statistical MT, speech-to-speech translation

Language Similarities and Divergences
Some aspects of human language are universal or near-universal; others diverge greatly.
Typology: the study of systematic cross-linguistic similarities and differences. What are the dimensions along which human languages vary?

Morphological Variation
Number of morphemes per word:
- Isolating languages (Cantonese, Vietnamese): each word generally has one morpheme
- Polysynthetic languages (Siberian Yupik, 'Eskimo'): a single word may have very many morphemes
Degree to which morphemes are segmentable:
- Agglutinative languages (Turkish): morphemes have clean boundaries
- Fusional languages (Russian): a single affix conflates more than one grammatical category (e.g., a case suffix fuses number, gender, and case)

Syntactic Variation SVO (Subject-Verb-Object) languages SOV Languages English, German, French, Mandarin SOV Languages Japanese, Hindi VSO languages Irish, Classical Arabic SVO lgs generally prepositions: to Yuriko VSO lgs generally postpositions: Yuriko ni 3/22/07 LIN 6932

Segmentation Variation
Not every writing system marks word boundaries with visual cues: Chinese, Japanese, Thai, Vietnamese.
Some languages tend to have sentences that are quite long, closer to English paragraphs than sentences: Modern Standard Arabic, Chinese.

Lexical Divergences
Word to phrase: English "computer science" = French "informatique"
POS divergences:
- English 'she likes/VERB to sing' vs. German 'Sie singt gerne'/ADV
- English 'I'm hungry'/ADJ vs. Italian 'Ho fame'/NOUN

Lexical Divergences: Specificity
Grammatical constraints:
- English has gender on pronouns, Mandarin does not. So when translating "3rd person" from Chinese to English, we need to figure out the gender of the person!
- Similarly from English "they" to French "ils" (masculine plural) or "elles" (feminine plural)
Semantic constraints:
- English 'brother' vs. Mandarin 'gege' (older) and 'didi' (younger)
- English 'wall' vs. German 'Wand' (inside) and 'Mauer' (outside); cp. die Berliner Mauer
- German 'Berg' vs. English 'hill' or 'mountain'

Lexical Divergence: many-to-many
[diagram in original slides]

Lexical Divergence: lexical gaps
- Japanese: no word for 'privacy'
- English (and other languages): no single word for German 'Schadenfreude' (the enjoyment of another's misfortune)
- English: no word for Japanese 'oyakoko' (something like 'filial piety')
- English 'blue' vs. Russian 'siniy' (dark blue) and 'goluboy' (light blue)

Lexicalization pattern divergences
Leonard Talmy (1985), "Lexicalization patterns: Semantic structure in lexical forms"
English: The bottle floated out.
- Manner of motion lexicalized in the verb
- Direction of motion lexicalized in the 'satellite' (here the verb particle)
Spanish: La botella salió flotando. (lit. 'the bottle exited floating')
- Manner of motion lexicalized in the gerund
- Direction of motion lexicalized in the verb

Lexicalization pattern divergences
Verb-framed languages mark direction of motion on the verb:
- Romance, Arabic, Hebrew, Japanese, Tamil, Polynesian, Mayan, Bantu families
Satellite-framed languages mark direction of motion on the satellite:
- crawl out, float off, jump down, walk over to, run after
- the rest of Indo-European (e.g., Germanic, Slavic), Hungarian, Finnish, Chinese

Structural divergences
German: Wir treffen uns am Mittwoch
English: We'll meet on Wednesday

Thematic divergence
German: Mir fällt der Termin ein (lit. 'to-me occurs the date': dative experiencer)
English: I remember the date

MT on the web
Babelfish: http://babelfish.altavista.com/
Google: http://www.google.com/search?hl=en&lr=&client=safari&rls=en&q="1+taza+de+jugo"+%28zumo%29+de+naranja+5+cucharadas+de+azucar+morena&btnG=Search

3 methods for MT
- Direct
- Transfer
- Interlingua

Three MT Approaches: Direct, Transfer, Interlingual
[diagram in original slides]

Direct Translation
- Proceed word-by-word through the text, translating each word
- No intermediate structures except morphology
- Knowledge is in the form of a huge bilingual dictionary with word-to-word translation information
- After word translation, simple reordering can be done (e.g., adjective ordering English -> French/Spanish)
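A minimal sketch of the direct architecture, using an invented four-word English-Spanish dictionary and a single local reordering rule (real direct systems used huge dictionaries and many such rules):

```python
# Toy English->Spanish dictionary and word classes (invented for illustration).
BILINGUAL = {"the": "la", "green": "verde", "witch": "bruja", "sings": "canta"}
ADJECTIVES, NOUNS = {"verde"}, {"bruja"}

def direct_translate(sentence: str) -> str:
    # Step 1: word-for-word dictionary lookup, no syntactic analysis.
    words = [BILINGUAL.get(w, w) for w in sentence.lower().split()]
    # Step 2: simple local reordering: a Spanish adjective follows its noun.
    out = []
    for w in words:
        if w in NOUNS and out and out[-1] in ADJECTIVES:
            out.insert(len(out) - 1, w)   # noun slips in front of the adjective
        else:
            out.append(w)
    return " ".join(out)

print(direct_translate("the green witch sings"))  # la bruja verde canta
```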

Direct MT
[diagram in original slides]

Problems with direct MT
German: complex reordering of words and phrases is necessary

The Transfer Model
Idea: starting from a structural analysis, use rules about differences between languages to translate directly from one surface structure to another: syntactic transformations (adjusting word order) and lexical transfer (selecting equivalents).
Steps:
1. Analysis: syntactically parse the source language
2. Transfer: rules turn this parse into a parse for the target language
3. Generation: generate the target sentence from the parse tree, with lexical transfer via lookup in the bilingual dictionary

English to French
Generally English: Adjective Noun; French: Noun Adjective
Note: not always true
- route mauvaise 'bad road, badly-paved road'
- mauvaise route 'wrong road'
But it is a reasonable first approximation.
Rule: NP -> Adjective Noun (English) becomes NP -> Noun Adjective (French)
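A hedged sketch of how such a transfer rule could be applied to a toy parse tree (the tuple tree representation is an assumption made for illustration; the original slide showed the rule graphically):

```python
# A node is (label, children_list) for phrases or (label, word) for leaves.
def transfer_np(node):
    """English NP -> French NP: swap an Adj that directly precedes an N."""
    label, children = node
    if isinstance(children, str):            # leaf: nothing to transfer
        return node
    children = [transfer_np(c) for c in children]
    for i in range(len(children) - 1):
        if children[i][0] == "Adj" and children[i + 1][0] == "N":
            children[i], children[i + 1] = children[i + 1], children[i]
    return (label, children)

np = ("NP", [("Det", "the"), ("Adj", "green"), ("N", "witch")])
print(transfer_np(np))
# ('NP', [('Det', 'the'), ('N', 'witch'), ('Adj', 'green')])
```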

Transfer rules
From English SVO to Japanese SOV
[diagram in original slides]

Lexical transfer
Transfer-based systems also need lexical transfer rules, i.e., a bilingual dictionary (as for direct MT).
English "home" is lexically ambiguous; German has:
- nach Hause (going home)
- Heimat (homeland, home country)
- zu Hause (at home)
We can list "at home <-> zu Hause", or do Word Sense Disambiguation.
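A minimal sketch of context-conditioned lexical transfer for the 'home' example; the trigger-word heuristic is an invented stand-in for a real word sense disambiguation step:

```python
def transfer_home(english_words):
    """Choose a German rendering of 'home' from shallow context cues."""
    if "going" in english_words or "go" in english_words:
        return "nach Hause"     # direction: going home
    if "at" in english_words:
        return "zu Hause"       # location: at home
    return "Heimat"             # fallback: homeland, home country

print(transfer_home("she is going home".split()))  # nach Hause
print(transfer_home("she is at home".split()))     # zu Hause
```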

Systran: combining direct and transfer
1. Shallow syntactic parsing: morphological analysis, POS tagging; chunking of NPs, PPs, phrases; shallow dependency parsing (subjects, passives, head-modifiers)
2. Transfer: translation of idioms; word sense disambiguation; assigning prepositions based on governing verbs
3. Synthesis: apply a rich bilingual dictionary; deal with reordering; morphological generation

Transfer: some problems
- A distinct set of transfer rules is needed for each pair of languages
- Grammar and lexicon are full of language-idiosyncratic generalizations
- Hard to build, hard to maintain

Interlingua
Intuition: instead of language-to-language transfer rules, use the meaning of the sentence.
Steps:
1. Translate the source sentence into a meaning representation
2. Generate the target sentence from the meaning

Interlingua for "Mary did not slap the green witch"
[diagram in original slides]

Direct MT: pros and cons (Bonnie Dorr)
Pros:
- Fast
- Simple
- Cheap
- No translation rules hidden in the lexicon
Cons:
- Unreliable
- Not powerful
- Rule proliferation
- Requires lots of context
- Major restructuring needed after lexical substitution

Interlingual MT: pros and cons (B. Dorr)
Pros:
- Avoids the proliferation of language-pair-specific rules
- Easier to write rules
Cons:
- Semantics is HARD
- Useful information is lost (paraphrase)

What makes a good translation
Translators often talk about two factors we want to maximize:
- Faithfulness or fidelity: how close the meaning of the translation is to the meaning of the original (even better: does the translation cause the reader to draw the same inferences as the original would have?)
- Fluency or naturalness: how natural the translation is, considering only its fluency in the target language

The impossibility of translation
Hebrew "adonai roi" (= The Lord is my shepherd). How do you translate it into a language whose culture has no sheep or shepherds?
- Something fluent and understandable, but not faithful: "The Lord will look after me"
- Something faithful, but not fluent and natural: "The Lord is for me like somebody who looks after animals with cotton-like hair"

Statistical MT: Faithfulness and Fluency formalized
Best translation T-hat of a source sentence S into a target sentence T:
T-hat = argmax_T faithfulness(S|T) * fluency(T)
Idea: build probabilistic models of faithfulness and fluency, and then combine these models to choose the most probable (= best) translation.

The IBM model
Those two factors might look familiar... Yup, it's Bayes' rule:
T-hat = argmax_T P(S|T) P(T)

Noisy channel model for statistical MT
Idea: statistical machine translation typically takes as its basis a noisy channel model, in which the target language sentence, by tradition labelled E, is seen as distorted by the channel into the foreign language F.

Noisy channel model for MT
Background: the Shannon-Weaver model of communication
[diagram in original slides]

Noisy channel model for MT
Idea: assume that the foreign (source language) input F we must translate is a corrupted version of some English (target language) sentence E, and that our task is to discover the hidden (target language) sentence E that generated our observed sentence F. (Compare the hidden-state setup of a Hidden Markov Model.)

Noisy channel model for MT
Given a Spanish sentence to translate (the source language sentence), we treat it as the output of an English sentence (the target language sentence) having gone through the noisy channel, and search for the best possible 'source' English sentence, i.e., using the probability of the foreign sentence F given the English sentence E: P(F|E).

More formally
Assume we are translating from a foreign language sentence F to an English sentence E:
F = f_1, f_2, f_3, ..., f_m
We want a decoder which is given F and produces the most probable (= best) English sentence E-hat = e_1, e_2, e_3, ..., e_n:
E-hat = argmax_E P(E|F)              (the conditional probability of an English sentence E, given a foreign sentence F)
      = argmax_E P(F|E) P(E) / P(F)  (Bayes' rule)
      = argmax_E P(F|E) P(E)         (Translation Model x Language Model)
We can ignore the denominator P(F) inside the argmax, since we are choosing the best English sentence for a fixed foreign sentence F, and hence P(F) is a constant.
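The argmax can be made concrete with a brute-force decoder skeleton: score every candidate English sentence by translation model plus language model in log space. Both scoring functions are placeholders here, and a real decoder searches an enormous space rather than a fixed candidate list:

```python
def decode(f_sentence, candidates, tm_logprob, lm_logprob):
    """Return argmax_E P(F|E) * P(E) over a finite candidate list.

    tm_logprob(f, e): log P(F|E), the translation model
    lm_logprob(e):    log P(E),   the language model
    Working in log space turns the product into a sum and avoids
    numerical underflow; the argmax is unchanged.
    """
    return max(candidates,
               key=lambda e: tm_logprob(f_sentence, e) + lm_logprob(e))
```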

More formally
The equation argmax_E P(E|F) = argmax_E P(F|E) P(E) leaves much unresolved concerning how the actual translation is to be performed. Systems that presuppose it derive from the early IBM Models, which grew out of work on speech recognition at IBM and operate at the word level. In the IBM model of MT, the translation process involves translating words and then rearranging them to recover the target language sentence.

Fluency: P(T)
How do we measure that this sentence:
  That car was almost crash onto me
is less fluent than this one:
  That car almost hit me.
Answer: language models (N-grams!), for example P(hit|almost) > P(almost|was). But we can use any other, more sophisticated model of grammar.
Advantage: this is monolingual knowledge!
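A bigram language model with add-one smoothing, estimated from an invented three-sentence corpus, is enough to reproduce the fluency judgment above:

```python
import math
from collections import Counter

corpus = ["<s> that car almost hit me </s>",          # toy training data,
          "<s> the car hit the wall </s>",            # invented for
          "<s> that car was red </s>"]                # illustration

unigrams, bigrams = Counter(), Counter()
for line in corpus:
    toks = line.split()
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))

V = len(unigrams)  # vocabulary size, for add-one smoothing

def logprob(sentence: str) -> float:
    """Add-one smoothed bigram log P(sentence)."""
    toks = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + V))
               for a, b in zip(toks, toks[1:]))

print(logprob("that car almost hit me") >
      logprob("that car was almost crash onto me"))   # True
```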

Faithfulness: P(S|T)
How to quantify faithfulness? Intuition: the degree to which words in one sentence are plausible translations of words in the other sentence, i.e., the probability that each word in the target sentence would generate each word in the source sentence.
French: ça me plait [that me pleases]
English: "that pleases me" (most faithful); "I like it" (most fluent)

Faithfulness: P(S|T)
We need to know, for every target language word, the probability of it mapping to every source language word. How do we learn these probabilities? Parallel texts: two texts that are translations of each other.

Word Alignment
All statistical translation models are based on the idea of a word alignment.
[French-English word alignment diagram in original slides]

Word Alignment
The IBM models require that each French word come from exactly one English word: one-to-one and one-to-many alignments are sanctioned; many-to-many and many-to-one alignments are disallowed by the basic MT models. We can represent the above alignment by giving, for each French word, the index number of the English word that it comes from: A = 2, 3, 4, 5, 6, 6, 6.
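In code, such an alignment is just a list with one English index per French word (1-based, as on the slide):

```python
# One English index per French word; index 6 repeats three times,
# so one English word generates three French words (one-to-many).
alignment = [2, 3, 4, 5, 6, 6, 6]

def aligned_pairs(french, english, alignment):
    """Yield (french_word, english_word) pairs under the alignment."""
    for j, a_j in enumerate(alignment):
        yield french[j], english[a_j - 1]   # convert 1-based to 0-based
```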

Word Alignment
Training alignment models: all statistical translation models are trained using a large parallel corpus. A parallel corpus, parallel text, or bitext is a text that is available in two languages. For example, the proceedings of the Canadian parliament are kept in both French and English: each sentence spoken in parliament is translated, producing a volume with running text in both languages.

Word Alignment
- First step, sentence alignment: figuring out which source language sentence maps to which target language sentence
- Second step, word alignment: figuring out which source language word maps to which target language word within each sentence pair (F, E)
A sketch of how word-translation probabilities can be learned from such data follows below.
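A compact IBM-Model-1-style EM sketch over an invented two-sentence parallel corpus; after a few iterations the translation probability t('la'|'the') climbs toward 1:

```python
from collections import defaultdict

# Toy sentence-aligned Spanish-English corpus (invented for illustration).
corpus = [("la casa".split(), "the house".split()),
          ("la bruja".split(), "the witch".split())]

t = defaultdict(lambda: 0.25)            # t(f|e), initialized uniformly

for _ in range(10):                      # EM iterations
    count = defaultdict(float)           # expected counts c(f, e)
    total = defaultdict(float)           # expected counts c(e)
    for f_sent, e_sent in corpus:
        for f in f_sent:                 # E-step: fractional alignment counts
            z = sum(t[(f, e)] for e in e_sent)
            for e in e_sent:
                count[(f, e)] += t[(f, e)] / z
                total[e] += t[(f, e)] / z
    for f, e in count:                   # M-step: renormalize per English word
        t[(f, e)] = count[(f, e)] / total[e]

print(round(t[("la", "the")], 2))        # close to 1.0: 'la' aligns to 'the'
print(round(t[("casa", "the")], 2))      # close to 0.0
```

The key idea is that 'la' co-occurs with 'the' in both sentence pairs, so EM gradually shifts probability mass onto that pairing even though no alignments were annotated by hand.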

Back to Faithfulness and Fluency
The job of the faithfulness model P(S|T) is to model the "bag of words": which words align from English to Spanish when translating from Spanish to English. P(S|T) does not have to worry about language-particular facts about Spanish word order: that is the job of P(T), the language model. P(T) can do bag generation: rearrange the words so that they recover the correct word order of the target sentence (example from Kevin Knight, USC/Information Sciences Institute); a sketch follows below.
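Bag generation can be sketched as brute-force search over orderings of the bag, scored by a language model such as the bigram logprob sketched earlier (feasible only for very small bags):

```python
from itertools import permutations

def bag_generate(bag, lm_logprob):
    """Return the language-model-preferred ordering of a bag of words.

    Brute force over all |bag|! permutations, so usable only for tiny
    bags; real systems search this space heuristically.
    """
    return max((" ".join(p) for p in permutations(bag)), key=lm_logprob)

# e.g. bag_generate(["hit", "me", "car", "that", "almost"], logprob)
# should return "that car almost hit me", given a suitable language model
```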

P(T) and bag generation: problem
Problem: 'bag of words' statistical MT does not model relations among words. How about: loves Mary John? The language model alone cannot tell whether "John loves Mary" or "Mary loves John" preserves the source meaning.

Phrase-Based MT
Recently there has been considerable interest in MT systems based not upon words but upon syntactic phrases. Such MT systems perform the translation by assuming that during the training phase the target language (but not the source language) specifies not just the words but the complete parse of the sentence.
Eugene Charniak, Kevin Knight and Kenji Yamada (2003), "Syntax-based Language Models for Statistical Machine Translation".