CSCI 5832 Natural Language Processing


Jim Martin
Lecture 26

Today 4/29: More MT

Bayes Rule / Noisy Channel

[Noisy-channel diagram: an English sentence e is generated by the language model P(e) and passes through the channel (translation model P(s|e)) to become the "garbled" Spanish sentence s, e.g. "I am so hungry" / "Que hambre tengo yo". The decoding algorithm computes argmax over e of P(e) * P(s|e).]

Given a source sentence s, the decoder should consider many possible translations and return the target string e that maximizes P(e|s). By Bayes Rule, we can also write this as P(e) * P(s|e) / P(s) and maximize that instead. P(s) never changes while we compare different e's, so we can equivalently maximize P(e) * P(s|e).
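Written out, the decoder solves:

$$\hat{e} \;=\; \arg\max_e P(e \mid s) \;=\; \arg\max_e \frac{P(e)\,P(s \mid e)}{P(s)} \;=\; \arg\max_e P(e)\,P(s \mid e)$$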

Three Sub-Problems of Statistical MT

- Language model: given an English string e, assigns P(e) by formula. A good English string gets a high P(e); a random word sequence gets a low P(e).
- Translation model: given a pair of strings <f,e>, assigns P(f|e) by formula. If <f,e> look like translations, high P(f|e); if they don't, low P(f|e).
- Decoding algorithm: given a language model, a translation model, and a new sentence f, find the translation e maximizing P(e) * P(f|e).

Parts List

We need probabilities for:
- n(x|y): the probability that word y will yield x outputs in the translation (fertility)
- p: the probability of a null insertion
- t: the actual word translation probability table
- d(j|i): the probability that a word at position i will make an appearance at position j in the translation
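One way to picture this parts list is as lookup tables. The toy entries below are invented purely for illustration (a sketch of the data structures, not real model values):

```python
# Hypothetical toy entries for the IBM Model 3 parameter tables.
# All numbers here are made up for illustration only.

fertility = {("house", 1): 0.80, ("house", 2): 0.15}      # n(x|y): y yields x words
p_null = 0.02                                             # p: null-insertion probability
t = {("maison", "house"): 0.90, ("la", "the"): 0.70}      # t: word translation table
d = {(1, 1): 0.60, (2, 1): 0.25}                          # d(j|i): position i -> position j
```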

Parts List

Every one of these can be learned from a sentence-aligned corpus (i.e., a corpus where sentences are paired but nothing else is specified) and the EM algorithm.

Word Alignment

… la maison …          … the house …
… la maison bleue …    … the blue house …
… la fleur …           … the flower …

Inherent hidden structure revealed by EM training!

EM: Worked-out example

Focus only on the word translation probabilities. How many alignments are there for each of these sentence pairs?

… la maison …          … the house …
… la maison bleue …    … the blue house …
… la fleur …           … the flower …

EM: Worked-out Knight example

Focus only on the word translation probabilities. How many alignments are there? (With each French word linked to exactly one English word, an n-word pair has n! alignments; the snippet below checks this.)

… la maison …          … the house …         2
… la maison bleue …    … the blue house …    6
… la fleur …           … the flower …        2
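A one-to-one alignment is just a permutation of the English positions, so the counts can be verified directly (a quick sketch, not from the slides):

```python
from itertools import permutations

pairs = [
    (["la", "maison"], ["the", "house"]),
    (["la", "maison", "bleue"], ["the", "blue", "house"]),
    (["la", "fleur"], ["the", "flower"]),
]

# Under the one-to-one assumption, an alignment is a permutation of the
# English positions, so an n-word pair has n! possible alignments.
for f_sent, e_sent in pairs:
    print(len(list(permutations(range(len(e_sent))))))  # prints 2, 6, 2
```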

EM: Step 1

Make up some numbers for the parameters of interest. In this case, just the word translation probabilities:

t(la|the)     t(m|the)      t(b|the)     t(f|the)
t(la|house)   t(m|house)    t(b|house)
t(la|blue)    t(m|blue)     t(b|blue)
t(la|flower)  t(f|flower)

Reminder

P(la|the) is P(la aligned with the) / P(the), which is Count(la aligned with the) / Count(the) in a word-aligned corpus. Which we don't have.

EM: Step 1

Make up some numbers for the parameters of interest. In this case, just the translation probabilities:

t(la|the) = 1/4      t(m|the) = 1/4      t(b|the) = 1/4      t(f|the) = 1/4
t(la|house) = 1/3    t(m|house) = 1/3    t(b|house) = 1/3
t(la|blue) = 1/3     t(m|blue) = 1/3     t(b|blue) = 1/3
t(la|flower) = 1/2   t(f|flower) = 1/2

EM: Step 2

Make some simple assumptions and produce some alignment probabilities:

la maison / the house (alignment 1: la-the, maison-house):   1/4 * 1/3 = 1/12
la maison / the house (alignment 2: la-house, maison-the):   1/4 * 1/3 = 1/12
la maison bleue / the blue house (one of six alignments):    1/4 * 1/3 * 1/3 = 1/36

EM: Step 2 (normalize)

Normalize the alignment probabilities within each sentence pair:

la maison / the house, alignment 1:   1/12  ->  (1/12) / (2/12) = 1/2
la maison / the house, alignment 2:   1/12  ->  (1/12) / (2/12) = 1/2
la maison bleue / the blue house:     1/36  ->  1/6 for each of the six alignments

EM: Step 3

Now that we have the probability of each alignment, we can go back, count the evidence in the alignments for each translation pair, and prorate that evidence based on the alignments it comes from.

Huh? Let's just look at t(la|the). What evidence do we have?

EM: Step 3

Evidence for t(la|the):

… la maison …          … the house …         1 alignment (of 2) links la to the
… la maison bleue …    … the blue house …    2 alignments (of 6) link la to the
… la fleur …           … the flower …        1 alignment (of 2) links la to the

EM: Step 3

Evidence for t(la|the), prorated by the alignment probabilities:

… la maison …          … the house …         1 * 1/2
… la maison bleue …    … the blue house …    2 * 1/6
… la fleur …           … the flower …        1 * 1/2

EM: Step 3

Evidence for t(la|the), prorated by the alignment probabilities:

… la maison …          … the house …         1 * 1/2
… la maison bleue …    … the blue house …    2 * 1/6
… la fleur …           … the flower …        1 * 1/2

Total: 1/2 + 2/6 + 1/2 = 8/6 = the discounted (fractional) count for t(la|the)

EM: Step 3

Do that for the other t(?|the) entries and normalize:

t(la|the) = 8/6   ->  (8/6) / (18/6)  ->  .44
t(m|the)  = 5/6   ->  (5/6) / (18/6)  ->  .28
t(b|the)  = 2/6   ->  (2/6) / (18/6)  ->  .11
t(f|the)  = 3/6   ->  (3/6) / (18/6)  ->  .17

EM: Step 4

Do that for all the words in the table, then:
- Recompute the alignment probabilities using the new word translation probabilities
- Recompute the fractional counts
- Normalize to get new word translation probabilities
- Iterate until done

When you're done, you can use the numbers to get the most probable alignment, from which you can read off all the parameters you need. A sketch of the whole loop on the toy corpus appears below.
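A compact runnable sketch of this loop, assuming one-to-one (permutation) alignments as in the worked example; after the first iteration it reproduces the numbers above (e.g. t(la|the) = .44):

```python
from collections import defaultdict
from itertools import permutations
from math import prod

corpus = [
    (["la", "maison"], ["the", "house"]),
    (["la", "maison", "bleue"], ["the", "blue", "house"]),
    (["la", "fleur"], ["the", "flower"]),
]

# Step 1: uniform initial t(f|e) over the French words that each
# English word co-occurs with (so t(la|the) starts at 1/4).
cooc = defaultdict(set)
for f_sent, e_sent in corpus:
    for e in e_sent:
        cooc[e].update(f_sent)
t = {e: {f: 1.0 / len(fs) for f in fs} for e, fs in cooc.items()}

for it in range(5):
    counts = defaultdict(lambda: defaultdict(float))
    for f_sent, e_sent in corpus:
        # Step 2: score every one-to-one alignment (a permutation of the
        # English positions) and normalize within the sentence pair.
        aligns = list(permutations(range(len(e_sent))))
        scores = [prod(t[e_sent[i]][f] for f, i in zip(f_sent, a)) for a in aligns]
        total = sum(scores)
        # Step 3: collect fractional counts, prorated by alignment probability.
        for a, s in zip(aligns, scores):
            for f, i in zip(f_sent, a):
                counts[e_sent[i]][f] += s / total
    # Step 4: renormalize fractional counts into new translation probabilities.
    for e, row in counts.items():
        z = sum(row.values())
        t[e] = {f: c / z for f, c in row.items()}
    if it == 0:
        print(round(t["the"]["la"], 2))  # 0.44, matching the slides' first iteration
```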

EM/Alignment

OK, I know this seems weird. We need some parameters which we don't have. We could get them from a word-aligned corpus, which we also don't have. So we make up some parameters to get the alignment, and then use that alignment to get the right numbers.

Parts List

Given a sentence alignment, we can induce a word alignment. Given that word alignment, we can get the p, t, d, and n parameters we need for the model. That is, we can argmax P(e|f) by maximizing over P(f|e) * P(e), and we can do that by iterating over some large space of possibilities.

Break

I will try to cover a subset of the material in Chapter 23 next week.

Intuition of phrase-based translation (Koehn et al. 2003)

The generative story has three steps:
1. Group words into phrases
2. Translate each phrase
3. Move the phrases around

Generative story again

1. Group English source words into phrases e1, e2, …, en.
2. Translate each English phrase ei into a Spanish phrase fj. The probability of doing this is φ(fj|ei).
3. Then (optionally) reorder each Spanish phrase. We do this with a distortion probability: a measure of the distance between the positions of a corresponding phrase in the two languages. "What is the probability that a phrase in position X in the English sentence moves to position Y in the Spanish sentence?"

Distortion probability

The distortion probability is parameterized by ai − bi−1, where:
- ai is the start position of the foreign (Spanish) phrase generated by the ith English phrase ei
- bi−1 is the end position of the foreign (Spanish) phrase generated by the (i−1)th English phrase ei−1

We'll call the distortion probability d(ai − bi−1), and we'll have a really stupid model:

d(ai − bi−1) = α^|ai − bi−1|

where α is some small constant.
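For example, with a hypothetical α = 0.5: if the (i−1)th English phrase's Spanish translation ends at position 3 and the ith one's starts right after it at position 4, then d = 0.5^|4−3| = 0.5; if it instead jumps to position 7, d = 0.5^|7−3| = 0.0625. Long-distance reordering is penalized exponentially.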

Final translation model for phrase-based MT

The translation model is the product of the phrase translation probabilities and the distortion probabilities:

P(F|E) = ∏i φ(fi|ei) * d(ai − bi−1)

Let's look at a simple example with no distortion.

Phrase-based MT

- Language model: P(E)
- Translation model: P(F|E)
- How to train the model
- Decoder: finding the sentence E that is most probable

Training P(F|E)

What we mainly need to train is φ(fj|ei). Suppose we had a large bilingual training corpus (a bitext), in which each English sentence is paired with a Spanish sentence, and suppose we knew exactly which phrase in Spanish was the translation of which phrase in the English. We call this a phrase alignment. If we had this, we could just count and divide:
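Count-and-divide is just the relative-frequency estimate:

$$\phi(\bar{f} \mid \bar{e}) \;=\; \frac{\text{count}(\bar{f}, \bar{e})}{\sum_{\bar{f}'} \text{count}(\bar{f}', \bar{e})}$$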

But we don't have phrase alignments

What we have instead are word alignments. (Actually the word alignments we have are more restricted than this, as we'll see in two slides.)

Getting phrase alignments

To get phrase alignments:
- We first get word alignments
- Then we "symmetrize" the word alignments into phrase alignments

How to Learn the Phrase Translation Table?

One method: "alignment templates" (Och et al., 1999). Start with a word alignment and build phrases from that.

Maria no dió una bofetada a la bruja verde
Mary did not slap the green witch

[The slide shows the word-to-word alignment grid linking the two sentences.] This word-to-word alignment is a by-product of training a translation model like IBM Model 3. This is the best (or "Viterbi") alignment.

Word Alignment Induced Phrases

Maria no dió una bofetada a la bruja verde
Mary did not slap the green witch

(Maria, Mary) (no, did not) (dió una bofetada, slap) (la, the) (bruja, witch) (verde, green)
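A sketch of the standard consistency-based phrase extraction (my own simplified code, not Och et al.'s full alignment-template method; alignment points are assumed given as (Spanish index, English index) pairs):

```python
# Simplified phrase extraction: a (Spanish span, English span) pair is
# extracted iff it is "consistent" with the alignment, i.e. no alignment
# point links a word inside one span to a word outside the other.
# Indices are 0-based; spans are inclusive.

def extract_phrases(align, f_len, e_len, max_len=3):
    phrases = []
    for f1 in range(f_len):
        for f2 in range(f1, min(f1 + max_len, f_len)):
            for e1 in range(e_len):
                for e2 in range(e1, min(e1 + max_len, e_len)):
                    inside = [(f, e) for f, e in align
                              if f1 <= f <= f2 and e1 <= e <= e2]
                    violated = any((f1 <= f <= f2) != (e1 <= e <= e2)
                                   for f, e in align)
                    if inside and not violated:
                        phrases.append(((f1, f2), (e1, e2)))
    return phrases

# Alignment points for the example above (a common textbook alignment;
# "a" at Spanish position 5 is unaligned).
align = [(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3),
         (6, 4), (7, 6), (8, 5)]
f_words = "Maria no dió una bofetada a la bruja verde".split()
e_words = "Mary did not slap the green witch".split()
for (f1, f2), (e1, e2) in extract_phrases(align, len(f_words), len(e_words)):
    print(" ".join(f_words[f1:f2 + 1]), "|||", " ".join(e_words[e1:e2 + 1]))
```

Among its output are exactly the pairs listed on the slide, e.g. "no ||| did not" and "dió una bofetada ||| slap".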

Decoding

Given the phrase alignments and translation probabilities, how do we decode? Basically stack decoding (à la A*: heuristic best-first search). The goal is to cover (account for) all the foreign words with the best (highest-probability) English sequence.
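A bare-bones sketch of that search, in a monotone (no-reordering) setting with a toy phrase table and no language model or future-cost heuristic; all names and numbers below are invented for illustration:

```python
import heapq
import itertools
import math

# Toy phrase table: Spanish phrase tuple -> list of (English phrase, prob).
table = {
    ("Maria",): [("Mary", 0.9)],
    ("no",): [("did not", 0.6), ("no", 0.3)],
    ("dió", "una", "bofetada"): [("slap", 0.8)],
}

def decode(src):
    """Best-first stack-style search over partial hypotheses.
    A hypothesis is (neg-log-prob, tie-breaker, frontier index, output);
    monotone decoding means coverage is always a prefix of the source."""
    counter = itertools.count()  # tie-breaker so equal-cost pops are FIFO
    heap = [(0.0, next(counter), 0, "")]
    while heap:
        cost, _, frontier, out = heapq.heappop(heap)
        if frontier == len(src):          # all foreign words accounted for
            return out.strip(), math.exp(-cost)
        # Extend the hypothesis by translating the next source span.
        for j in range(frontier + 1, len(src) + 1):
            for e, p in table.get(tuple(src[frontier:j]), []):
                heapq.heappush(heap, (cost - math.log(p), next(counter),
                                      j, out + " " + e))
    return None

print(decode(["Maria", "no", "dió", "una", "bofetada"]))
# -> ('Mary did not slap', ~0.432), i.e. 0.9 * 0.6 * 0.8
```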
